Monday, March 26, 2012

Multiple Linear Regression With Sparse Matrices In R

Alternate working title: For the love of god would it hurt them to write a fucking HOWTO?

I am not a mathematician. I'm just a humble engineer who happens to have a really big linear model that needs regressin'. And I have now spent the better part of 3 days figuring out how to do it in R so that you don't have to.

Step 1: Find yourself a sparse matrix representation. R, helpfully enough, has two popular packages for sparse matrices: Matrix and SparseM. Matrix is included in the stock R distribution these days, which means that you should use that package. Except... the best algorithm for fitting a sparse model makes use of the Cholesky decomposition of the matrix, a function which is currently busted:

cholesky_decomp <- chol(mm_cross) 
Error in .local(x, ...) : temporarily disabled 

Of course, you wouldn't discover that until after you'd spent a couple of hours figuring out how to massage your data into the appropriate format. So... SparseM it is.

Step 2: Massage your data. I happen to be pulling my data out of a database, which makes it easy for me to create <i,j,value> triplets. That being the case I stuffed my data into a matrix.coo object, the SparseM native format for matrices stored in coordinate form. Here's how that works... suppose you have your data in a data frame like this:

 i | j | x
-----------
i_1|j_1|x_1
i_2|j_2|x_2
...

Assume that the overall matrix is n X m in size. The appropriate call to create the sparse matrix is

sparse_matrix.coo <- new(
        "matrix.coo",
        ia = my_data_frame$i,
        ja = my_data_frame$j,
        ra = my_data_frame$x,
        dimension = c(n,m)
)

So far so good? Alright... the slm.fit function requires a matrix in "compressed sparse row" (csr) format, so convert your coo matrix:

sparse_matrix.csr <- as.matrix.csr(sparse_matrix.coo)

Step 3: Fit it. Suppose you have your response data in a plain ol' (i.e. not sparse) vector. The call for the fit function is then

fit_result <- slm.fit(sparse_matrix.csr,response_vector)

The model is fitted! Great rejoicing and merriment!

That is all... be glad you didn't have to RTFM.

0 Comments:

Post a Comment

<< Home

Blog Information Profile for gg00