Monday, March 26, 2012

Multiple Linear Regression With Sparse Matrices In R

Alternate working title: For the love of god would it hurt them to write a fucking HOWTO?

I am not a mathematician. I'm just a humble engineer who happens to have a really big linear model that needs regressin'. And I have now spent the better part of 3 days figuring out how to do it in R so that you don't have to.

Step 1: Find yourself a sparse matrix representation. R, helpfully enough, has two popular packages for sparse matrices: Matrix and SparseM. Matrix is included in the stock R distribution these days, which means that you should use that package. Except... the best algorithm for fitting a sparse model makes use of the Cholesky decomposition of the matrix, a function which is currently busted:

cholesky_decomp <- chol(mm_cross) 
Error in .local(x, ...) : temporarily disabled 

Of course, you wouldn't discover that until after you'd spent a couple of hours figuring out how to massage your data into the appropriate format. So... SparseM it is.

Step 2: Massage your data. I happen to be pulling my data out of a database, which makes it easy for me to create <i,j,value> triplets. That being the case I stuffed my data into a matrix.coo object, the SparseM native format for matrices stored in coordinate form. Here's how that works... suppose you have your data in a data frame like this:

 i | j | x
-----------
i_1|j_1|x_1
i_2|j_2|x_2
...

Assume that the overall matrix is n X m in size. The appropriate call to create the sparse matrix is

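sparse_matrix.coo <- new(
        "matrix.coo",
        # the ia/ja (row/column index) and dimension slots are integer-typed,
        # so coerce in case the database handed back doubles
        ia = as.integer(my_data_frame$i),
        ja = as.integer(my_data_frame$j),
        ra = as.numeric(my_data_frame$x),
        dimension = as.integer(c(n,m))
)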

So far so good? Alright... the slm.fit function requires a matrix in "compressed sparse row" (csr) format, so convert your coo matrix:

sparse_matrix.csr <- as.matrix.csr(sparse_matrix.coo)
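
If you want to convince yourself that the triplets landed where you think they did, here's a minimal sketch on a toy 4 x 3 matrix. The data frame contents and sizes are made up for illustration, and the as.integer()/as.numeric() coercions are my assumption about what your database driver might hand back. Expanding to a dense matrix is only sane for tiny examples like this one.

```r
library(SparseM)

# Hypothetical <i,j,value> triplets for a 4 x 3 matrix (indices are 1-based).
my_data_frame <- data.frame(
  i = c(1, 2, 3, 4, 4),
  j = c(1, 2, 3, 1, 3),
  x = c(1, 2, 3, 4, 5)
)
n <- 4
m <- 3

# Build the coordinate-form matrix; index and dimension slots must be integer.
sparse_matrix.coo <- new(
  "matrix.coo",
  ia = as.integer(my_data_frame$i),
  ja = as.integer(my_data_frame$j),
  ra = as.numeric(my_data_frame$x),
  dimension = as.integer(c(n, m))
)

# Convert to compressed sparse row format.
sparse_matrix.csr <- as.matrix.csr(sparse_matrix.coo)

# Expand to an ordinary dense matrix and eyeball it.
dense_check <- as.matrix(sparse_matrix.csr)
print(dense_check)
```

Any cell you didn't supply a triplet for should come back as zero.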

Step 3: Fit it. Suppose you have your response data in a plain ol' (i.e. not sparse) vector. The call for the fit function is then

fit_result <- slm.fit(sparse_matrix.csr,response_vector)
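
Here's a rough end-to-end sketch of the fit on made-up data, checked against base R's dense lm.fit. Everything in it (the 30 observations, the indicator columns, the "true" coefficients) is hypothetical. One thing to watch: as far as I can tell slm.fit uses the design matrix exactly as you give it, so if you want an intercept you need to include a column of ones yourself.

```r
library(SparseM)

# Made-up design matrix: an explicit intercept column plus two 0/1 indicators.
n_obs <- 30L
idx <- seq_len(n_obs)
dense_x <- cbind(1, as.numeric(idx %% 3 == 0), as.numeric(idx %% 5 == 0))

# Made-up response: known coefficients plus a little noise.
set.seed(42)
response_vector <- 2 + 3 * dense_x[, 2] - 1 * dense_x[, 3] + rnorm(n_obs, sd = 0.1)

# as.matrix.csr also accepts an ordinary dense matrix.
sparse_matrix.csr <- as.matrix.csr(dense_x)

# Sparse least-squares fit; the result is a list that includes $coefficients.
fit_result <- slm.fit(sparse_matrix.csr, response_vector)
sparse_coef <- as.numeric(fit_result$coefficients)

# Sanity check against the dense fit from base R.
dense_coef <- as.numeric(lm.fit(dense_x, response_vector)$coefficients)
print(cbind(sparse = sparse_coef, dense = dense_coef))
```

The two columns should agree to within floating-point noise. If they do, you can trust the sparse fit on the matrix that's too big to fit densely.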

The model is fitted! Great rejoicing and merriment!

That is all... be glad you didn't have to RTFM.

Thursday, March 15, 2012

Skeptics, Progressives, and Title VII

One of the joys of the past few years has been watching the transformation of skepticism from a mere intellectual stance to a full-fledged movement. There now exists an easily-identifiable skeptical community which has coalesced around institutions like Freethought Blogs and Skepticon, large enough that we can yell about things and occasionally make a difference. I find this interesting because it raises the question of what we should do once we engage the political process, i.e., what does "skeptical" public policy look like?

Skeptics, as Michael Shermer puts it, are the "watchmen of reasoning errors". If you agree with Shermer then that would seem to imply that skeptics should require public policy decisions to be rational and, where appropriate, based on verifiable, empirical claims. More pithily:

What do we want? Evidence-based change. When do we want it? After peer review.

Now here's a complication: Skepticism is highly correlated with progressive political views. You look at a place like Freethought Blogs and you find that, with a few exceptions, the denizens thereof tend to range from the left to the far left of the political spectrum. I see this as a potential source of tension because causes which are near and dear to progressives as a whole may have evidentiary issues which should cause skeptics to approach them... well... skeptically.

What got me started on this train of thought was that I just finished reading Forbidden Grounds: The Case Against Employment Discrimination Law by Richard Epstein. If nothing else the work provides an interesting overview of the history of the Civil Rights Act though, being written in 1995, the legal analysis is probably a bit dated by this point [1]. What's particularly relevant in the context of our current discussion, however, is Epstein's sustained attack on the theoretical underpinnings of adverse impact lawsuits brought under Title VII of the act.

Put bluntly, disparate impact lawsuits rely heavily on dubious math and faulty assumptions. A baseline assumption in hiring lawsuits, for example, is that the workforce should match a certain demographic profile, with significant deviations from that profile taken as proof that discrimination must be taking place. However, it's virtually impossible to identify the appropriate demographic profile, so crude proxy measurements are used instead. These proxies, again because such information is difficult to come by, fail to take into account relevant factors (differential employee preferences being a big one), and are thus of questionable value at best [2]. In cases involving wage and salary differentials the evidence often takes the form of competing statistical regressions, raising the issues of which variables are relevant and which direction (forward or reverse) the regression should go. The latter item is especially pertinent in that the regression may boost one party's case when run in one direction but the opposing party's case when run in the other. And then, to further muddy the waters, there are additional complications regarding the interpretation of unexplained variation and the specific form(s) the regression equation(s) should take [3]. There seems to be no principled way to resolve such disagreements, which calls the utility of the entire process into question.

More generally, as Epstein notes in his discussion of the Sears case [4], there has been a gradual shift away from disparate treatment lawsuits in favor of disparate impact lawsuits, driven largely by the fact that overt acts of discrimination have become less prevalent as time goes by. The premise supporting this shift is that, rather than engaging in overt acts, employers now engage in subconscious, subtle, and subterranean forms of discrimination. As Alice Kessler-Harris put it in her Sears testimony:

[F]ailure to find women in so-called nontraditional jobs can thus only be interpreted as a consequence of employers' unexamined attitudes or preferences, which phenomenon is the essence of discrimination. [5]

The question that skeptics need to ask themselves is "Is this theory falsifiable?". Does the failure to identify discriminatory practices indicate that there are none, or have we merely failed to dig deep enough to find them?

As an interesting aside, I ran across a variation on this theme a couple of days ago while reading about American Atheists' slavery billboard. Matt Dillahunty of The Atheist Experience wrote the following while expressing his thoughts on the subject:

Perhaps Sikivu is right. Clearly there were some who were more than a little upset about the imagery. I’m not even sure I can comment on that, as my own privilege might be preventing a clear understanding. At the end of the day, though, if the billboard isn’t conveying the intended message and achieving its intended goal, then it failed and we need some other message.

Same problem here... is the concept of "X privilege" falsifiable? Matt, who's not a stupid fellow, can't tell if he's blinded by privilege or not. How are skeptics to evaluate whether such criticisms are valid or whether they're merely being used as a pretext to foreclose discussion? [6]

Apart from the methodological problems associated with demonstrating the presence or absence of discrimination there is also the more fundamental question of whether all employment discrimination is morally objectionable. The skeptical case against employment discrimination and/or in favor of Title VII, very lightly sketched, is something along the lines of

  1. Public policy should encourage rational behavior.
  2. It is irrational to base employment decisions on irrelevant personal characteristics.
  3. The protected classifications identified in Title VII embody irrelevant characteristics and thus should not be considered when making employment decisions.
  4. Yay Title VII!

The above is in no way intended to be an airtight and/or definitive statement of principle, but I feel it provides a reasonable approximation for present purposes. Allow me, then, to offer a few observations:

The fact that a practice is irrational does not imply that it needs to be subject to legal regulation. Skeptics as a whole tend to take a dim view of religion, for example, regarding the vast majority of religious practices as wholly irrational. Many have gone so far as to maintain that the religious indoctrination of minors is a form of child abuse. In my mind there's a solid case to be made that being subjected to religious indoctrination is far worse than being denied a job because of race or sex, since the former can fuck you up for life while the latter generally represents a temporary setback. Skeptics do not seem to be clamoring for the regulation of religious life, which would imply, if we're being consistent, that the lesser evil of employment discrimination does not merit regulation either.

Title VII notwithstanding, personal characteristics which may be irrelevant in one context may be highly relevant in another. This is self-evidently true in the case of disability status, but is just as true for other characteristics as well. Epstein provides an interesting example of this in his discussion of collective choice, the gist of which is that workforce homogeneity and/or employee self-segregation may improve employee happiness while simultaneously lowering employer costs. This is another area where a skeptical mindset may come into direct conflict with a commitment to progressive values. Progressives tend to treat employer costs as irrelevant and self-segregation as a symptom of false consciousness, assertions to which skeptics should respond "Why?" and "Prove it", respectively.

Finally, Title VII has been interpreted to require behavior which is inconsistent at best and manifestly irrational at worst. For example, in the Seattle area we have The Facts, a newspaper which caters to African-Americans and has a largely African-American staff. I'm glad that such an institution exists and think that it's a fine example of market specialization, but the 80% test tells us that they're most certainly engaging in racial discrimination in hiring. Despite this glaring violation of Title VII there's a snowball's chance in hell that the EEOC will ever seek a prosecution against them by virtue of the fact that they're a minority-owned business. It's hard to square this outcome with Title VII's premise that discrimination in hiring is illegal without resorting to special pleading. And then there are situations, such as the Daniel Lamp Company case, where Title VII leads to perverse outcomes that are of no conceivable benefit to anyone.

In closing I should add that I'm not particularly interested in picking on Title VII; it just happens to be a prominent example where the interests of skeptics and progressives tend to diverge. There are plenty of other areas (vaccination, alternative medicine, nuclear energy, etc.) where an appropriately skeptical mindset may come into conflict with general trends in progressive thought. More than anything else I believe that skeptics need to be consistent in the application of our skepticism and require the same standards of proof for progressive beliefs that we expect from everyone else.


[1] And, strangely, doesn't mention Bakke even in passing. Perhaps it was excluded on the grounds that it's not an employment-related case.
[2] Forbidden Grounds, pp. 368-375.
[3] Ibid., pp. 375-385.
[4] Ibid., pp. 385-390.
[5] Ibid., p. 387, fn. 31.
[6] Not sure what a general solution to this might be, but in the context of the billboard perhaps we can turn to Daniel Fincke's work on moral offense.

Wednesday, March 14, 2012

Yeah, the Droid 3 is great, but...

I just switched to the Droid 3 from the Pre 2 (forced on me, don't ask) and the experience has been nothing short of revolutionary. It's really nice to have a device that doesn't take 3 seconds to recognize that you've pressed a button. That said, the one thing I am missing about the Pre/WebOS at this point is its multi-tasking interface. It was tremendously useful (and intuitive I might add) to be able to background an application with a single gesture and then launch another or foreground an application which had already been launched. I'm slowly getting used to Droid's take on multitasking but it feels clunky in comparison.

I'm coming to think that WebOS is, all things considered, a pretty good platform whose adoption was hampered by shitty hardware. I mean, if HP had put out a phone using Motorola's HW platform instead of the Pre (powered by Hamster Inside©) it almost certainly would have done a lot better.
