The dimension of the Supreme Court.

Author: Edelman, Paul H.

It is a rare occurrence when the New York Times, (1) Washington Post, (2) NPR, (3) and even Jack Kilpatrick (4) discuss a political science paper. Nonetheless, that is what happened after A Pattern Analysis of the Second Rehnquist US Supreme Court, by Lawrence Sirovich, (5) was published in the Proceedings of the National Academy of Sciences in June 2003. Sirovich's paper applies two unusual mathematical techniques to the decisions of the Court with the aim of "extracting key patterns and latent information." (6) Using information theory, and in particular the idea of entropy, Sirovich claims that the "Court acts as if composed of 4.68 ideal Justices." (7) After applying a singular value decomposition to the decision data, he concludes that the Court's decisions can be accurately approximated by a suitably chosen two-dimensional space.

While some commentary has questioned whether Sirovich's conclusions are novel, at least one of the methods of analysis is new (in the context of political science) and might also prove useful in other circumstances. Moreover the methods themselves raise interesting questions about the Court. It is therefore worthwhile to consider the methods more carefully.

Before discussing the methods themselves, we need to explore how Sirovich encodes data from the Court. He starts by listing the Justices in alphabetical order (although any order would work) and then encodes each decision as a vector with nine entries, in which a 1 signifies a Justice who was in the majority and a -1 signifies a Justice in the minority. For example, a case decided unanimously is coded (1,1,1,1,1,1,1,1,1), and a case decided by the classic 5-4 conservative-liberal split (say Garrett (8)) is coded (-1,-1,1,1,1,1,-1,-1,1), where the first -1 indicates that Breyer (the alphabetically first Justice) was in the minority and the last 1 indicates that Thomas (the alphabetically last Justice) was in the majority. (9) Thus, Sirovich reduces each case to a string of 1's and -1's of length nine. I will refer to these codings as vote-patterns.
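To make the encoding concrete, here is a minimal Python sketch of the scheme just described; the function name and roster constant are my own, not Sirovich's:

```python
# Alphabetical roster of the second Rehnquist Court.
JUSTICES = ["Breyer", "Ginsburg", "Kennedy", "O'Connor", "Rehnquist",
            "Scalia", "Souter", "Stevens", "Thomas"]

def vote_pattern(majority):
    """Encode a decision as a length-9 vector: 1 = majority, -1 = minority."""
    return [1 if justice in majority else -1 for justice in JUSTICES]

# The classic 5-4 conservative-liberal split (e.g., Garrett):
garrett = vote_pattern({"Kennedy", "O'Connor", "Rehnquist", "Scalia", "Thomas"})
# → [-1, -1, 1, 1, 1, 1, -1, -1, 1]
```

A unanimous decision, `vote_pattern(set(JUSTICES))`, yields the all-1's vector, matching the coding above.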

There are two things worth noting about Sirovich's data set. First, he records the decisions of the Court and not the opinions. For instance, Lawrence v. Texas (10) is recorded as (1,1,1,1,-1,-1,1,1,-1), with O'Connor listed in the majority even though she did not join the majority opinion. The second fact worth noting is that Sirovich discarded 30% of the cases because "the vote was incomplete or ambiguous (per curiam ... decisions furnished no details of the vote and were deemed inadmissible, as were cases in which a Justice was absent or voted differently on the parts of a case)." (11) Later I will reexamine his decision to exclude these cases.

ENTROPY

The most original part of Sirovich's paper is his use of information theory, and in particular the idea of entropy, to analyze the Supreme Court. Sirovich uses information theory to measure the variability of the set of vote-patterns of the Court. While others have discussed the distribution of decisions from the Court (12) and the correlation of votes among the Justices, (13) no one has proposed an overall measure of the variability of decisions until now. This fact alone makes Sirovich's paper worth reading.

Entropy is a measure of the total amount of variability in a situation. Suppose there are n different possible outcomes, which we list as 1, 2, ..., n, and that outcome j occurs with probability p_j. The entropy of this set of outcomes is defined to be

I = -\sum_{j=1}^{n} p_j \log p_j

where the logarithm is taken to be base 2. (14) Entropy measured in these terms can be interpreted as the minimum average code-word length needed to convey the outcomes. (15) It is infeasible to provide a complete explanation here, but a few examples should suffice to explain how it works.

First, a small example to help clarify the ideas. Suppose that when you talk to your stockbroker he recommends Buy with probability 1/2, Hold with probability 1/4 and Sell with probability 1/4. What is the entropy of his recommendations? Applying the formula above yields

I = -\frac{1}{2}\log\frac{1}{2} - \frac{1}{4}\log\frac{1}{4} - \frac{1}{4}\log\frac{1}{4} = -\frac{1}{2}(-1) - \frac{1}{4}(-2) - \frac{1}{4}(-2) = \frac{3}{2}.

Remember that the log function here is base 2, so log(2^n) = n. What is the significance of the value 3/2? Suppose your stockbroker had to communicate by Morse Code with you and he wanted to use, on average, as short a set of symbols as possible. If he encoded Buy with a dot, Sell with a dash-dot and Hold with a dash-dash, the expected length of his recommendation signal would be 3/2. This is because Buy is coded with a single symbol and occurs half the time, and Hold and Sell each are coded with two symbols and occur a quarter of the time. The expected length therefore is 1/2 x 1 + 1/4 x 2 + 1/4 x 2 = 3/2. One can show that this is the best possible way to encode this information. (16)
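The stockbroker calculation is easy to reproduce. A short Python sketch (the function name is mine) applies the entropy formula and checks it against the expected length of the Morse-style code:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: I = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Stockbroker's recommendations: Buy 1/2, Hold 1/4, Sell 1/4.
print(entropy([0.5, 0.25, 0.25]))  # → 1.5

# Expected length of the code: Buy = 1 symbol, Hold and Sell = 2 symbols each.
print(0.5 * 1 + 0.25 * 2 + 0.25 * 2)  # → 1.5
```

The two values agree, illustrating the claim that 3/2 is the best achievable average code length for this distribution.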

Let us examine Sirovich's examples. Sirovich calls an "Omniscient Court" one for which every decision is unanimous. For such a court, there is only one outcome, which occurs with probability 1. By the above formula, I=0. This accords with our intuition since we assume that all the opinions will come down the same way and hence we get no new information from seeing one.

It is worth noting, though, that unanimity has nothing to do with this analysis. What matters is that every opinion is the same. If every case were to be decided by the canonical conservative versus liberal 5-4 margin, the entropy would still be 0. Omniscience, in and of itself, has nothing to do with the amount of entropy.

The other extreme case, as proposed by Sirovich, is the "platonic" Court. In this instance, he assumes that each Justice is equally likely to vote for one side as the other, which is to say "the vote of a platonic justice is as predictable as the toss of a fair coin." (17) In this situation there are 2^8 = 256 different possible outcomes and all of them are equally likely. To see this, note that the total number of strings of length nine where each entry is a 1 or -1 is 2^9 = 512, because there are two possibilities for each coordinate. By the way we have encoded the decisions, we only consider those with more 1's than -1's (since a 1 indicates that Justice is in the majority), and exactly half of the 512 strings have that property. Applying the entropy formula leads to

I = -\sum_{j=1}^{256} \frac{1}{256} \log \frac{1}{256} = -\log \frac{1}{256} = -\log 2^{-8} = 8

for the entropy calculation. The significance of the value 8 is that, if we were clever, we could convey this information using only strings whose average length is 8 instead of 9. (18)
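Both the count of 256 admissible vote-patterns and the entropy of 8 bits can be verified by brute force. A short sketch (variable names are mine):

```python
import math
from itertools import product

# All length-9 strings of 1's and -1's with more 1's than -1's.
# (With nine odd-valued entries the sum is never zero, so sum > 0
# is exactly the "strict majority of 1's" condition.)
patterns = [p for p in product((1, -1), repeat=9) if sum(p) > 0]
print(len(patterns))  # → 256

# Uniform distribution over these patterns: entropy = log2(256) = 8 bits.
I = -sum((1 / len(patterns)) * math.log2(1 / len(patterns)) for _ in patterns)
print(I)  # → 8.0
```

This confirms that half of the 512 sign patterns survive the majority-coding convention, and that a uniformly random ("platonic") Court has entropy 8.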

How, then, does Sirovich compute the entropy of the Court? Using data from October Term 1994 through October Term 2001, he computes the probability of a given majority coalition by counting the number of times it occurred and dividing by the total number of cases. Then he computes entropy using the above formula to get a value of 3.68. He concludes that the Court behaves like 4.68 "platonic" Justices. The equivalent number of "platonic" Justices is one more than the entropy because a Court with no entropy still has a Justice to cast the unique decision. Alternatively, as noted earlier, (19) since the judgments are encoded so that there is always a larger number of 1's than -1's, there is always an extra degree of freedom in the number of Justices over...
