The Problem of Data Bias in the Pool of Published U.S. Appellate Court Opinions

Author: Keith Carlson, Michael A. Livermore, Daniel N. Rockmore
Published date: 01 June 2020
DOI: http://doi.org/10.1111/jels.12253
Journal of Empirical Legal Studies
Volume 17, Issue 2, 224–261, June 2020
Keith Carlson, Michael A. Livermore*, and Daniel N. Rockmore
For decades, researchers have studied the relationship between the political leanings of
judges and the outcomes of appellate litigation in the United States. The primary source
of data for this research has been published judicial opinions that describe cases and their
outcomes. However, only a relatively small number of cases result in published opinions,
and this sample of cases may be subject to serious biases. Based on computational text
analysis of over 150,000 published opinions issued by federal appellate courts in the years
1970–2010, we find strong evidence of data bias based on relationships between the party
affiliations of judges on appellate court panels and the characteristics of cases that result
in published opinions. These relationships imply that the inferential model that underlies
much of the judicial politics literature can lead to biased or spurious findings concerning
the causal influence of judicial attributes on case outcomes.
*Address correspondence to Michael A. Livermore, University of Virginia, 580 Massie Rd., Charlottesville, VA
22903; email: mlivermore@virginia.edu. Carlson is a PhD candidate (computer science) at Dartmouth College;
Livermore is Professor of Law at the University of Virginia; Rockmore is William H. Neukom 1964 Professor of
Computational Science and Professor of Mathematics and Computer Science, Dartmouth College, and Member of the
External Faculty of the Santa Fe Institute.
Our thanks to Brandon Stewart for extremely helpful guidance on the model. Thanks to Kevin Cope, Marion
Dumas, Joshua Fischman, Jens Frankenreiter, Michael Gilbert, Alex Jakubow, Jonathan Nash, David W. Rohde, and
anonymous reviewers for comments. Thanks also for comments from participants at the 2020 Text Analysis in Law
Conference at U.C. Berkeley School of Law, the 2019 Text Analysis and Law Conference at Northwestern University,
the 2018 Online Workshop on Computational Analysis of Law, and the 2016 Conference on Empirical Legal
Studies. Replication files, including data and code, can be found at: math.dartmouth.edu/~jelsdatabiasuscourts.
©2020 The Authors. Journal of Empirical Legal Studies published by Cornell Law School and Wiley Periodicals LLC.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use,
distribution and reproduction in any medium, provided the original work is properly cited.
I. Introduction
Much research on appellate court decision making is based on information contained in
the published opinions that are issued by these courts (Kaheny et al. 2008; Kastellec
2013). However, only a small minority of matters decided by appellate courts lead to a
published opinion, with the vast majority resulting in unpublished dispositions. There
are two inferential problems raised by the selection of cases for publication. First, some
types of cases may be more likely to lead to a published opinion than others, so it is not
obvious that conclusions drawn only from the body of published opinions apply to the
broader universe of case dispositions (Edwards & Livermore 2009; Songer & Davis 1990).
Second, and more important, the judicial attributes of interest, such as partisan
affiliation, could affect whether a published opinion is issued for a given case.
Consideration of published opinions alone could therefore lead researchers to incorrectly
attribute correlations between judicial attributes and case outcomes to a direct causal
influence when, in reality, publication behavior accounts for the observed relationship.
Scholars have been aware of this problem for some time and have discussed its
potential implications for the literature (Atkins 1992; Edwards & Livermore 2009;
Fischman 2015; Keele et al. 2009; Ringquist & Emmert 1999; Songer & Davis 1990). The
standard solution is to work with more comprehensive datasets that include unpublished
opinions (e.g., Fischman 2015; Peresie 2005). This approach has limitations, however.
Data collection is more difficult for unpublished opinions and much of the existing work
in the field is based on published opinions alone. In an analysis of panel effects that
included the reexamination of several of the most prominent prior studies on the
subject, Fischman (2015) reports that nine of the 14 studies excluded unpublished decisions.
High-profile work in the past decade based on published opinions alone includes
Kastellec (2013) and Boyd et al. (2010). The most commonly used academic datasets in
the field are also based on only published opinions (Kuersten & Songer 2003; Sunstein
et al. 2006). Furthermore, even the publicly (or commercially) available datasets that do
include some unpublished opinions are under-inclusive. For example, recent work has
found that only a fraction of immigration cases decided by federal appellate courts are
included in even the unpublished materials in commercial databases (Kagan et al. 2018).
The classic judicial politics analysis examines the relationship between judicial
attributes, such as the party affiliation of a judge, and the outcomes of adjudications,
frequently understood in terms of the ideological valence of a decision (Cross 2007). Causal
inferences from observed associations between judicial attributes and outcomes rely on
two assumptions: random (or as-if random) assignment and representative selection. If
the random assignment assumption is violated, then it would be impossible to know
whether correlations between judicial attributes and outcomes were due to different
judges being presented with different types of cases. If the representative selection
assumption is violated, then judge-outcome correlations in the observed data may be due
to selection (via publication choices) rather than a causal influence of judicial attributes
on outcomes.
In this article, we devise and apply three new empirical tests for what we refer to as
“publication effects” that can be applied without the acquisition of data on unpublished
opinions. Our tests for publication effects compare patterns in the features of published
opinions to the patterns (or lack thereof) that would be expected in a data-generating
process with random assignment in which judicial attributes did not affect the decisions
to publish. We apply these tests to a corpus of over 150,000 published opinions issued by
three-judge panels of the U.S. appellate courts from 1970 to 2010. All three of our tests
are based on relationships between the attributes of a judge and other variables
(including the attributes of other judges). For our purposes, the attribute of interest is
the party of the president who appointed the judge, which is often used in the judicial
politics literature as a proxy for judicial ideology.[1]
Chilton and Levy (2015) and Levy (2017) find that the random assignment assumption
is violated, at least in certain circumstances. Without extensive data on actual assignment
practices, we know of no way to untangle publication effects from nonrandom
assignment, and the tests described below could capture both publication effects and
nonrandom assignment. Based on the well-established fact that only a relatively small
percentage of cases are published, and on the modest scale of nonrandom assignment found
in research to date, we maintain the convention of assuming random assignment and
describe our results accordingly. This interpretation carries an important caveat:
nonrandom assignment could drive some of the results we describe below. Publication
effects and nonrandom assignment are equally problematic for the standard inferential
chain in the judicial politics literature.
The first of our tests examines patterns in the panel types that are reported in
published opinions. With two parties and three panelists, there are four possible panel types:
all Republican (RRR); one Democrat (RRD); one Republican (DDR); and all Democrat
(DDD). Based on data in published opinions for each circuit-year, we construct a null
model in which panel types are randomly generated. We then estimate the likelihood
that the observed distribution of panel types with respect to the party affiliation of judges
is consistent with this null model. We find that, across the corpus, the observed
distribution of panel types is inconsistent with the null model in a majority of circuit-years. These
effects are heterogeneous, but there are aggregate effects as well, with the observed
distributions departing from the null for most years and circuits. There are even corpus-wide
effects: a larger number of published opinions feature single-party panels than expected.
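The logic of this first test can be illustrated with a minimal simulation sketch. This excerpt does not specify how the null model is constructed, so the version below rests on an assumption made purely for illustration: each panel is formed by drawing three distinct judges with probability proportional to a sitting weight. The function names, the inputs (judges, weights), and the tail statistic are all hypothetical and are not taken from the replication files.

```python
import random
from collections import Counter

PANEL_TYPES = ["RRR", "RRD", "DDR", "DDD"]  # indexed by number of Democrats

def panel_type(parties):
    """Map three party labels ('R' or 'D') to one of the four panel types."""
    n_dem = sum(1 for p in parties if p == "D")
    return PANEL_TYPES[n_dem]

def draw_panel(rng, names, weights):
    """Draw three distinct judges, weighted, without replacement (assumed null)."""
    pool = list(names)
    panel = []
    for _ in range(3):
        w = [weights[j] for j in pool]
        pick = rng.choices(pool, weights=w, k=1)[0]
        panel.append(pick)
        pool.remove(pick)
    return panel

def simulate_null(judges, weights, n_panels, n_sims=1000, seed=0):
    """Simulate panel-type counts for one circuit-year under random panels.

    judges:  dict mapping judge name -> 'R' or 'D' (hypothetical input)
    weights: dict mapping judge name -> relative sitting frequency (hypothetical)
    Returns one Counter of panel types per simulated circuit-year.
    """
    rng = random.Random(seed)
    names = list(judges)
    sims = []
    for _ in range(n_sims):
        counts = Counter()
        for _ in range(n_panels):
            panel = draw_panel(rng, names, weights)
            counts[panel_type([judges[j] for j in panel])] += 1
        sims.append(counts)
    return sims

def upper_tail_share(observed, simulated):
    """Share of simulated values at least as large as the observed value."""
    return sum(1 for s in simulated if s >= observed) / len(simulated)
```

Under these assumptions, a researcher could compare the observed number of single-party panels (RRR plus DDD) in a circuit-year with the simulated counts; a small upper-tail share would indicate more single-party panels than random panel formation would produce.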
Our second analysis relies on information encoded in the names of the parties to a
case—in particular, whether the United States is a party. Assuming random assignment
and no publication effects, judicial attributes should be uncorrelated with the parties to a
case, once accounting for the year and circuit. We first examine heterogeneous effects
through a simulation exercise that permutes case captions at the circuit-year level, and
through a year-based analysis at the aggregate level. In both analyses, we find correlations
that are inconsistent with a process of publication that is random with respect to the
relevant case characteristics and judicial attributes. We also estimate corpus-wide
relationships through a fixed effects model that finds general correlations between panel
composition and the United States as a party. These analyses offer evidence that
publication effects exist that are conditioned on the interaction of judicial attributes and case
characteristics.
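A permutation test of the kind described for the second analysis can be sketched as follows. The test statistic here is an assumption chosen for illustration (the gap in the share of U.S.-party cases between Democratic-majority and Republican-majority panels within a single circuit-year); the excerpt does not state which statistic the authors use, and the function names are hypothetical.

```python
import random

def share_gap(us_party, dem_majority):
    """Share of U.S.-party cases on D-majority panels minus R-majority panels.

    us_party:     list of 0/1 flags, 1 if the United States is a party
    dem_majority: list of booleans, True if the panel is DDD or DDR
    """
    d = [u for u, m in zip(us_party, dem_majority) if m]
    r = [u for u, m in zip(us_party, dem_majority) if not m]
    if not d or not r:
        return 0.0  # gap is undefined when one group is empty; treat as zero
    return sum(d) / len(d) - sum(r) / len(r)

def permutation_p_value(us_party, dem_majority, n_perm=2000, seed=0):
    """Two-sided permutation p-value for one circuit-year.

    Shuffling the U.S.-party flags across opinions breaks any link between
    case captions and panel composition while keeping both margins fixed.
    """
    rng = random.Random(seed)
    observed = abs(share_gap(us_party, dem_majority))
    shuffled = list(us_party)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(share_gap(shuffled, dem_majority)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Permuting within a circuit-year, rather than across the whole corpus, reflects the text's point that judicial attributes should be uncorrelated with the parties to a case only after accounting for year and circuit.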
[1] For purposes of our study, we make no assumptions about the link between party and ideology; rather, we use
party of the appointing president as a proxy for an unknown set of judicial attributes, which might include
ideology among others. Our study design does not require that the party of the appointing president track any
particular judicial attribute because any of the relationships that we identify would not exist without some influence from
party-related judicial attributes on either publication or panel assignment.