Developing High‐Quality Data Infrastructure for Legal Analytics: Introducing the Israeli Supreme Court Database

DOI: http://doi.org/10.1111/jels.12250
Author: Lee Epstein, Keren Weinshall
Date: 01 June 2020
Published date: 01 June 2020
Journal of Empirical Legal Studies
Volume 17, Issue 2, 416–434, June 2020
Keren Weinshall* and Lee Epstein
Driving discovery in the study of law and legal institutions often requires infrastructure in the form of databases and other tools. The challenge is how to build the infrastructure. For obvious reasons, transplanting coding rules and variables from one dataset to the next is perilous; specialized knowledge of local conditions is necessary before a single datum is collected. Also required is adherence to a universal set of principles that distinguish high-quality infrastructure: namely, that the tool addresses real-world problems and is accessible, reproducible and reliable, sustainable and updatable, and foundational. These principles guided construction of the Israeli Supreme Court Database, new and original infrastructure encoding information from all panel cases opened between 2010 and 2018 in the Israeli Supreme Court.
I. Introduction
The past decade has witnessed dramatic growth in the empirical analysis of apex courts worldwide, from Argentina (Muro et al. 2018) and Brazil (Arguelhes & Hartmann 2017) up to Canada (Alarie & Green 2017) and from Taiwan (Chen et al. 2015) across the globe to Israel (Weinshall-Margel 2016) and most of Europe (Hanretty forthcoming).[1] This is welcome news because studies of judicial behavior add to the store of knowledge on law and legal institutions, provide guidance to policymakers, educate the public about their courts, help lawyers develop strategies, and even prompt judges to rethink their choices (Posner 2008; Epstein et al. 2013; Wistrich et al. 2015). The less-welcome news is that data infrastructure designed to advance knowledge and drive innovation, discovery, and invention for the analysis of judges and their courts has not kept pace with the accelerating interest.[2]

* Address correspondence to Keren Weinshall, Vice Dean, Katia & Hans Guth-Dreyfus Chair in Conflict Resolution and Law at The Hebrew University of Jerusalem, Mt. Scopus, Jerusalem 9190501 Israel; email: keren.weinshall@mail.huji.ac.il. Epstein is Ethan A. H. Shepley Distinguished University Professor at Washington University in St. Louis. For helpful comments, we thank participants at conferences at ETH Zurich, the European University Institute in Florence, Bar-Ilan University, and the Hebrew University. We also thank the I-CORE program of the Planning and Budget Committee and the Israeli Science Foundation (Grant 1821/12) for their generous support of the Israeli Supreme Court Database (http://ISCD.huji.ac.il).

[1] The cites in parentheses are examples; for each high court many more papers and books have been published. The Israeli Supreme Court alone has generated scores of studies (e.g., Sommer 2009; Eisenberg et al. 2010, 2012, 2013; Weinshall-Margel 2011; Dotan 2013; Gliksberg 2014; Rosenthal et al. 2018; Anidjar et al. forthcoming).
Why? One answer is that the field has mostly eschewed high-quality infrastructure in the form of public multi-user databases designed to capture a range of foundational information in favor of hand-coded datasets aimed at answering particular research questions (see generally Epstein et al. forthcoming).[3] The “one-off” approach has its benefits; chiefly, the resulting dataset is precisely tailored to the researchers’ theoretical framing, definitions, and hypotheses. However, it also has substantial costs. Because encoding characteristics of courts and judges can be expensive, many tailored datasets consist of a small number of observations, decreasing statistical power and negating the combinatorial advantage. For the same reason, they are rarely updated, limiting their capacity to address contemporary problems. Finally, even when scholars include the same cases and covariates in their studies, conflicting results can, and do, emerge because of different data-collection procedures and practices.[4] Taken collectively, these costs impede the drive to discovery.
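The cost of small samples can be made concrete with the standard normal-approximation (Wald) confidence interval for a proportion, such as a litigant win rate, whose half-width is z·sqrt(p(1 - p)/n). The sketch below is illustrative only: the helper name `ci_half_width`, the assumed win rate of 0.5, and the sample sizes are hypothetical choices, not figures drawn from any dataset discussed here.

```python
import math


def ci_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% CI for a proportion p observed in n cases."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical sample sizes: a small hand-coded dataset versus a large multi-user database.
# The win rate of 0.5 is an assumption chosen because it gives the widest interval.
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: win rate 0.50 +/- {ci_half_width(0.5, n):.3f}")
```

Because the half-width shrinks with the square root of n, a hundredfold increase in cases narrows the interval tenfold, from roughly ±0.098 at n = 100 to roughly ±0.0098 at n = 10,000, which is why small tailored datasets struggle to detect modest differences between courts or judges.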
Law and courts scholars worldwide acknowledge the problems of weak data infrastructure, not to mention the challenges in making headway (Honnige & Gschwend 2010; Kapiszewski & Ingram forthcoming). One such challenge is the lack of consensus in the community over what form the data should take and for what purposes infrastructure should be developed. Some scholars favor quantitative (numerical) data and a selection process that allows for statistical inference; others are more interested in non-numerical data that they can interpret, organize into categories, and use to identify patterns. Much hand-wringing also ensues over how to define and measure concepts of interest (e.g., judge ideology, judicial independence, case subject matter).
Frankly, these divisions should not obstruct forward movement. Important breakthroughs have followed, and will continue to follow, from data infrastructure that selects observations randomly or intentionally, that encodes data with numbers or archives text, or that permits causal inference or deep description. Data are data, methods are methods (Patty 2015). As long as infrastructure can advance knowledge and accelerate discovery, these are differences without meaning.
[2] Sometimes, infrastructure is a method, procedure, or application that makes our work easier, faster, and better. No doubt such advances in law and the social sciences have been made, but apps and the like are not foremost on the minds of most scholars in the field; their core concern lies instead with products designed to capture data generated by courts, judges, lawyers, and other legal and political actors. For this reason we use the term “data infrastructure.”

[3] Existing multi-user databases relevant to the study of judicial behavior include the Biographical Directory of U.S. Federal Judges (Federal Judicial Center 2020), the Comparative Constitutions Project (Elkins et al. 2020), the German Federal Courts Dataset (see Hamann 2019), the Norwegian Supreme Court Database (see Grendstad et al. 2015), the U.S. Supreme Court Database (Spaeth et al. 2020), the U.S. Supreme Court Justices Database (Epstein et al. 2020), and V-Dem (2020). The European Court of Human Rights Database (Cichowski & Chrun 2014), the International Criminal Tribunals Database (Meernik 2014), the National High Courts Database (Haynie et al. 2003), and the U.S. Courts of Appeals Database (Songer 1996; Kuersten & Haire 2002) are also public multi-user databases but have not been updated for at least five years. See also note 5.

[4] To provide a simple example: Shamir (1990) and Dotan (1999) identify conflicting trends in the win rate of Palestinians at the Israeli Supreme Court because of the researchers’ different definitions (and coding) of litigant success.
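The mechanism described in note 4, where the same cases yield different win rates under different definitions of success, can be illustrated with a toy example. Everything below is hypothetical: the outcome categories, the counts, and the `win_rate` helper are invented for illustration and do not reproduce the coding schemes used by Shamir (1990) or Dotan (1999).

```python
from collections import Counter

# Hypothetical case outcomes; categories and counts are invented for illustration only.
outcomes = ["granted"] * 10 + ["partially_granted"] * 15 + ["denied"] * 75


def win_rate(outcomes: list[str], wins: set[str]) -> float:
    """Share of cases whose outcome falls in the set of categories coded as a 'win'."""
    counts = Counter(outcomes)  # missing categories count as 0
    return sum(counts[o] for o in wins) / len(outcomes)


strict = win_rate(outcomes, {"granted"})                        # only full success is a win
lenient = win_rate(outcomes, {"granted", "partially_granted"})  # any relief is a win
print(f"strict rule: {strict:.2f}, lenient rule: {lenient:.2f}")
```

Applied to identical cases, the two hypothetical rules report win rates of 0.10 and 0.25. A shared database with documented coding rules lets researchers see, and choose between, such definitions rather than silently diverging.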