Hidden Bias in Empirical Textualism

MATTHEW JENNEJOHN,* SAMUEL NELSON** & D. CAROLINA NÚÑEZ***
A new interpretive technique called “corpus linguistics” has exploded
in use over the past five years from state supreme courts and federal
courts of appeals to the U.S. Supreme Court. Corpus linguistics involves
searching a large database, or corpus, of text to identify patterns in the
way in which a certain term is used in context. Proponents of the method
argue that it is a more “empirical” approach than referencing dictionaries
to determine a word’s public meaning, which is a touchstone in
originalist approaches to legal interpretation.
This Article identifies an important concern about the use of corpus
linguistics in legal interpretation that courts and scholarship have
overlooked: bias. Using new machine learning techniques that analyze bias
in text, this Article provides empirical evidence that the thousands of
documents in the Corpus of Historical American English (COHA), the
leading corpus currently used in judicial opinions, reflect gender
bias. Courts and scholars have not considered that the COHA is sexist,
raising the possibility that corpus linguistics methods could serve
as a vehicle for infecting judicial opinions with longstanding prejudices
in U.S. society.
In addition to raising this important new problem, this Article charts a
course for dealing with it. It explains how hidden biases can be made
transparent and introduces steps for “debiasing” corpora used in legal
interpretation. More broadly, it shows how the methods introduced here
can be used to study biases in all areas of the law, raising the prospect of
a revolution in our understanding of how discriminatory biases affect
legal decisionmaking.
TABLE OF CONTENTS
INTRODUCTION
I. EMPIRICAL TEXTUALISM AND CORPUS LINGUISTICS
A. INTRODUCTION TO CORPUS LINGUISTICS
* Professor of Law at BYU Law School. © 2021, Matthew Jennejohn, Samuel Nelson & D. Carolina
Núñez.
** Research Fellow at BYU Law School.
*** Associate Dean of Faculty and Curriculum and Professor of Law at BYU Law School.
Authors are listed in alphabetical order. Many thanks to Natalie Packard, Jaimee Crossley, and Ryan
Wallentine for excellent research assistance. Finally, many thanks to Brad Bernthal, Sadie Blanchard,
Jesse Egbert, Jennifer Fan, Alexis Hoag, Cathy Hwang, Tom Lee, Da Lin, Jeremy McClane, Yaron Nili,
Darren Rosenblum, Paolo Saguato, Greg Shill, Andrew Winden, and David Wingate for thoughtful
comments and discussions regarding earlier drafts. All errors remain the authors’.
B. USING CORPUS LINGUISTICS IN LEGAL INTERPRETATION
1. Accuracy
2. Quantifiability and Verifiability
C. CORPUS LINGUISTICS IN THE COURTS
II. THE LIMITS OF CORPUS LINGUISTICS
A. A BRIEF CASE STUDY OF CORPUS LINGUISTICS IN STATUTORY INTERPRETATION
B. PRIOR WORK ON THE LIMITS OF CORPUS LINGUISTICS IN LEGAL INTERPRETATION
1. The Illusion of Accuracy and Objectivity
2. Concerns About Corpora
3. Judicial Competence
III. MEASURING HIDDEN BIAS
A. DATA
B. HYPOTHESES
C. METHODS
D. RESULTS
1. Both Occupational Words and Adjectives in the COHA Reflect Gender Bias
2. The Intensity of Gender Bias Changes over Time
3. Both Male and Female Authors Write Biased Text, but Female Authors Less So
E. SUMMARY
IV. NEXT STEPS FOR EMPIRICAL APPROACHES TO LEGAL INTERPRETATION
A. MAKING HIDDEN BIAS TRANSPARENT
B. DEBIASING LEGAL CORPUS LINGUISTICS
1. Identifying Potential Instances of Bias in Corpus-Based Legal Interpretation
2. Corpus Construction and Methodology
768 THE GEORGETOWN LAW JOURNAL [Vol. 109:767
3. Judging and Mitigating Bias
C. NEW QUESTIONS THAT WE SHOULD BE ANSWERING
CONCLUSION
INTRODUCTION
A new method for the interpretation of legal texts, such as constitutions,
statutes, and regulations, is spreading through the U.S. judiciary.¹ Proponents of the
method, known as “corpus linguistics,” argue that it is a more reliable, empirical
way of discerning the public meaning of a word than resorting to dictionaries,
legislative history, or judges’ intuitions.² Many federal and state courts have
applied the method, including Justice Thomas of the U.S. Supreme Court.³
In essence, the corpus linguistics method allows a user to search a large body
of text, or corpus, for a particular word to identify patterns in usage that reveal
information about a word’s meaning. For example, a user may track a word’s
frequency over time, identify the words that most frequently occur in close vicinity
to that search term, or review each instance of a word’s usage in context.⁴
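The three operations just described (frequency over time, nearby words, and usage in context) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the toy corpus is invented for this example and is not drawn from the COHA, and the function names (`frequency_by_year`, `collocates`, `concordance`) are hypothetical rather than the interface of any corpus tool used by courts or by the authors.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (year, text) pairs invented purely for illustration.
TOY_CORPUS = [
    (1900, "The officer began a search of the wagon."),
    (1900, "They search the house for the missing letter."),
    (1950, "A search warrant was issued by the court."),
    (2000, "The police search the car without a warrant."),
    (2000, "She ran a search of the records."),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_by_year(corpus, term):
    """Count how often `term` appears in each year of the corpus."""
    counts = Counter()
    for year, text in corpus:
        counts[year] += tokenize(text).count(term)
    return dict(counts)


def collocates(corpus, term, window=2):
    """Count words appearing within `window` tokens on either side of `term`."""
    counts = Counter()
    for _, text in corpus:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == term:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts


def concordance(corpus, term, window=3):
    """Return each occurrence of `term` with a few words of context."""
    lines = []
    for year, text in corpus:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == term:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                lines.append((year, f"{left} [{tok}] {right}".strip()))
    return lines


if __name__ == "__main__":
    print(frequency_by_year(TOY_CORPUS, "search"))
    print(collocates(TOY_CORPUS, "search").most_common(3))
    for year, line in concordance(TOY_CORPUS, "search"):
        print(year, line)
```

A real corpus query would also involve choices this sketch ignores, such as lemmatization, part-of-speech tagging, and normalizing counts by the size of each period's subcorpus.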
1. See infra note 3 and accompanying text.
2. See, e.g., Neal Goldfarb, A Lawyer’s Introduction to Meaning in the Framework of Corpus
Linguistics, 2017 BYU L. REV. 1359, 1367–68; Thomas R. Lee & James C. Phillips, Data-Driven
Originalism, 167 U. PA. L. REV. 261, 289–90, 292 (2019); Thomas R. Lee & Stephen C. Mouritsen,
Judging Ordinary Meaning, 127 YALE L.J. 788, 829 (2018); Stephen C. Mouritsen, Hard Cases and
Hard Data: Assessing Corpus Linguistics as an Empirical Path to Plain Meaning, 13 COLUM. SCI. &
TECH. L. REV. 156, 202 (2011).
3. See, e.g., Carpenter v. United States, 138 S. Ct. 2206, 2238 & n.4 (2018) (Thomas, J., dissenting)
(using corpus linguistics as evidence that “search” was not associated with “reasonable expectation of
privacy”); Wilson v. Safelite Grp., Inc., 930 F.3d 429, 442 (6th Cir. 2019) (Thapar, J., concurring)
(describing corpus linguistics as “an important tool . . . in figuring out the meaning of a term”); People v.
Harris, 885 N.W.2d 832, 839 (Mich. 2016) (using corpus linguistics to show that “information,” used
alone, may refer to both true and false information); Fire Ins. Exch. v. Oltmanns, 2018 UT 10, ¶ 57 n.9,
416 P.3d 1148, 1163 n.9 (justifying the use of corpus linguistics due to the weaknesses of human
intuition); Neese v. Utah Bd. of Pardons & Parole, 2017 UT 89, ¶ 99, 416 P.3d 663, 691 (describing
corpus linguistics’ tendency to focus on semantics and pragmatics); Craig v. Provo City, 2016 UT 40,
¶ 26 n.3, 389 P.3d 423, 428 n.3 (commending Provo City for briefing on corpus linguistics); State v.
J.M.S., 2011 UT 75, ¶¶ 38–40, 280 P.3d 410, 418–19 (Lee, J., concurring) (using corpus linguistics to show
that “abortion procedure” refers to a medical procedure); J.M.W., III v. T.I.Z., 2011 UT 38, ¶ 89 & n.21,
266 P.3d 702, 724 & n.21 (Lee, J., concurring in part and concurring in the judgment) (using corpus
linguistics to show that “custody” is more closely associated with “divorce” than “adoption”); Muddy
Boys, Inc. v. Dep’t of Commerce, 2019 UT App 33, ¶¶ 25–26, 440 P.3d 741, 748–49 (stating that
corpus linguistics requires large databases rather than small samplings); O’Hearon v. Hansen, 2017 UT
App 214, ¶ 25 n.8, 409 P.3d 85, 93 n.8 (citing State v. Rasabout, 2015 UT 72, ¶ 57, 356 P.3d 1258, 1275
(Lee, Associate C.J., concurring in part and concurring in the judgment)); see also Rasabout, 2015 UT
72, ¶¶ 12–13, 17, 356 P.3d at 1263–64 (using the dictionary to interpret the term “discharge,” and
noting Associate Chief Justice Lee’s suggestion to use corpus linguistics instead); id. ¶ 36, 356 P.3d at
1269 (Durrant, C.J., concurring in part and concurring in the judgment) (explaining that corpus
linguistics could be useful in interpreting “the ordinary meaning of statutory terms”); id. ¶¶ 57–65, 356
P.3d at 1275–77 (Lee, Associate C.J., concurring in part and concurring in the judgment) (proposing the
use of corpus linguistics as an additional tool for statutory interpretation).
4. See infra Section I.A.