Accuracy and Fairness for Juvenile Justice Risk Assessments

Richard Berk*

Journal of Empirical Legal Studies, Volume 16, Issue 1, 175–194, March 2019
Published 01 March 2019
DOI: http://doi.org/10.1111/jels.12206

*Department of Criminology, University of Pennsylvania, Philadelphia, PA; email: berkr@sas.upenn.edu.
Risk assessment algorithms used in criminal justice settings are often said to introduce “bias.” But such charges can conflate an algorithm’s performance with bias in the data used to train the algorithm and with bias in the actions undertaken with an algorithm’s output. In this article, algorithms themselves are the focus. Tradeoffs between different kinds of fairness and between fairness and accuracy are illustrated using an algorithmic application to juvenile justice data. Given potential bias in training data, can risk assessment algorithms improve fairness and, if so, with what consequences for accuracy? Although statisticians and computer scientists can document the tradeoffs, they cannot provide technical solutions that satisfy all fairness and accuracy objectives. In the end, it falls to stakeholders to do the required balancing using legal and legislative procedures, just as it always has.
I. Introduction
The recent introduction of “big data” and machine learning into the operations of criminal justice institutions has met with a mixed reception. For some, the promise of decisions both smarter and more fair has led to qualified support (Ridgeway 2013a, 2013b; Brennan & Oliver 2013; Doleac & Stevenson 2016; Ferguson 2017). “Smart policing” is one instance. For others, the risks of inaccuracy and racial bias dominate any likely benefits (Harcourt 2007; Starr 2014; O’Neil 2016; Angwin et al. 2016). Racial bias inherent in the data used by criminal justice agencies is carried along and amplified by machine-learning algorithms; bias in, bias out.

Computer scientists and statisticians have responded with efforts to make algorithmic output more accurate and more fair, despite the prospect of flawed data. Better technology is the answer. But is it? Perhaps even the best algorithms will be overmatched. Perhaps better technology can go only so far. In the end, perhaps the most challenging issues will need to be resolved in the political arena. These are the matters addressed in this article.
Much past work on algorithmic bias makes algorithms the fall guy. But training data can really matter. Moreover, common applications of criminal justice algorithms provide information to human decisionmakers. It is important to distinguish between that information, human decisions informed by the information, and subsequent actions taken (Kleinberg et al. 2017; Stevenson 2017). Concerns about algorithmic bias are properly raised only about an algorithm’s internal machinery.
There is a formal literature on algorithmic accuracy and fairness that can be directly consulted, and good summaries of that literature exist (Berk et al. in press). However, the expositions can be quite technical, and integrating themes are too often lost in mathematical detail. A different expositional strategy is offered here. The issues will be addressed through empirical examples from a dataset rich in accuracy and fairness challenges: predictions of recidivism for juvenile offenders. The real-world setting will make the technical content more grounded and accessible. Credible unifying conclusions can then be more easily drawn.[1]
Four broad points will be made. First, there are many kinds of unfairness so that
“bias” can materialize along several dimensions. There can be, for example, inequality of
treatment, inequality of opportunity, and inequality of outcome. Second, there will be
tradeoffs between different kinds of unfairness, with some irreconcilable in most real
applications. Third, there will be tradeoffs between accuracy and fairness. If an algorithm
is designed to be optimally accurate, anything that introduces additional objectives can
lead to reduced accuracy. Finally, it is the job of statisticians and computer scientists to
document the various tradeoffs in an accessible manner. But the balancing required to
address the tradeoffs is not a technical matter. How the tradeoffs will be made is a matter
of competing values that will need to be resolved by legal and political action.
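To make these competing criteria concrete, the short sketch below, which is not from the article, computes three of them for a hypothetical binary risk tool applied to two groups with different base rates of recidivism. All data, group labels, and the 0.40 flagging threshold are fabricated for illustration.

```python
# A minimal illustrative sketch (not from the article). It fabricates two
# groups with different base rates of recidivism, applies one common
# flagging threshold to a score that tracks true risk, and reports three
# fairness criteria discussed above, separately for each group.
import numpy as np

rng = np.random.default_rng(0)

def fairness_report(y_true, y_pred, group):
    """Print three fairness criteria separately for each group."""
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        flag_rate = yp.mean()          # equality of outcome: share flagged high risk
        fpr = yp[yt == 0].mean()       # false positive rate among non-recidivists
        fnr = 1 - yp[yt == 1].mean()   # false negative rate among recidivists
        ppv = yt[yp == 1].mean()       # calibration-style check: P(recidivism | flagged)
        print(f"group {g}: flagged={flag_rate:.2f}  FPR={fpr:.2f}  "
              f"FNR={fnr:.2f}  P(recid|flagged)={ppv:.2f}")

n = 10_000
group = np.repeat([0, 1], n)
# Hypothetical latent risk, higher on average in group 1 (base rates differ).
latent = np.clip(rng.normal(np.where(group == 0, 0.30, 0.50), 0.15), 0, 1)
y_true = rng.binomial(1, latent)
# An idealized tool: it sees true risk and flags anyone above 0.40.
y_pred = (latent > 0.40).astype(int)

fairness_report(y_true, y_pred, group)
```

Even with this idealized score, a common threshold yields different flagged shares and different error rates for the two groups because their base rates differ; equalizing any one criterion pushes the others apart, which is one version of the irreconcilability noted above.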
II. Background
Literature reviews of juvenile criminal justice risk assessments reveal that risk assessments for juveniles and risk assessments for adults raise most of the same issues (Pew Center on the States 2011; Vincent et al. 2012; National Institute of Justice & Office of Juvenile Justice and Delinquency Prevention 2014; Office of Juvenile Justice and Delinquency Prevention 2015). There are concerns about accuracy (Meyers & Schmidt 2008; Oliver & Stockdale 2012) and concerns about fairness (Huizinga et al. 2007; Schwalbe 2008; Thompson & McGrath 2012). Differences center on the kinds of predictors used and, arguably, a greater emphasis on determining needs and treatment modalities for juveniles.[2] The discussion to follow centers on the themes of accuracy and fairness.
There is also a small but growing literature addressing more directly the ethical and legal issues (Hyatt et al. 2011; Tonry 2014; Ferguson 2015; Hamilton 2016; Barocas & Selbst 2016).
[1] The data were collected as part of a demonstration of concept. The demonstration went well, but was also an implicit criticism of existing practices. One of the key agencies involved withdrew its support, claiming it was satisfied with current procedures.

[2] The recent development for adults of “Generation 4” risk assessments may right the balance (Desmarais & Singh 2013).