Effects of Automating Recidivism Risk Assessment on Reliability, Predictive Validity, and Return on Investment (ROI)

DOI: http://doi.org/10.1111/1745-9133.12270
Published: 01 February 2017
Grant Duwe
Minnesota Department of Corrections
Michael Rocque
Bates College
Research Summary
The relationship between reliability and validity is an important but often overlooked
topic of research on risk assessment tools in the criminal justice system. By using
data from the Minnesota Screening Tool Assessing Recidivism Risk (MnSTARR),
a risk assessment instrument the Minnesota Department of Corrections (MnDOC)
developed and began using in 2013, we evaluated the impact of inter-rater reliability
(IRR) on predictive performance (validity) among offenders released in 2014. After
comparing the reliability of a manual scoring process with an automated one, we
found the MnSTARR was scored with a high degree of consistency by MnDOC staff, as
intraclass correlation (ICC) values ranged from 0.81 to 0.94. But despite this level
of IRR, we still observed a degradation in predictive validity given that automated
assessments significantly outperformed those that had been scored manually. Additional
analyses revealed that the more inter-rater disagreement increased, the more predictive
performance decreased. The results from our cost–benefit analyses, which examined the
anticipated impact of the MnDOC’s efforts to automate the MnSTARR, showed that
for every dollar to be spent on automation, the estimated return will be at least $4.35
within the first year and as much as $21.74 after the fifth year.
Direct correspondence to Grant Duwe, Minnesota Department of Corrections, 1450 Energy Park Drive, Suite
200, St. Paul, MN 55108 (e-mail: grant.duwe@state.mn.us).
Policy Implications
Although it is unclear to what degree our findings, which are somewhat preliminary,
are generalizable to other offender populations and correctional systems, we believe the
results are sufficiently promising to warrant greater interest in automating the assessment
of risk and need. We anticipate many, if not most, correctional systems may need to invest
in upgrading their IT infrastructure to support the use of automated instruments. But
we also anticipate that this investment would deliver a favorable return because our results suggest
that automation reduces inter-rater disagreement, which in turn improves predictive
performance. Even if automation did not improve performance, the increased efficiency
it produces would create reinvestment opportunities within correctional systems.
Keywords
risk assessment, recidivism, reliability, predictive validity, prison
In corrections, one primary task for practitioners is to minimize the probability that
released offenders commit new offenses. With the rise of the risk–needs–responsivity
(RNR) approach (Andrews and Bonta, 2010), agencies are increasingly using risk
assessment instruments to ensure their resources are directed at the offenders who pose
the highest threat to society. In the last several decades, there has been a move from
more qualitative, clinical judgment to standardized risk assessment tools, with a subsequent
increase in validity, or the prediction of recidivism, with each iteration (Andrews, Bonta,
and Wormith, 2006). Currently, at least 19 risk assessment tools are used by correctional
agencies in the United States, as identified by research in the scholarly literature
(Desmarais and Singh, 2013), and even more if one includes noncorrectional populations
(Singh and Fazel, 2010). Because the methods of scoring and the instruments themselves
vary widely, research is needed on the approaches that lead to the most reliable and valid
outcomes.
For the most part, research on actuarial risk assessment tools has focused on validity,
or the extent to which the tools predict later
criminal behavior (Desmarais and Singh, 2013; Lowenkamp, Holsinger, Brusman-Lovins,
and Latessa, 2004). The results of much of this work have suggested that structured risk
assessment tools do in fact distinguish between those who are likely to reoffend and those
who are not (Duwe, 2012; Hanson and Morton-Bourgon, 2009; Lowenkamp et al., 2004;
Schwalbe, 2007).
Reliability, or consistency of assessments between raters or between assessments over
time, is often thought to be important, even within the context of recidivism risk assessment.
As Austin (2006) stated, the accuracy of a risk assessment tool depends on both validity and
reliability. The two properties areinter twined; an instrument that is entirely unreliable will,
by necessity, have lower validity (Jackson, 2012). But few studies have been conducted in
which inter-rater reliability (IRR), the most relevant form of reliability among commonly
used risk assessment tools, has been evaluated (Baird, 2009; Desmarais and Singh, 2013;
Lowenkamp et al., 2004; van der Knaap, Leenarts, Born, and Oosterveld, 2012). For
example, in their review, Desmarais and Singh (2013) found that fewer than 4% of the
studies they identified as evaluating risk assessment tools examined IRR.
To our knowledge, none of the existing research on recidivism risk assessment has
focused on the relationship between reliability and validity. It is commonly
claimed that an instrument can be reliable but not valid, but to be valid, it must be reliable
(Latessa and Lovins, 2010). Nevertheless, this is generally as far as the discussion goes.
It seems reasonable to assume that predictive validity is affected by reliability, but it is
unclear to what extent. Schene and colleagues (2000: s16) argued that in certain cases,
the reliability of an instrument in part affects the possible validity score the instrument
can attain because “unreliability masks the true relationship between the constructs under
study.”
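This masking can be illustrated with the classical correction for attenuation, a general result from classical test theory rather than a finding specific to the risk assessment literature: the correlation that can actually be observed between two constructs is bounded by the reliabilities of their measures,

$$ r_{xy}^{\,\text{observed}} = r_{xy}^{\,\text{true}} \sqrt{r_{xx}\, r_{yy}}, $$

where $r_{xx}$ and $r_{yy}$ denote the reliabilities of the risk score and the outcome measure. Under this model, an instrument with a reliability of 0.80, for instance, could exhibit at most about 89% ($\sqrt{0.80} \approx 0.89$) of the predictive association it would show if it were measured without error.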
In this study, we examine how reliability, measured via inter-rater reliability, is related
to validity. To do so, we use data on offenders released from Minnesota prisons who had
been assessed with the Minnesota Screening Tool Assessing Recidivism Risk (MnSTARR),
a risk assessment instrument manually scored by prison caseworkers that the Minnesota
Department of Corrections (MnDOC) began using in 2013 (Duwe, 2014). In November
2016, the MnDOC implemented the MnSTARR 2.0, a fully automated instrument that
is not scored by caseworkers. In doing so, Minnesota has become one of the first states to
automate in full the assessment of recidivism risk for its prisoner population.1
We assess IRR by comparing the manually scored MnSTARR data for offenders released
from prison during 2014 with data created by an automated process. After evaluating the
predictive performance of the MnSTARR data scored by manual and automated processes,
we examine the relationship between reliability and validity. We also estimate the cost-
effectiveness of an automated risk assessment system by comparing the MnSTARR 2.0
implementation costs with the monetized benefits—namely, correctional staff time saved.
In the following sections, we review the literature on risk assessment instruments, introduce
the MnSTARR, describe the data and analyses, and present the results of our study.
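As a rough illustration of the form these analyses take, the following sketch uses simulated data rather than MnDOC records; all variable names, score distributions, and cost figures are hypothetical. It computes a two-way random-effects intraclass correlation, ICC(2,1), between a manually scored and an automatically scored assessment, the AUC of each score against an observed recidivism outcome, and a simple first-year benefit-cost ratio for automation.

```python
# Illustrative only: simulated data standing in for manually and automatically
# scored assessments and observed recidivism outcomes.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500
true_risk = rng.normal(size=n)
manual = true_risk + rng.normal(scale=0.6, size=n)     # manual scoring adds rater noise
automated = true_risk + rng.normal(scale=0.1, size=n)  # automated scoring is more consistent
recidivism = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_risk)))

def icc_2_1(ratings: np.ndarray) -> float:
    """Two-way random-effects, single-measure ICC(2,1) (Shrout & Fleiss, 1979)."""
    n_targets, k_raters = ratings.shape
    grand = ratings.mean()
    msr = k_raters * np.sum((ratings.mean(axis=1) - grand) ** 2) / (n_targets - 1)
    msc = n_targets * np.sum((ratings.mean(axis=0) - grand) ** 2) / (k_raters - 1)
    sse = (np.sum((ratings - grand) ** 2)
           - msr * (n_targets - 1)
           - msc * (k_raters - 1))
    mse = sse / ((n_targets - 1) * (k_raters - 1))
    return (msr - mse) / (msr + (k_raters - 1) * mse + k_raters * (msc - mse) / n_targets)

# 1. Inter-rater reliability between the two scoring processes.
print("ICC(2,1):", round(icc_2_1(np.column_stack([manual, automated])), 3))

# 2. Predictive validity of each scoring process (area under the ROC curve).
print("AUC, manual:   ", round(roc_auc_score(recidivism, manual), 3))
print("AUC, automated:", round(roc_auc_score(recidivism, automated), 3))

# 3. Benefit-cost ratio: monetized staff time saved relative to the cost of
#    automating the instrument (placeholder figures, not the MnDOC's estimates).
automation_cost = 100_000.0        # hypothetical one-time automation cost
hours_saved_per_assessment = 1.5   # hypothetical caseworker time saved per assessment
assessments_per_year = 8_000       # hypothetical annual assessment volume
hourly_staff_cost = 36.0           # hypothetical loaded hourly wage
annual_benefit = hours_saved_per_assessment * assessments_per_year * hourly_staff_cost
print("Year-1 benefit-cost ratio:", round(annual_benefit / automation_cost, 2))
```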
Reliability and Validity of Risk Assessment Instruments
Reliability
Reliability is one of the two psychometric properties often used by researchers to assess the
effectiveness and accuracy of measurement (Carmines and Zeller, 1979). Reliability refers
to consistency in terms of how well items in an instrument correlate with one another,
1. We do not consider Web-based tools that are scored manually, whether through an offender interview
and/or database review, to be automated simply because they are Web based. Given the results
presented later, we anticipate that, all else being equal, a manually scored tool, even if Web based,
would be less reliable, valid, and efficient than would be an instrument that comprises an automated
scoring process.
