Stability And Predictive And Incremental Accuracy Of The Individual Items Of Static-99r And Static-2002r In Predicting Sexual Recidivism

Author: Leslie-Maaike Helmus, David Thornton
Published date: 01 September 2015
DOI: http://doi.org/10.1177/0093854814568891
Subject Matter: Articles
STABILITY AND PREDICTIVE AND INCREMENTAL ACCURACY OF THE INDIVIDUAL ITEMS OF STATIC-99R AND STATIC-2002R IN PREDICTING SEXUAL RECIDIVISM

A Meta-Analysis
LESLIE-MAAIKE HELMUS
Forensic Assessment Group
DAVID THORNTON
Research Unit, Sand Ridge Secure Treatment Centre
This study investigated a potential source of variability in actuarial scale performance: the individual items. Using data
from 8,053 sex offenders across 22 samples, we examined the predictive and incremental accuracy of the items from
Static-99R and Static-2002R, and the stability of predictive accuracy across samples. Generally, all items had significant
predictive accuracy and contributed incrementally to predicting sexual recidivism, with few exceptions. Roughly half the
items demonstrated significant variability in their predictive accuracy across samples, although this was often variability
in the magnitude of predictiveness as opposed to the direction. Some moderator effects were found, with the most common
being the country of the study (which influenced accuracy in different directions depending on the item) and
whether the offenders were preselected as unusually high risk or need (lower discrimination was found in these samples).
The findings support the Static-99R and Static-2002R items with few exceptions, and possibilities for future research are
highlighted.
Keywords: risk assessment; sex offenders; Static-99R; Static-2002R; meta-analysis; predictive accuracy
AUTHORS’ NOTE: The views expressed are those of the authors and not necessarily those of the Wisconsin
Department of Health Services. We would like to thank the following researchers for granting us permission to
use their data: Alfred Allan, Howard Barbaree, Tony Beech, Susanne Bengtson, Jacques Bigras, Sasha Boer,
Jim Bonta, Sébastien Brouillette-Alarie, Vivienne de Vogel, Margretta Dwyer, Reinhard Eher, Doug Epperson,
Andy Haag, R. Karl Hanson, Leigh Harkins, Andreas Hill, Steve Johansen, Ray Knight, Niklas Långström,
Calvin Langton, John Milton, Terry Nicholaichuk, Jean Proulx, Martin Rettenberger, Rebecca Swinburne
Romine, Daryl Ternowski, Robin Wilson, and Annie Yessine. Correspondence concerning this article should be
addressed to Leslie-Maaike Helmus, Forensic Assessment Group, 11 Aspen Grove, Nepean, ON K2H 8Z9,
Canada; e-mail: Lmaaikehelmus@gmail.com.

CRIMINAL JUSTICE AND BEHAVIOR, 2015, Vol. 42, No. 9, September 2015, 917-937.
DOI: 10.1177/0093854814568891
© 2015 International Association for Correctional and Forensic Psychology

Professionals seeking to evaluate the risk presented by sexual offenders commonly use
empirical actuarial instruments to anchor their assessments. These instruments use
items selected on the basis of their observed statistical association with sexual recidivism.
Individual items typically have only a small association with recidivism, but combining
them into prediction scales yields a moderate level of prediction. When they were
first introduced, the appeal of these scales probably seemed to be that they divided offenders
into groups, each of which had its own distinct sexual recidivism rate.
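
As a rough illustration of how weak items can combine into a moderately accurate scale, the sketch below computes the area under the ROC curve (AUC) for a single binary item and for an unweighted sum of several such items. It is a simplified sketch and not the authors' analysis: the endorsement rates (0.40 among recidivists, 0.30 among nonrecidivists), the number of items, and the assumption that items are independent within each outcome group are all hypothetical values chosen for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def sum_score_auc(n_items, p_recid, p_nonrecid):
    """AUC of an unweighted sum of n_items binary items.

    Assumes (hypothetically) that every item is endorsed with probability
    p_recid among recidivists and p_nonrecid among nonrecidivists, and that
    items are independent within each group. The AUC is the probability that
    a random recidivist outscores a random nonrecidivist, counting ties as 0.5.
    """
    auc = 0.0
    for s1 in range(n_items + 1):
        w1 = binom_pmf(s1, n_items, p_recid)
        for s0 in range(n_items + 1):
            w0 = binom_pmf(s0, n_items, p_nonrecid)
            if s1 > s0:
                auc += w1 * w0
            elif s1 == s0:
                auc += 0.5 * w1 * w0
    return auc


if __name__ == "__main__":
    # Hypothetical endorsement rates, chosen only for illustration.
    for k in (1, 5, 10):
        print(f"{k:2d} item(s): AUC = {sum_score_auc(k, 0.40, 0.30):.3f}")
```

Under these assumed values a single item has an AUC of 0.55, and the summed score discriminates better as items are added. Real items are correlated with one another, so the gain from combining them would be smaller than this idealized calculation suggests.
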
Static-99 (Hanson & Thornton, 2000) is one of the scales that benefited from this perception
of actuarial tools and which accordingly has become widely used for pretreatment
assessments (Jackson & Hess, 2007; McGrath, Cumming, Burchard, Zeoli, & Ellerby,
2010), community supervision (Interstate Commission for Adult Offender Supervision,
2007), and civil commitment evaluations (Jackson & Hess, 2007). It is also the most extensively
studied of the empirical actuarial instruments used to assess risk of sexual recidivism
(Hanson & Morton-Bourgon, 2009). Its ability to sort offenders into groups that differ in
their relative risk of sexual recidivism based on relatively simple, easy-to-obtain information
has been widely replicated (Hanson, Babchishin, Helmus, & Thornton, 2013; Hanson
& Morton-Bourgon, 2009) and also has been demonstrated to be fairly stable across diverse
samples (Helmus, Hanson, Thornton, Babchishin, & Harris, 2012), but this close scrutiny
has also revealed important limitations of the method.
To begin with, all empirically derived recidivism estimates are based on limited samples
and so these estimates are subject to sampling error. Of course it is possible to calculate
confidence intervals for these estimates. Examples of these can be seen on www.static99.org,
but they are sometimes misunderstood as implying that the risk of the individual
offender can be expected to fall within these bounds. This is not the case; these confidence
intervals relate only to how well-estimated the average recidivism rate of persons having a
given score is. They tell us nothing about how homogeneous, in terms of risk, persons having
a given Static-99 score are. Possible sources of heterogeneity could include factors
external to the risk scale, as well as differences in how offenders obtained a given score (i.e.,
offenders can obtain the same Static-99 score based on a completely different set of risk
items).
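
The distinction between a confidence interval for a group's average rate and the spread of individual risk within that group can be sketched numerically. The example below is illustrative only. It assumes a hypothetical score group in which half of the offenders have a true risk of 2% and half have a true risk of 18% (average 10%), and it uses the Wilson score interval, which is one common choice of interval and not necessarily the method used on www.static99.org.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical group: two equally common subgroups with different true risk.
individual_risks = [0.02, 0.18]
mean_risk = sum(individual_risks) / len(individual_risks)

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        # Suppose the observed rate happens to equal the group average.
        events = round(mean_risk * n)
        lo, hi = wilson_interval(events, n)
        outside = [r for r in individual_risks if not lo <= r <= hi]
        print(f"n={n:5d}: CI for group rate = ({lo:.3f}, {hi:.3f}); "
              f"individual risks outside the CI: {outside}")
```

As the sample grows, the interval around the group average narrows, yet under these assumptions neither subgroup's true risk lies inside it. A narrow interval therefore says that the average is well estimated, not that the group is homogeneous.
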
We know that there are important factors external to Static-99 that influence the risk of
recidivism. One example is age. Although the optimal way of measuring age (e.g., first
arrest, index offense, release) and the reasons for its relationship to recidivism are debated
(e.g., Barbaree & Blanchard, 2008; Rice & Harris, 2014), early research was clear that age
at release added incremental predictive accuracy to the original version of Static-99
(Barbaree, Langton, & Blanchard, 2007; Barbaree, Langton, Blanchard, & Cantor, 2009;
Hanson, 2006; Thornton, 2006; for a notable recent exception to this finding, see
Rettenberger, Haubner-MacLean, & Eher, 2013). These findings led to the development of
a revised version of the scale (Static-99R; Helmus, Thornton, Hanson, & Babchishin,
2012).
In addition, fairly comprehensive assessments of the kinds of psychological factors identified
by Mann, Hanson, and Thornton (2010) have consistently shown incremental prediction
beyond that possible from static scales (Allan, Grace, Rutherford, & Hudson, 2007;
Beggs & Grace, 2010; Craig, Thornton, Beech, & Browne, 2007; Eher, Matthes, Schilling,
Haubner-MacLean, & Rettenberger, 2012; Hanson, Helmus, & Harris, in press; Harkins,
Thornton, & Beech, 2009; Olver, Wong, Nicholaichuk, & Gordon, 2007; Thornton, 2002;
Thornton & Knight, 2013). This means that variation in the density of psychological risk
factors among offenders with the same Static-99R score will lead to corresponding variation
in these offenders’ actual risk (e.g., Hanson & Thornton, 2012). This represents a source
of variation in risk that is not taken into account in the way confidence intervals are
calculated.
A particularly striking finding has been the significant variation between samples in sexual
recidivism rates associated with Static-99R scores. For example, the predicted recidivism
rate for a Static-99R score of 2 is as low as 3% in some samples and as high as 20%
in others (Helmus, Hanson, et al., 2012). This has thoroughly undermined the notion of a
single recidivism rate being associated with each actuarial risk score and has led to the
development of multiple sets of recidivism norms being offered on www.static99.org, with
evaluators having to determine which is the most suitable to the individual case.
The present research explores another potential source of variation in the
utility of Static-99R and Static-2002R: the contribution of individual items. It is notable that
most of the static risk factors included in the scales have not been systematically examined
since Hanson and Bussière’s (1998) meta-analysis, and the specific definitions and coding
rules of the Static-99R and Static-2002R items have not yet been meta-analyzed. Validation
of these individual items in more current samples, corresponding to the specific definitions
of the Static-99R and Static-2002R coding rules, would be helpful. To the extent that certain
items are not predicting recidivism, this would reduce confidence in the results of the scale.
Furthermore, even if all items predict recidivism, it is possible that they do not add uniquely
to the prediction of recidivism after controlling for the other items.
Beyond these basic questions, we hypothesize that there may be meaningful variation
between samples in how predictive these items are. If this is the case, it would have important
implications for Static-99R and Static-2002R, as well as other static empirical actuarial
scales, which often use similar items. There are at least seven kinds of reason for anticipating
variability.
First, variation in how well the outcome variable is measured should affect the association
of all predictors with recidivism. There are a number of ways in which this can occur:
Use of a fixed follow-up period rather than a ragged follow-up should lead to a more valid
outcome measure (although analyses such as Cox regression can alleviate some of the disadvantages
of varying follow-up), use of multiple sources may identify recidivism events
that have otherwise been missed, and prosecutorial discretion and variability across jurisdictions
in charging practices will...
