Addressing the Zeros Problem: Regression Models for Outcomes with a Large Proportion of Zeros, with an Application to Trial Outcomes

Published date: 01 March 2015
Author: Thomas Eisenberg, Theodore Eisenberg, Martin T. Wells, Min Zhang
DOI: http://doi.org/10.1111/jels.12068
Theodore Eisenberg, Thomas Eisenberg, Martin T. Wells, and Min Zhang*
In law-related and other social science contexts, researchers need to account for data with an
excess number of zeros. In addition, dollar damages in legal cases also often are skewed. This
article reviews various strategies for dealing with this data type. Tobit models are often
applied to deal with the excess number of zeros, but these are more appropriate in cases of
true censoring (e.g., when all negative values are recorded as zeros) and less appropriate
when zeros are in fact often observed as the amount awarded. Heckman selection models are
another methodology that is applied in this setting, yet they were developed for potential
outcomes rather than actual ones. Two-part models account for actual outcomes and avoid
the collinearity problems that often attend selection models. A two-part hierarchical model
is developed here that accounts for both the skewed, zero-inflated nature of damages data
and the fact that punitive damage awards may be correlated within case type, jurisdiction, or
time. Inference is conducted using a Markov chain Monte Carlo sampling scheme. Tobit
models, selection models, and two-part models are fit to two punitive damage awards data
sets and the results are compared. We illustrate that the nonsignificance of coefficients in a
selection model can be a consequence of collinearity, whereas that does not occur with
two-part models.
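The abstract's collinearity claim can be illustrated numerically. Heckman's two-step estimator adds the inverse Mills ratio λ(z) = φ(z)/Φ(z) to the outcome equation, evaluated at the estimated selection index z. Suppose no variable is excluded from the outcome equation. Then λ(z) differs from a linear function of regressors already in that equation only through its curvature, and over a typical range of z it is nearly linear. The minimal sketch below uses only the Python standard library. It computes the correlation between z and λ(z) on an evenly spaced grid from -2 to 2. The grid and the helper names are illustrative assumptions, not the article's data or its own demonstration.

```python
import math
import statistics


def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(z: float) -> float:
    """Inverse Mills ratio lambda(z) = phi(z) / Phi(z), the Heckman correction term."""
    return norm_pdf(z) / norm_cdf(z)


def correlation(xs, ys) -> float:
    """Pearson correlation from the sample covariance and standard deviations."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))


# Hypothetical grid of selection-index values z, from -2.0 to 2.0 in steps of 0.04.
z_grid = [-2.0 + 0.04 * i for i in range(101)]
lam = [inverse_mills(z) for z in z_grid]

r = correlation(z_grid, lam)
print(f"corr(z, lambda(z)) = {r:.3f}")
```

A correlation this close to -1 means the correction term is nearly a linear combination of the outcome-equation regressors when there is no exclusion restriction. That inflates standard errors, which is one mechanism consistent with the nonsignificance described above.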
I. Introduction
Many legal system and other social science outcomes raise the question of how to model
phenomena with multiple zeros. Trial outcomes generally result in verdicts for plaintiffs or
defendants. A verdict for a plaintiff in an action involving money damages will lead to a
positive award. A verdict for a defendant will lead to a zero award. For some purposes, the
mass of zeros represented by defendant victories need not be accounted for. The research
question of interest might be: conditional on the plaintiff winning at trial, how much was
recovered? In that case, the zero-award outcomes representing defendant trial wins might
be ignored.
*Address correspondence to Martin T. Wells, email: mtw1@cornell.edu. Theodore Eisenberg was Henry Allen Mark
Professor, Cornell Law School and Adjunct Professor of Statistical Sciences, Cornell University; Thomas Eisenberg is
a graduate student, Department of Economics, Cornell University; Wells is Charles A. Alexander Professor of Statistical
Sciences, Professor of Biostatistics and Epidemiology, Weill Medical College, and Elected Member of the Law Faculty,
Cornell University; Zhang is Associate Professor, Department of Statistics, Purdue University.
Earlier versions of this material were presented at the First Annual Conference on Empirical Legal Studies,
University of Texas, and the International Conference on Empirical Legal Studies, conducted by the Cegla Center for
Interdisciplinary Research of the Law, Tel Aviv University, Buchmann Faculty of Law.
Journal of Empirical Legal Studies
Volume 12, Issue 1, 161–186, March 2015
If, however, the researcher wished to include defendant wins in the analysis, ignoring
the zero-award outcomes would not be satisfactory. This might be the case if one were
computing the expected value of a possible lawsuit and wanted to account for both the
probability of winning and the amount of any monetary award if the plaintiff prevailed.
Similarly, if one were interested in the amount of punitive damages awarded to a plaintiff
who won at trial and received a punitive damages award, the cases with punitive damages
awards of zero could be excluded. But if one were interested in the expected punitive
damages recovery in cases won by plaintiffs, it would be necessary to include the plaintiff
wins in which no punitive damages were awarded.
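The distinction between the conditional and the expected recovery is the decomposition E[Y] = Pr(Y > 0) × E[Y | Y > 0]. Two-part models build on this decomposition. The minimal sketch below first checks it on a small set of hypothetical awards. It then writes a model-based version under one common specification, assumed here for illustration: a logit for whether the award is positive and a lognormal for its size given that it is positive. The function names, parameters, and data are hypothetical. The article's own two-part hierarchical model may specify each part differently.

```python
import math
import statistics

# Hypothetical punitive damages awards (in $ thousands) in cases won by plaintiffs.
# Zeros are plaintiff wins in which no punitive damages were awarded.
awards = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 50.0, 120.0, 300.0, 2500.0]

positive = [a for a in awards if a > 0]
p_positive = len(positive) / len(awards)          # Part 1: Pr(award > 0)
mean_if_positive = statistics.fmean(positive)     # Part 2: E[award | award > 0]
expected_award = p_positive * mean_if_positive    # E[award] = Pr(award > 0) * E[award | award > 0]

# Dropping zeros answers the conditional question; the product answers the unconditional one.
assert math.isclose(expected_award, statistics.fmean(awards))
print(f"Pr(>0) = {p_positive:.2f}, E[award | >0] = {mean_if_positive:.1f}, E[award] = {expected_award:.1f}")
# → Pr(>0) = 0.40, E[award | >0] = 742.5, E[award] = 297.0


def logistic(t: float) -> float:
    """Inverse logit link for the first (zero vs. positive) part."""
    return 1.0 / (1.0 + math.exp(-t))


def two_part_expected(x, alpha, beta, sigma):
    """Model-based E[Y | x] for a hypothetical two-part model:
    logit for Pr(Y > 0 | x), lognormal for Y given Y > 0,
    so that E[Y | Y > 0, x] = exp(x'beta + sigma^2 / 2)."""
    p = logistic(sum(a * xi for a, xi in zip(alpha, x)))
    mu = sum(b * xi for b, xi in zip(beta, x))
    return p * math.exp(mu + 0.5 * sigma ** 2)
```

The sigma²/2 term matters for skewed awards. Exponentiating the fitted mean of log awards alone would understate the expected positive award under the lognormal assumption.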
Many empirical legal researchers realize that simple ordinary least squares methods
may be unsatisfactory in the presence of many observations for which the dependent
variable equals zero. Tobit models often are regarded as appropriate when data have a
lower bound of zero, because ordinary least squares estimates may then be biased and
inconsistent (Tobin 1958). A common approach in the presence of many zero values is
therefore to use Tobit models, which account for censoring of the dependent variable (e.g.,
De Ruijter & Braat 2008; Fehr & Gächter 2000; Hersch & Viscusi 2004). A possible problem
is that Tobit models do not account for the skewed, zero-inflated nature of damages data:
they assume that the dependent variable is censored at zero, whereas damages are often
observed to be exactly zero. This article reviews techniques for modeling skewed damages
awards in the presence of many zero values and develops a two-part hierarchical model
that accounts for the skewed, zero-inflated, and clustered nature of damages data. It
compares the results of the Tobit, selection, and two-part models for two punitive damages
data sets.
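The censoring assumption can be made concrete with the standard Tobit log-likelihood for data left-censored at zero. Each zero contributes the probability that a latent normal outcome fell at or below zero, and each positive value contributes a normal density. The minimal sketch below uses only the Python standard library. The function name and the small data set are hypothetical and serve only to illustrate the likelihood. They are not the article's estimation code.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def tobit_loglik(y, X, beta, sigma):
    """Log-likelihood of a Tobit model left-censored at zero.

    Latent model: y* = x'beta + e, e ~ N(0, sigma^2); observed y = max(0, y*).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    ll = 0.0
    for yi, xi in zip(y, X):
        xb = sum(b * v for b, v in zip(beta, xi))
        if yi <= 0:
            # Zero treated as censored: contributes Pr(y* <= 0) = Phi(-x'beta / sigma),
            # computed with erfc for accuracy in the tail.
            ll += math.log(0.5 * math.erfc(xb / (sigma * math.sqrt(2.0))))
        else:
            # Positive outcome: normal log-density with mean x'beta and s.d. sigma.
            z = (yi - xb) / sigma
            ll += -0.5 * z * z - LOG_SQRT_2PI - math.log(sigma)
    return ll


# Hypothetical data: intercept plus one covariate; three zero outcomes, two positive.
y = [0.0, 0.0, 0.0, 1.5, 3.2]
X = [[1.0, 0.2], [1.0, -0.4], [1.0, 0.1], [1.0, 1.3], [1.0, 2.0]]
print(tobit_loglik(y, X, beta=[0.0, 1.0], sigma=1.0))
```

Under this specification Pr(y > 0 | x) = Φ(x'β/σ). The same β therefore governs both whether an outcome is positive and how large it is. A two-part model instead gives each part its own parameters, so a covariate can affect the chance of a punitive award differently from its size.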
The underlying topics and statistical models discussed in this article have appeared
in various econometric and statistical literatures. Each of the models we discuss below
makes implicit distributional assumptions that need to be understood and validated. The
goal of this article is to give an overview of the statistical models and issues that face a
legal researcher when analyzing data with many zeros. The rest of the article is organized
as follows. Section II reviews models for analyzing a continuous outcome variable with a
large proportion of zeros. Section III applies the models to real data. Section IV offers
discussion and conclusions. The Appendix gives R and OpenBUGS code useful for fitting
the two-part hierarchical model.
II. Models for Analyzing a Continuous Outcome Variable with a Large Proportion of Zeros
In empirical legal literature and other social science literature, there have been misunderstandings
about the proper modeling of a continuous outcome variable with a large