AN APPRECIATION OF RICHARD CRASWELL
INTRODUCTION
I. THE ISLAND OF EDEN
II. BAYES FOR AN ERA OF BIG DATA
III. COMPARATIVE STATICS
   A. Trawling a Larger Database
   B. Tom Versus Tom, Dick, and Harry
IV. APPLICATION TO PEOPLE V. COLLINS
V. APPLICATION TO PEOPLE V. PUCKETT
   A. Minimal Priors
   B. A Model for Calculating Priors
   C. Application to Puckett
VI. EMPIRICIZING THE ALIBI AND PRIOR PROBABILITIES
   A. Large Database Trawls
      1. Empiricizing the alibi probability
      2. Empiricizing the prior probability
   B. Small Database Trawls
VII. ADMISSIBILITY
CONCLUSION

AN APPRECIATION OF RICHARD CRASWELL
It is entirely fitting for an Article on the legal implications of information economics to honor Dick Craswell. (1) Large swaths of Dick's writings are careful workings-out of the ways that imperfect information can impact private behavior and constrain judicial decisionmaking. (2) Dick's analysis of how consumers might mistakenly update their prior beliefs after a corrective advertising order (3) is a close analogy to our claim that unguided juries are likely to mistakenly update in response to incomplete statistical DNA evidence.
With the recent Supreme Court decision allowing the collection of DNA samples from any person arrested and detained for a serious offense, (4) it seems inevitable that the justice system will collect and use ever-larger DNA databases. Such databases are already in wide use: as of April 2015, the Combined DNA Index System (CODIS) maintained by the Federal Bureau of Investigation (FBI) had more than "283,440 hits [and had assisted] in more than 270,211 investigations." (5) There is concern that as database size increases, so too will the rate of false positives, and thus innocent people will be convicted when their DNA happens to match evidence left at a crime scene. (6) This concern has led courts to a convoluted and misguided use of multiple lenses to evaluate DNA evidence.
In this Article, we argue that there is a single right answer for how DNA evidence should be incorporated. That answer is the application of Bayes' rule, a 250-year-old formula for updating a starting probability estimate for a hypothesis given additional evidence. (7) Applying Bayes' rule, we argue that triers of fact evaluating DNA evidence should be presented with what we call the "source probability": the probability that a defendant whose DNA matches the DNA found at the crime scene was the true source of that evidence. As we discuss below, the source probability is not the same as the chance of a random DNA match and does not equal the probability of guilt; even if the defendant was the source of the forensic DNA, the defendant might not have committed the crime. (8)
Our primary contribution will be to show that the source probability may turn crucially on the size of two variables that have not been introduced (or relied upon by experts) in DNA matching cases: (i) the initial or prior probability that the source of the DNA is included in the database, and (ii) the relevant or adjusted size of the DNA database, a calculation that takes into account the demographic information known about the criminal and the probability that a nonsource in the DNA database would have an alibi.
Experts have shied away from helping jurors form baseline beliefs, more formally called prior probabilities, and from then helping them convert those priors into a conclusion. The problem is that, absent priors, it is not clear how to coherently employ the expert information. As we discuss in our analysis of People v. Puckett, (9) an expert might well conclude that certain evidence makes it 100 times more likely that the suspect was at the scene of the crime. But 100 times more likely than what? The starting point, or prior, for a suspect identified from a large database trawl might well be less than 1 in 1000. In that case, 100-to-1 evidence is not persuasive. If the suspect was related to the victim and had motive and opportunity, then 100-to-1 would be much more convincing.
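The arithmetic behind this point can be made concrete with the odds form of Bayes' rule: posterior odds equal prior odds times the likelihood ratio. The short sketch below is purely illustrative (the specific priors are hypothetical, not drawn from Puckett); it shows how the same 100-to-1 evidence yields very different conclusions under the two priors discussed above.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Combine a prior probability with a likelihood ratio using the
    odds form of Bayes' rule, returning the posterior probability."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# A cold database hit: prior of 1 in 1000, and evidence that is
# 100 times more likely if the suspect was at the scene.
weak_prior = posterior_probability(1 / 1000, 100)

# A suspect with a connection to the victim plus motive and
# opportunity: a (hypothetical) prior of 1 in 4.
strong_prior = posterior_probability(1 / 4, 100)

print(f"{weak_prior:.3f}")    # prints 0.091 -- far from persuasive
print(f"{strong_prior:.3f}")  # prints 0.971 -- much more convincing
```

The same likelihood ratio thus leaves the jury below ten percent in one case and above ninety-five percent in the other, which is precisely why the prior cannot be left unstated.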
We will argue that there are practical means of estimating the prior probabilities and the relevant database size and that, as a legal matter, these parameters as well as the final source probability are admissible. In particular, changing the focus from a question about the prior probability that the defendant was the source to the prior probability that the "database is guilty"--that is, the probability that someone in the database is the source of the forensic evidence--not only is analytically and empirically more tractable, but also avoids the evidentiary limitations concerning a particular defendant's prior bad acts.
In People v. Johnson, a California Court of Appeal panel, in reviewing different types of DNA statistics, emphasized that '"the database is not on trial. Only the defendant is.' Thus, the question of how probable it is that the defendant, not the database, is the source of the crime scene DNA remains relevant." (10) But to apply Bayes' rule, the probability that the database contains the source of the forensic DNA, assessed prior to any consideration of whether an individual in the database actually matches, becomes a crucial input in determining the (posterior) likelihood that a particular matching defendant is the source of the forensic DNA. Contrary to Johnson, assessing the prior probability that the database includes the source--colloquially, "the probability that the database is guilty"--provides at once a readier means of estimation and a stronger argument for admissibility.
At the end of the day, we will acquit or convict a defendant, not a database. The problem is that it is very hard to directly estimate a starting point or prior probability for the likelihood that a specific defendant committed a crime. For example, what is the chance that some "John Doe" committed a crime before we have any evidence about Mr. Doe? In contrast, it is more coherent to ask the chance that a class of individuals, for example, convicted felons, would include the perpetrator of a crime. (11) For example, if half of rapes are committed by convicted felons, then the starting point would be fifty percent, assuming that the database contains all convicted felons. If jurors are to properly understand the implications of finding a match from a large database trawl, the size and characteristics of that database are relevant information.
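To see how a "database is guilty" prior feeds into a source probability for the one individual who matches, consider the following sketch. Both the formula and the numbers are illustrative assumptions, not the Article's empirical estimates: it supposes a unique match from a trawl and approximates the competing hypothesis (the source is outside the database and a member matched by coincidence) by the expected number of coincidental hits, which is a reasonable simplification when that expected number is small.

```python
def source_probability(prior_db: float, db_size: int, match_prob: float) -> float:
    """Approximate posterior probability that a uniquely matching
    database member is the true source of the forensic DNA.

    prior_db   -- prior probability that the database contains the source
    db_size    -- number of (relevant) profiles trawled
    match_prob -- random-match probability for an unrelated person

    If the source is outside the database, the chance of one
    coincidental hit is roughly db_size * match_prob.
    """
    coincidence = (1 - prior_db) * db_size * match_prob
    return prior_db / (prior_db + coincidence)

# Hypothetical numbers: a 50% prior that the database contains the
# source and a one-in-a-million random-match probability, evaluated
# at two different database sizes.
print(source_probability(0.5, 100_000, 1e-6))    # about 0.91
print(source_probability(0.5, 1_000_000, 1e-6))  # about 0.50
```

Holding the prior fixed, a tenfold increase in database size drags the source probability from roughly ninety-one percent down to a coin flip, which is why the size and characteristics of the trawled database are relevant information for the jury.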
Some legal analysts have been dismayed by the ways in which evidence of a DNA match tends to eclipse any role for adversarial engagement--turning litigants into little more than potted plants. (12) But appropriate application of Bayes' rule, far from preempting the factfinding process and the adversarial process, can guide advocates to engage with the important aspects of the evidence that are still likely to be open to contestation. We will show how estimation of both the prior probability and relevant database size can be assessed under alternative assumptions that are appropriately open to literal and figurative cross-examination to assure the robustness of the bottom-line conclusion: the defendant was or was not the true source of the crime scene evidence.
For more than forty years, scholars have been debating the appropriate use of probabilities and Bayesian inference in the courtroom. (13) Among the criticisms leveled at Bayesian reasoning are that jurors will be unable to integrate probabilistic and nonprobabilistic evidence, that the occurrence or nonoccurrence of a past event does not admit of intermediate probabilities, and that Bayesian inference is incompatible with the presumption of innocence and the requirement of proof beyond a reasonable doubt. (14) Instead of engaging on the question of whether probabilistic evidence can or should elicit valid inferences of defendant guilt, we focus on a practicable way to present powerful evidence of the posterior probability that the defendant was the source of forensic evidence. (15)
In Part I, we discuss a motivating example that illuminates three conflicting approaches to statistical inference. Part II then lays out our notation and model applying Bayes' rule and explains why the two variables that we emphasize need to be accounted for. Part III analyzes the comparative statics of our model--how the source probability is affected by changes in five underlying parameters. Parts IV and V apply our Bayesian model to the cases of People v. Collins (16) and People v. Puckett, respectively. Part VI explains how the underlying parameters in our model can be empirically estimated. Part VII discusses whether our approach is compatible with current rules of evidence.
I. THE ISLAND OF EDEN
Imagine that a singular crime has been committed on the otherwise idyllic island of Eden. (17) This island has a population of 51,295, and no one has come or gone since the crime. Thus, we know with certainty that the criminal is one of these 51,295 individuals. Moreover, given the nature of the crime, it is not possible to rule out anyone on the island as a potential suspect. But there is one clue: the criminal has left behind a trace of DNA at the scene. Based on the distribution of DNA, there is a one-in-a-million chance that a random individual in the Eden population would match the DNA. (18) The people of Eden are upset about this crime, and they agree that each individual will be required to provide a DNA sample to be tested. The elders of Eden are able to collect samples from 51,294 individuals, all but Mr. Baker. Unfortunately, Mr. Baker was killed in a fishing accident, which occurred after the crime but prior to the decision to collect DNA. Mr. Baker was given the traditional burial at sea; his corpse is therefore not available, and there are no personal items from which a DNA sample could be retrieved. There is no evidence to suggest that the tragic accident was in any way related to the crime or the subsequent investigation.
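Before turning to the result of the testing, it is worth sketching the Bayesian structure of the hypothetical. The sketch below assumes, purely for illustration, that exactly one of the 51,294 tested islanders matches; under a uniform prior, every islander (including the untested Mr. Baker) starts with the same 1-in-51,295 chance of being the source, and a unique match leaves only two live hypotheses.

```python
# Illustrative Bayesian sketch of the Eden hypothetical, assuming
# exactly one tested islander matches. Two hypotheses remain:
#   H1: the matching islander is the source (the match is certain);
#   H2: Mr. Baker is the source, and the matching islander hit by
#       coincidence (one-in-a-million random-match probability).
# Any other tested islander being the source is ruled out, since a
# true source would have matched too.

population = 51_295
match_prob = 1e-6                 # random-match probability

prior_each = 1 / population       # uniform prior per islander
h1 = prior_each * 1.0             # matcher is the source
h2 = prior_each * match_prob      # Baker is the source, chance hit

posterior_matcher = h1 / (h1 + h2)
print(posterior_matcher)          # about 0.999999
```

With only a single untested alternative, the posterior is overwhelming; the interesting questions arise, as the following discussion shows, when the pool of untested alternatives is large.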
After collecting all of the DNA samples, it turns...