The rule of probabilities: a practical approach for applying Bayes' rule to the analysis of DNA evidence.

Author: Ayres, Ian
Position: V. Application to People v. Puckett through Conclusion, with footnotes, pp. 1476-1503. Symposium: Festschrift in Honor of Richard Craswell

    On February 21, 2008, John Puckett was found guilty of first-degree murder for the 1972 death of Diana Sylvester. (70) Sylvester was a twenty-two-year-old nurse working at the University of California, San Francisco. (71) She was stabbed and strangled in her apartment. The case remained unsolved for more than three decades. (73) Then, in 2003, as part of a cold-case investigation, Bonnie Cheng performed a DNA analysis for the San Francisco Police Department (SFPD) using sperm samples obtained during the autopsy. (74) Because the samples had degraded, she was able to obtain useful data at only five and a half loci. (75)

    The DNA profile was run through the California DNA database of convicted felons, which contained 338,711 profiles, and the search returned a single match: Puckett. (76) He was in the database because of his 1977 convictions for three separate crimes, in which he had abducted and raped two young women and sexually assaulted a third. (77)

    The police located the then-seventy-two-year-old Puckett in a trailer park in Stockton, California. (78) They ascertained that he had been living in San Francisco in 1972 and subsequently arrested him. (79)

    Puckett was convicted of first-degree murder and sentenced to life in prison. (80) The primary evidence in the case was his DNA match. (81) As reported by San Francisco Magazine contributing writer Chris Smith, who covered the trial, the other evidence was far from conclusive and might not even be considered incriminating. (82) Smith describes the ambiguous evidentiary record:

    [Lead homicide investigator] Toomey leans in and says, almost gently, "We have a DNA match, and it comes back to you." Puckett's reply: "I ... I ... don't remember this at all." So, is that a murderer's half-assed denial? Or the genuine protestation of a scared old man? It's like that all the way down the line: Puckett either matches the eyewitness description or doesn't. (Was he "medium-build with curly hair," as the description puts it, or "heavyset and balding," as photos from that time show him?) (83) The jury also heard of Puckett's three prior rape and assault convictions. These prior convictions were admitted, notwithstanding the limitation on admitting prior bad acts, as the prosecutor argued that there was a common pattern in Puckett's crimes. (84)

    Thus, the jury was left with the following picture: a convicted triple rapist whose DNA matches that of the semen left at the crime scene. The probability of such a match occurring by chance is below one in a million. (85) The jury was not told how the match was found, the size of the database, or any other information that would have helped them form a more accurate estimate of the probability that Puckett was guilty. Absent a rock-solid alibi, a guilty verdict seems predictable, although it took the jury three days of deliberation to reach it. (86) That verdict was the final ruling: Puckett's appeal to the California Court of Appeal ended when he died on July 21, 2010. (87)

    Our central claim is that the statistical evidence presented, chiefly the random match probability, was incomplete. Without that evidence placed in context, the jury did not have enough information to reach a conclusion one way or the other. Some jurors, when later told about the statistics that had been withheld, said that they would have reached a different verdict. (88)

    In this Part, we show how our earlier analysis of the posterior source probability should have guided the way that the DNA evidence was presented to the jury. In our analysis, we begin by using the odds ratios and then show how these odds estimates can be converted into an estimate that the defendant was the source of the forensic DNA.
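The odds-form update described here can be sketched in a few lines of Python. This is a minimal illustration of Bayes' rule in odds form (posterior odds = prior odds × likelihood ratio, then probability = odds / (1 + odds)); the function names are our own, and the inputs in the usage line are arbitrary placeholders, not estimates from the Puckett case.

```python
def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds: float) -> float:
    """Convert odds of o:1 into the probability o / (1 + o)."""
    return odds / (1.0 + odds)


# Placeholder inputs only: even prior odds (1:1) and a likelihood ratio of 9:1.
odds = posterior_odds(prior_odds=1.0, likelihood_ratio=9.0)
print(odds_to_probability(odds))  # 0.9
```

Working in odds keeps each piece of evidence a single multiplicative factor; the conversion to a probability happens only once, at the end.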

    In the present case, there was strong circumstantial evidence that the source of the forensic evidence was guilty of rape and murder. Prosecutor David Merin made this connection in simple and vivid terms: "His DNA was found in her mouth." (89) Puckett did not claim that he had consensual sex with Sylvester or that the DNA was that of a sibling. (90) In this case, then, the source and guilt probabilities are the same, and in the discussion below we use the source probability as the probability of guilt.

    To foreshadow the results, we think the evidence correctly presented is still strongly in favor of Puckett's guilt. There are three important caveats to this conclusion. First, some of the analysis we use, especially the formation of prior beliefs, might not be admissible. We return to this important question below in Part VI, where we will argue that, notwithstanding appropriate limitations on introducing evidence of prior bad acts, courts should be able to hear expert testimony about the Bayesian posterior (and the underlying prior on which it is based). Second, we don't have access to the actual DNA database, and thus several of our calculations are estimates. We try to be conservative in our estimates. But the data do exist, and thus our calculations suggest a method that can be improved upon. The next Part will make explicit how this can be done more generally. Finally, we discuss the role of the first suspect in the case, Robert Baker, and how that might influence our view of Puckett's potential guilt.

    Separate from our reflections on Puckett's guilt, this analysis will suggest a new approach to deal with hits reached from a large database trawl. As a starting point, we believe that the jury should not have been told only that the chance of a random U.S. Caucasian matching at the five and a half loci was one in 1.1 million. They should also have been told that the DNA match was found as a result of a database trawl of 338,711 felons. (91) In terms of our earlier notation, the jurors should have been presented with D in addition to r.

    But that does not mean that the chance of a random match, the "random match probability," was one in three. First, 338,711/1,100,000 = 0.31 is the expected number of random matches (what we call rD in our earlier analysis). It is not the probability of at least one match if everyone in the database is randomly selected from the population of innocent persons. The actual probability is:

    $$1 - \left(\frac{1{,}099{,}999}{1{,}100{,}000}\right)^{338{,}711} \approx 26.5\%$$ (92)

    While 26.5% is smaller than 33%, this is a minor point, and not one that changes any of our conclusions. But we want to start with the correct calculations.
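The difference between the expected number of random matches, rD, and the probability of at least one random match, 1 - (1 - r)^D, can be checked numerically. A minimal Python sketch using the text's figures (r = 1/1,100,000, D = 338,711) and assuming, as the text does, that every profile in the database is an independent draw from the innocent population; the helper names are hypothetical:

```python
def expected_matches(r: float, d: float) -> float:
    """Expected number of coincidental matches in a trawl of d innocent profiles: rD."""
    return r * d


def prob_at_least_one_match(r: float, d: float) -> float:
    """Probability that at least one of d independent innocent profiles matches."""
    return 1.0 - (1.0 - r) ** d


R = 1 / 1_100_000  # random match probability at the five and a half loci
D = 338_711        # profiles in the California felon database

print(f"rD = {expected_matches(R, D):.3f}")                   # rD = 0.308
print(f"P(>=1 match) = {prob_at_least_one_match(R, D):.3f}")  # P(>=1 match) = 0.265
```

Because several innocent matches can occur in one trawl, the probability of at least one match is always somewhat below rD; the gap widens as rD grows.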

    What does change our conclusions is that 26.5% grossly overstates the chance of a random match once excludable matches are removed from the database. Because an eyewitness identified the criminal as a male Caucasian, had a female or a non-Caucasian matched, we would know at once that the match was innocent. Thus, only the male Caucasians in the database are relevant.

    According to the State's brief in Puckett's appeal, 28.4% of inmates are Caucasian. (93) If this percentage were constant over time, then only 96,194 felons in the database are relevant, which lowers the chance of a random match to 8.4%. We should also eliminate women from the database. We do not have access to the database's breakdown by sex, but according to a report by the Bureau of Justice Statistics, 14% of felony defendants in 1990 were female. (94)
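Applying the Caucasian share to the database before recomputing the match probability reproduces the figures above. A sketch under the text's own assumption that the 28.4% share held constant over the years the database was built; the variable names are illustrative:

```python
R = 1 / 1_100_000        # random match probability
D = 338_711              # full database size
SHARE_CAUCASIAN = 0.284  # share of inmates who are Caucasian, per the State's brief

# Assumes the Caucasian share of the database equals the current inmate share.
d_caucasian = D * SHARE_CAUCASIAN
p_match = 1.0 - (1.0 - R) ** d_caucasian

print(f"{d_caucasian:,.0f} relevant profiles")  # 96,194 relevant profiles
print(f"P(>=1 match) = {p_match:.3f}")          # P(>=1 match) = 0.084
```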

    Furthermore, we believe that the criminal was at least age eighteen at the time of the attack, which implies an age of at least fifty at the time of the DNA search. We don't know how many people in the database were older than fifty in 2004. This is a knowable fact, but we do not have access to this information. The State argued that, as of December 31, 2005, just over 5% of inmates were over fifty years old. (95) However, that is not a relevant statistic. We are not concerned with the chance that the guilty party is presently incarcerated. We are concerned with the age distribution of felons in the database, recognizing that DNA could have been entered some ten or twenty years prior. These data do suggest that the fraction over fifty should be at least 5% because the current figure is 5% and data entered from earlier periods would have had a chance to age up. For purposes of illustration, we will assume that the age distribution of Caucasians in the felon database mirrors the overall Caucasian population that is twenty-five and over. (96) From 2000 census data, 42.7% of the Caucasian male population that is at least twenty-five is age fifty or older. (97)

    Assuming the database in Puckett's case was a representative sample of the overall population of felony defendants nationwide, removing women and those under fifty lowers the relevant database population to 96,194 × 0.427 × 0.86 ≈ 35,325. This estimate deflates the database size to those without alibis, as captured in equation (7) by D(1 - a). Netting out the alibied members of the database reduces the chance of an innocent match to below 3.2%. (98) In other words, the expected number of matches without an airtight alibi is only about one-tenth of what the defense claimed.
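The full netting-out step, D(1 - a), multiplies the database size by each screening fraction in turn. This Python sketch treats race, age, and sex as independent, mirroring the text's assumption that the database resembles the national population of felony defendants; the constant names are ours. The product comes to roughly 35,324, within one profile of the 35,325 used in the text.

```python
R = 1 / 1_100_000
D = 338_711

# Screening fractions taken from the text; treated here as independent of one another.
SHARE_CAUCASIAN = 0.284    # State's brief
SHARE_AGE_50_PLUS = 0.427  # 2000 census, Caucasian males aged 25 and over
SHARE_MALE = 1 - 0.14      # 14% of 1990 felony defendants were female

d_no_alibi = D * SHARE_CAUCASIAN * SHARE_AGE_50_PLUS * SHARE_MALE  # D(1 - a)
expected = R * d_no_alibi                 # expected innocent matches
p_match = 1.0 - (1.0 - R) ** d_no_alibi   # probability of at least one

print(f"{d_no_alibi:,.0f} viable profiles")             # 35,324 viable profiles
print(f"expected matches = {expected:.4f}")             # expected matches = 0.0321
print(f"P(>=1 match) = {p_match:.2%}")                  # P(>=1 match) = 3.16%
print(f"share of full-database rD = {d_no_alibi / D:.2f}")  # share of full-database rD = 0.10
```

The last line makes the "one-tenth" comparison explicit: netting out alibied profiles scales rD by the same factor as the database.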

    But, as argued above, we think it is not productive to focus on the chance of a random innocent match. Instead, we should look at the strength of the evidence that follows from a match. The likelihood ratio used to update the prior is the observed number of matches relative to the predicted number of innocent matches, M : E[M]. In this case, with 35,325 viable felons in the database, the predicted number of innocent matches among Caucasian males over fifty is 35,325/1,100,000 = 0.032. Thus, it might seem that the likelihood ratio favoring Puckett as the source over an innocent, nonsource match is 1 : 0.032, or about 31:1.
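Following the text's definition of the likelihood ratio as observed matches over expected innocent matches, M : E[M], the preliminary 31:1 figure follows directly. A minimal sketch using the text's estimate of 35,325 viable felons; it deliberately omits the further alibis (out-of-state residence, incarceration) that the text argues would raise the ratio:

```python
R = 1 / 1_100_000
D_NO_ALIBI = 35_325  # viable felons in the database, as estimated in the text

observed_matches = 1                # M: Puckett was the only hit
expected_innocent = R * D_NO_ALIBI  # E[M] if no one in the database is the source
likelihood_ratio = observed_matches / expected_innocent

print(f"E[M] = {expected_innocent:.3f}")                # E[M] = 0.032
print(f"likelihood ratio ~ {likelihood_ratio:.0f}:1")   # likelihood ratio ~ 31:1
```

Shrinking D_NO_ALIBI further to reflect additional alibis would lower E[M] and raise the ratio proportionally.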

    But even this calculation significantly understates the likelihood of Puckett being the source of the forensic evidence. The calculations above did not consider other potential airtight alibis, such as proof that the person was in another state or was incarcerated at the time the crime was committed. Since this was a...
