The First Amendment Case for Public Access to Secret Algorithms Used in Criminal Trials

Publication year: 2018


Vera Eidelman
American Civil Liberties Union, vera.eidelman@gmail.com



Vera Eidelman*


Introduction

Last year, a New York court convicted a man named Mayer Herskovic of gang assault and sentenced him to four years in prison.1 A few years earlier, across the country, a California court found another man named Billy Ray Johnson guilty of twenty-four crimes, including multiple counts of rape; the court sentenced Mr. Johnson to life in prison without parole and 300 years to life, plus 123 years.2 The two cases have little in common—except that both men were convicted on the basis of a new and largely untested method of processing tiny bits of DNA. In Mr. Johnson's case, that was notwithstanding the fact that a witness to one of the alleged crimes reported that the perpetrator was a "light-skinned Hispanic with green eyes," and Mr. Johnson is Black with brown eyes.3 In both cases, the prosecution relied on DNA statistics generated by proprietary probabilistic genotyping programs—computerized algorithms used to identify a suspect from a tiny, degraded DNA sample swimming in a soup of many individuals' DNA. Both cases are now on appeal, in part based on concerns about those statistics and their underlying algorithms.4

In today's world, computerized algorithms impact our lives in crucial ways. Such algorithms can decide whether we get a job interview,5 go to a particular college,6 access credit,7 and receive insurance.8 They can also inform what news we see9 and what beliefs we hold.10 And, as shown by the examples above, it is not only private actors who are using computerized algorithms. Increasingly, the government is too.

In fact, the government now relies on algorithms to make profound decisions about our lives, including what level of health benefits we receive,11 whether we can work for the government,12 what risk we pose as parents,13 whether or not we get charged with a crime,14 and how we should be treated if we do get charged with a crime.15 Although the government creates and maintains some of these algorithms, many are built by private actors who have a business interest in keeping them secret from competitors.16 And it is now increasingly common for courts to allow the owners of proprietary algorithms who cry "trade secret!" to keep the details of the algorithms hidden, both from the public and from private litigants (including accused individuals like Mr. Johnson and Mr. Herskovic).17


But, as this Article sets forth, once a computerized algorithm is used by the government, constitutional rights may attach.18 And, at the very least, those rights require that algorithms used by the government as evidence in criminal trials be made available—both to litigants and the public.

Scholars have discussed how the government's refusal to disclose such algorithms runs afoul of defendants' constitutional rights,19 but few have considered the public's interest in these algorithms—or the widespread impact that public disclosure and auditing could have on ensuring their quality.20

This Article aims to add to that discussion by setting forth a theory of the public's First Amendment right of access to algorithms used as evidence in criminal trials. It uses probabilistic genotyping programs as an illustrative example, largely because the creators of these algorithms have most aggressively pushed to keep them secret.21 Section I begins by defining the relevant terms, including computerized algorithms, probabilistic genotyping programs, machine learning, and source code. Section II describes the roles that humans play in designing, building, operating, and communicating the results of such algorithms—and the variety of errors and mistakes that almost inevitably result. Section III summarizes caselaw articulating the public's First Amendment right of access and suggests how and why that right should attach to computerized algorithms used as evidence in criminal trials.

I. Computerized Algorithms Explained

A. Algorithms Broadly

At the most elementary level, an algorithm is a series of steps that transforms inputs into an output.22 It is, essentially, a formula, a manual, a recipe. Something as simple as a blog post explaining how to boil an egg is an algorithm because it directs the transformation of inputs (a saucepan, a stovetop, water, a raw egg, and possibly other inputs) into the desired output (a cooked egg). A probabilistic genotyping program is an example of a more complicated algorithm. It sets forth the steps to transform inputs, described in detail below, into an output: a statistic that establishes the likelihood that a particular suspect is the source of a specific (typically small and often degraded) DNA sample contained in a mixture of multiple people's DNA.
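For readers unfamiliar with programming, the definition above—a series of steps transforming inputs into an output—can be made concrete with a short sketch. The code below is purely illustrative and appears nowhere in the programs discussed in this Article; the cooking times and the weight adjustment are assumptions chosen for the example, echoing the egg-boiling analogy in the text:

```python
# A minimal illustration of an algorithm: a fixed series of steps that
# transforms inputs (an egg's weight, a doneness preference) into an output.
def boil_egg(egg_weight_g: float, desired: str = "hard") -> str:
    # Step 1: choose a base cooking time from one input (illustrative values).
    minutes = 10 if desired == "hard" else 6
    # Step 2: adjust for another input. This adjustment is an assumption
    # built into this particular algorithm; a different recipe (a different
    # algorithm with the same goal) might omit it or weigh it differently.
    if egg_weight_g > 60:
        minutes += 1
    # Step 3: produce the output.
    return f"{desired}-boiled egg (cooked {minutes} minutes)"

print(boil_egg(65))           # → "hard-boiled egg (cooked 11 minutes)"
print(boil_egg(50, "soft"))   # → "soft-boiled egg (cooked 6 minutes)"
```

As the next paragraph notes, two such algorithms can share a goal yet differ in their steps and assumptions—and therefore in the quality of their outputs.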

Not all algorithms aimed at accomplishing the same goal are identical. Indeed, they often differ in terms of both inputs and steps due to differences in their underlying assumptions. For example, a boiled egg can be made with or without salt or ice water and can be cooked for different amounts of time. Each approach constitutes an egg-making algorithm—but, critically, the quality of the result may differ.

Similarly, the algorithms used to generate a DNA match statistic differ due to differences in underlying assumptions, inputs, and training datasets—and so too must the quality of their outputs differ. And, of course, differences in the quality of DNA statistics that are introduced at trial to put human beings behind bars or even render them eligible for death are of an entirely different order. Despite that, as discussed further below, the quality of that output is far more difficult to assess than is the quality of a hard-boiled egg because of the issue at the center of this Article: the public lacks access to information about the algorithms.

B. Computerized Algorithms

The phrase computerized algorithms refers to the growing subcategory of algorithms that determine their steps and parameters not only from human assumptions but also from machine learning. Machine learning occurs when a computer identifies patterns from a preexisting or training set of data, learns from those patterns, and incorporates the lessons into the algorithm. Probabilistic genotyping programs fall within this subset because they combine human assumptions and machine learning.
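The sense of "learning" described above—a program deriving a parameter from a training dataset rather than from a human-supplied assumption—can be sketched in its simplest form. The toy data, locus name, and frequencies below are hypothetical; real probabilistic genotyping systems use vastly larger population databases and far more elaborate statistical models:

```python
from collections import Counter

# Toy "training set": a handful of hypothetical DNA profiles, each recording
# one allele at a single locus. (Illustrative only; not real population data.)
training_profiles = [
    {"D8S1179": "13"}, {"D8S1179": "13"}, {"D8S1179": "14"},
    {"D8S1179": "13"}, {"D8S1179": "15"},
]

def learn_allele_frequencies(profiles, locus):
    """Derive a parameter (allele frequencies) from the training data itself,
    rather than taking it as a human-supplied assumption."""
    counts = Counter(profile[locus] for profile in profiles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

freqs = learn_allele_frequencies(training_profiles, "D8S1179")
print(freqs)  # → {'13': 0.6, '14': 0.2, '15': 0.2}
```

Frequency estimation of this kind is only the most elementary example of a program incorporating lessons from data; the point is the division of labor the text describes, in which some parameters come from human assumptions and others from the training set.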

As noted above, the desired output for probabilistic genotyping programs is a statistic that expresses the likelihood that a particular suspect is the source of a specific DNA sample—usually a tiny, degraded sample swimming in a larger pool of many individuals' genetic material. These samples can be scraped from, for example, a convenience store counter, a purse strap, a knife handle, or a bike's handlebars.23 This step is done as it always has been: law enforcement collects the sample and then a lab amplifies it for analysis.24 From there, however, probabilistic genotyping diverges from traditional forensic DNA analysis.25

Although traditional DNA analysis looks for a match to a single person's known genetic profile, probabilistic genotyping must first sketch that profile—based on the algorithms' inputs, discussed below—before searching for a match.26 Essentially, using traditional DNA analysis is like looking at a photograph, while using a probabilistic genotyping algorithm is like relying on an investigator's composite sketch.27 Proponents of these programs contend that they make it possible to generate matches from precisely the sort of samples that traditional DNA analysis cannot reach, while critics contend that their reliability is uncertain.28

Probabilistic genotyping algorithms typically express their output as a likelihood ratio, a statistic that is computed by dividing (1) the estimated probability that the owner of the DNA in the tested sample has the suspect's DNA profile by (2) the probability that a random person of a particular race or ethnicity has the suspect's DNA profile.29 Or, as one court explained,


the numerator . . . represents the chance that the prosecution hypothesis is true—that a particular individual was one of the contributors to a mixture. The denominator represents the chance that the defense hypothesis is true—that other random individuals, and not the one of interest to the prosecution, were the contributors. Division of the numerator by the denominator produces the likelihood ratio.30
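The division the court describes can be made concrete with a worked example. The probabilities below are hypothetical and drawn from neither case discussed in this Article; they are chosen only to show how the ratio is formed:

```python
# Hypothetical inputs (illustrative only). In the court's terms:
# the numerator is the chance the prosecution hypothesis is true (the
# individual was a contributor); the denominator is the chance the defense
# hypothesis is true (random, unrelated individuals were the contributors).
p_prosecution = 0.92       # assumed probability under the prosecution hypothesis
p_defense = 0.0000023      # assumed probability under the defense hypothesis

# Division of the numerator by the denominator produces the likelihood ratio.
likelihood_ratio = p_prosecution / p_defense
print(f"{likelihood_ratio:,.0f}")  # → "400,000"
```

On these assumed numbers, the program would report that the evidence is roughly 400,000 times more likely under the prosecution's hypothesis than under the defense's—a figure whose persuasive force at trial depends entirely on how the two underlying probabilities were computed, which is precisely what the secret algorithms conceal.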

Thus, the goal of probabilistic genotyping programs is the same, but the inputs and precise steps (and therefore, resulting outputs) vary across programs. How they differ is something of a mystery, though, because many are not public. Two of the most popular programs—STRmix, which claims 54% of the U.S. market share,31 and TrueAllele, which had been used in approximately 500 criminal cases by 2016 32—are marketed to governments for profit.33 And the companies behind them refuse to disclose the precise components of their algorithms, asserting that they are trade secrets.34 At least one such program, the Forensic Statistical Tool (FST), was developed by a government actor, which also asserted a private property interest in the algorithm until the program was shelved in 2017.35 At the same time, a variety of less popular probabilistic genotyping programs are available for free and are open source.36

All of these algorithms are "computerized" because the programs' human designers or operators appear to determine and input most of the baseline assumptions, but the programs also learn from existing datasets of DNA markers and populations. Although the precise inputs of many of these programs are not public, many probabilistic genotyping algorithms appear to include a bevy of assumptions: the number of contributors to a particular DNA sample...
