Demystifying Hash Searches.

Author: Martin, Dennis

A hash search is a very accurate, computationally efficient technique for testing whether a computer contains illicit material. Although police have been running hash searches for many years, case law is scarce regarding whether and to what extent the Fourth Amendment permits their use. Some commentators have argued that because hash searches reveal information concerning only the presence or absence of contraband, courts shouldn't consider them Fourth Amendment searches. Rather, courts should treat hash searches as a sort of digital dog sniff.

This Note disagrees. It argues first that even accepting the analogy to digital dog sniffs, hash searches nevertheless violate the Fourth Amendment under Florida v. Jardines whenever they are used to look for evidence outside the scope of a search warrant or other permissive mechanism. It then argues that there is no limiting principle that would permit the use of hash searches but not more sophisticated algorithms--algorithms that would constitute the modern equivalents of general warrants. Accordingly, it proposes a rule that covers not only the hash searches that are being used now but also the more sophisticated forensic techniques that will be used in the near future: Police conduct a Fourth Amendment search whenever they use an algorithm to perform a task that would be a search if conducted manually by a human.

Table of Contents

Introduction
I. Demystifying Hash Searches
  A. A Simple Hash Search
  B. How Law Enforcement Uses Hashing
    1. Preserving evidence
    2. Excluding files known to be noncontraband
    3. Searching for evidence
II. How Courts Have Handled Hash Searches
  A. Hash Searches of Publicly Exposed Information
  B. Hash Searches of Content Shared or Stored Online
  C. Hash Searches of Private Information by Government Actors
    1. United States v. Crist
    2. United States v. Comprehensive Drug Testing
    3. In re United States's Application for a Search Warrant
    4. United States v. Mann
    5. United States v. Schlingloff
  D. Takeaways
III. Hash Searches as Digital Dog Sniffs
  A. Binary Search Doctrine
  B. Digital Dog Sniffs
  C. Digital Dog Sniffs Under Florida v. Jardines
IV. Hash Searches as a Gateway to General Warrants
  A. Probabilistic Algorithms
  B. General Crime Detection Algorithms
  C. Suspicionless Searches
V. Treating Computers as We Treat Humans
  A. Choosing Between a Proactive and a Reactive Approach
  B. A Simple Rule for Algorithmic Investigative Techniques
Conclusion

Introduction

Suppose a government investigator executing a warrant to search your computer for evidence of tax fraud instead clicks through your hard drive file by file looking for pirated music. He's clearly exceeding the scope of his warrant in violation of the Fourth Amendment. (1) Now suppose he writes a computer program to do the exact same thing. Different result?

For many years, government investigators have used digital forensic software to conduct hash searches: a very accurate, very computationally efficient type of search that can be used not just for legitimate purposes but also to identify evidence of crimes outside the scope of a search warrant. (2) Still, many commentators argue that because these hash searches reveal information concerning only the presence or absence of illicit material, the Fourth Amendment does not prohibit their use. (3) They argue that we ought to treat hash searches as a sort of digital dog sniff. (4)

Courts, meanwhile, have been hesitant to apply the Fourth Amendment to algorithmic investigative techniques. Indeed, the Court only recently addressed, in Riley v. California, what limits the Fourth Amendment places on human searches of digital information. (5) By contrast, hash searches involve algorithmic searches of digital information. And hash searches are an appropriate place to begin to assess what sort of limits the Fourth Amendment imposes on algorithmic investigative techniques: Unlike many types of still-developing technological surveillance, hash searches are already being used, and their underlying technology is unlikely to change in the future. (6) And the technology behind hash searches is relatively easy to understand, even for laypeople.

It is increasingly important that courts weigh whether and how the Fourth Amendment governs algorithmic investigative techniques. Although some new technologies, like thermal imaging devices, give police investigative powers they've not previously had, algorithmic investigative techniques typically mimic work that has historically been done by human officers. (7) Indeed, the very purpose of the technologies is to replace, and improve upon, human police work. But the Fourth Amendment should not be read to permit police to use computer programs to conduct investigations that would, if police conducted them manually, be illegal searches. Such a reading would allow law enforcement to shift its investigatory work onto algorithms and away from the Fourth Amendment.

This Note proceeds in five Parts. Part I explains the technology behind hash searches. Prominent commentators have described hashing algorithms as "complex" (8) and "complicated," (9) and some courts have misunderstood how they function. Part I uses some simple examples to show that hash searches are not so arcane. Part II catalogs the various contexts in which courts have addressed hash searches, identifying points upon which courts agree and questions that remain open. Part III considers the argument that hash searches should be analyzed as digital dog sniffs. It argues that even if we accept this analogy, hash searches outside the scope of a warrant are nevertheless illegal searches under Florida v. Jardines. (10) Part IV argues, alternatively, that a reading of the Fourth Amendment permitting hash searches would also permit suspicionless algorithmic searches for ordinary evidence of criminal wrongdoing--twenty-first century general warrants. Finally, Part V argues that in light of these concerns, courts ought to adopt an affirmative framework, rather than a reactive one, for assessing the legality of algorithmic investigative techniques. This Note proposes such a framework: Police conduct a search when they use an algorithm to perform some task that would be a search if conducted by a human investigator.

  I. Demystifying Hash Searches

    Hash searches, like many concepts in computer science, can seem esoteric. Legal commentators have not helped: They've described hash searches as employing "complex mathematical algorithm[s]" (11) or "complicated mathematical operation[s]." (12) Some have suggested that judges are ill equipped to assess the legality of hash searches and other digital forensic techniques, given those techniques' technical complexity. (13) And it is true that some courts have seemed to grasp only imprecisely how hash searches operate. (14) As this Part will show, however, hash searches are conceptually quite simple.

    A. A Simple Hash Search

      Before diving in, we need to distinguish between three different concepts, all of which relate to hashing: (1) a hash function, (2) a hash value, and (3) a hash set. A hash function is a mathematical process that takes some input, like a text file or an image, and outputs a hash value. (15) A hash value is a series of letters and numbers (what some courts have called a "digital fingerprint" (16)) assigned to a particular input. (17) And a hash set is a collection of inputs that are stored according to their hash values. (18) Examples will make these concepts clearer.

      Suppose I write some simple hash function. It takes a string of text as an input and outputs the sum of the ordinal values of the text's constituent letters (a = 1, b = 2, and so on through z = 26). If you feed my hash function the input "Ignatius," it outputs 100, which is thus the hash value for "Ignatius."
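The letter-sum function just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name `letter_sum_hash` is hypothetical, and the choices to ignore capitalization and to skip non-letter characters are assumptions that the text does not address.

```python
def letter_sum_hash(text: str) -> int:
    """Toy hash: sum the alphabet positions (a=1 ... z=26) of the letters in text.

    Assumptions for illustration only: case is ignored and any
    character that is not a letter a-z is skipped.
    """
    total = 0
    for ch in text.lower():
        if "a" <= ch <= "z":
            total += ord(ch) - ord("a") + 1
    return total


# I(9) + g(7) + n(14) + a(1) + t(20) + i(9) + u(21) + s(19) = 100
print(letter_sum_hash("Ignatius"))  # 100
```

Feeding the function the same input again returns the same value, which is the consistency property discussed later in this Part.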

      I could write a similarly simple hash function for images. All digital images are made up of pixels, which are just tiny points of color situated in a two-dimensional array. (19) Each pixel is a composite of three component colors, red, green, and blue, each of which is assigned a value from 0 to 255. (20) One pixel, then, might be coded as (240, 0, 120); that is, it takes a red value of 240, a green value of 0, and a blue value of 120. And an image on your computer is just an array of these pixels. For example, an image with a resolution of 1280 x 1024 contains 1,310,720 pixels. So we could write a hash function that cycles through an image, pixel-by-pixel, and adds the red, green, and blue values of each pixel to some sum. Once the function reaches the last pixel in the image, it returns that sum, which is now the image's hash value, just like 100 is the hash value for "Ignatius" in Figure 1 above.
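The pixel-by-pixel image function can be sketched the same way. The sketch below assumes, purely for illustration, that an image is represented as a list of rows, each row being a list of (red, green, blue) tuples with values from 0 to 255; real image files would first have to be decoded into such a form. The name `pixel_sum_hash` and the sample image are hypothetical.

```python
def pixel_sum_hash(image):
    """Toy image hash: add up the red, green, and blue values of every pixel.

    `image` is assumed to be a list of rows, where each row is a list
    of (red, green, blue) tuples with values from 0 to 255.
    """
    total = 0
    for row in image:
        for red, green, blue in row:
            total += red + green + blue
    return total


# A made-up 2 x 2 image, including the (240, 0, 120) pixel from the text.
tiny_image = [
    [(240, 0, 120), (0, 0, 0)],
    [(255, 255, 255), (10, 20, 30)],
]

# 360 + 0 + 765 + 60 = 1185
print(pixel_sum_hash(tiny_image))  # 1185
```

Because the function only adds values together, rearranging the same pixels into a different picture yields the same sum, so two visibly different images can share a hash value.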

      In any hash function, we look for two properties. First, a hash function must be consistent: Whenever we pass in a certain input, the function must always return the same output. (21) That is, "Ignatius" must return 100 every time it is inputted into the hash function described in Figure 1 above. This property is necessary for a hash function. (22) Second, we would like for a hash function to be well distributed. (23) That is, we'd like for it to return different outputs for different inputs as often as possible. When two inputs produce the same output, the hash function has generated a "collision." (24) Producing few collisions is not a necessary property of a hash function, but it helps distinguish good hash functions from bad ones.

      We see, then, that our simple function is a valid hash function, just not a very good one: It behaves with perfect consistency but produces many collisions. Indeed, any other word whose letters sum to 100 will generate the same hash value, including "gauntlet" and "perturb" and many more besides.
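The collisions named above can be checked directly. The following sketch restates the letter-sum function so that it runs on its own, then groups a short word list by hash value and reports any value shared by more than one word. The function name, the helper `find_collisions`, and the word list (including the extra word "hash") are hypothetical choices made for illustration.

```python
from collections import defaultdict


def letter_sum_hash(text: str) -> int:
    """Toy hash: sum the alphabet positions (a=1 ... z=26), ignoring case."""
    return sum(ord(ch) - ord("a") + 1 for ch in text.lower() if "a" <= ch <= "z")


def find_collisions(words):
    """Group words by hash value and keep only the values shared by 2+ words."""
    buckets = defaultdict(list)
    for word in words:
        buckets[letter_sum_hash(word)].append(word)
    return {value: group for value, group in buckets.items() if len(group) > 1}


words = ["Ignatius", "gauntlet", "perturb", "hash"]
print(find_collisions(words))
# {100: ['Ignatius', 'gauntlet', 'perturb']}
```

Three of the four words land on the value 100, while "hash" sums to 36 and stands alone. A well-distributed hash function would rarely assign the same value to unrelated inputs.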

      Why do we care how often our hash function produces collisions? Because it affects the performance of our third concept: the hash set. Recall that a hash set is a collection of inputs stored according to their hash values. Another example will help clarify.

      Suppose I write some computer program that...
