BAD BOTS: REGULATING THE SCRAPING OF PUBLIC PERSONAL INFORMATION.

Author: Xiao, Geoffrey

TABLE OF CONTENTS

I. INTRODUCTION
II. THE PROBLEM OF PRIVACY IN PUBLIC
  A. U.S. Privacy Law and the Rule of "No Privacy in Public"
  B. There are Privacy Interests in Public Personal Information
    1. The Privacy Harms of Public Personal Information
    2. Privacy Because of Obscurity
    3. Privacy Because of Trust in Websites and Other Users
    4. Posting Publicly is Not Implied Consent
III. THE EXISTING REGULATORY LANDSCAPE
  A. Website Enforcement of Data Scraping Causes of Action
  B. Agency Enforcement of Data Broker Laws
  C. User and Agency Enforcement of Data Privacy Laws
    1. The Right to Receive Notice Prior to Data Collection: BIPA and CCPA
    2. The Right to Receive Notice Prior to Data Collection: GDPR Article 14
IV. SHORTCOMINGS OF THE EXISTING REGULATORY LANDSCAPE
  A. Pre-Collection Notice is Necessary to Protect Privacy from Scrapers
  B. The Role of Consent: Opt-In or Opt-Out?
  C. A More Nuanced Role for Websites
V. THE FIRST AMENDMENT OBJECTION TO REGULATING PUBLIC PERSONAL INFORMATION
  A. Is Scraping First Amendment Protected "Speech"?
  B. Does Pre-Collection Notice Unduly Limit Speech (Scraping)?
  C. Does Opt-In Consent Unduly Burden Speech (Scraping)?
VI. CONCLUSION

I. INTRODUCTION

In early 2020, the New York Times broke the story of Clearview AI, which had created a facial recognition program by surreptitiously scraping more than three billion images from publicly available websites such as Facebook and Twitter. (1) The New York Times story provoked outcry from privacy advocates, the websites that were scraped, and the public. (2) However, Clearview's refrain has been that the images it scraped were made publicly available by users and that these users have no privacy interests in otherwise public information. (3)

This Note analyzes how U.S. law addresses the privacy implications of data scraping. This analysis looks at public personal information, which is information openly accessible on the web that can identify the poster. The prototypical example is the one raised by Clearview's scraping: users publicly post personal information (e.g., photos) on social media websites like LinkedIn and Facebook, and a third party (e.g., Clearview) scrapes this public personal information.

The principal problem raised by data scraping is whether users have privacy interests in personal information they have posted publicly. While U.S. law has generally been reluctant to find such privacy interests, this Note argues that there are strong privacy interests because harms arise from the unauthorized use of public personal information and because users post information in an environment of obscurity and trust. (4) Next, this Note surveys how existing legal regimes protect public personal information. This Note argues that current regulations fall short in several significant ways. First, some data privacy laws--notably, California's comprehensive data privacy statute, the California Consumer Privacy Act ("CCPA")--exempt scrapers from providing notice to users whose data have been scraped. These laws are based on the presumption that the indirect scraper/user relationship makes providing notice difficult. Second, the legal regime needs to require opt-in consent instead of opt-out consent because opt-out consent fails to adequately protect privacy. Third, regulations need to provide an active role for websites in protecting their users' privacy, but regulations also need to be wary of granting websites monopolistic control over scraping.

Lastly, this Note addresses the First Amendment defense that scrapers like Clearview have made. It is arguable whether regulations that limit the scraping of public information are a constraint on speech (and subject to strict scrutiny) or merely a restriction on expressive conduct (and subject to intermediate scrutiny). In any event, a notice-and-consent requirement withstands First Amendment challenges because it does not unduly limit scraping.

  II. THE PROBLEM OF PRIVACY IN PUBLIC

    The central problem raised by scraping is whether users have a legitimate privacy interest in information they have made public. Clearview, for instance, has argued "that individuals have no right to privacy in materials they post [publicly] on the Internet." (5) While U.S. law generally follows the rule of "no privacy in public," there are actually very strong privacy interests in public personal information.

    A. U.S. Privacy Law and the Rule of "No Privacy in Public"

      U.S. privacy law is described as "sectoral," meaning privacy regulation is field-specific as opposed to "omnibus." (6) For example, privacy torts allow individuals to vindicate invasions of their privacy, and the Fourth Amendment protects individuals from government intrusion. (7) In these different sectors, U.S. law has generally adopted the "no privacy in public" principle. (8)

      In the public disclosure of private fact tort, "there is no liability when the defendant merely gives further publicity to information about the plaintiff which is already public or when the further publicity relates to matters which the plaintiff leaves open to the public eye." (9) For example, in Daly v. Viacom, Inc., the defendant photographed the plaintiff kissing a man in a bathroom stall. (10) Even though the defendant took this photograph in a private bathroom stall, the court rejected the plaintiff's claim for public disclosure of private fact. (11) According to the court, simply because the plaintiff had kissed the man in public before, her kiss had been publicly disclosed, so the disclosure was not actionable under tort law. (12)

      Similarly, under the intrusion upon seclusion tort, claims resting on "'public places' or things that are in 'plain view'" are not actionable. (13) In one case, a news crew filming a car accident was not liable under the intrusion tort because the accident occurred on a public highway. (14) The court distinguished between filming on a public highway (not actionable) and filming inside a medivac helicopter (actionable because "plaintiffs had an objectively reasonable expectation of privacy in the interior of the rescue helicopter, which served as an ambulance"). (15)

      While not directly applicable to Clearview, the Fourth Amendment's treatment of the privacy in public problem provides a helpful analogue. (16) Just like tort law, the Fourth Amendment gives minimal protection for publicly available information. Under what Professor Monu Bedi calls the "public disclosure doctrine," courts have found that individuals lack reasonable expectations of privacy in publicly available information. (17) The seminal Katz v. United States case announced that "[w]hat a person knowingly exposes to the public, even in his own home or office, is not a subject of Fourth Amendment protection." (18) In California v. Ciraolo, the Court found the Fourth Amendment inapplicable when the government performed aerial surveillance on a backyard because "[a]ny member of the public flying in this airspace who glanced down could have seen everything that these officers observed." (19) The Court also refused to extend Fourth Amendment protections to a police search of sidewalk garbage because "[i]t is common knowledge that plastic garbage bags left on or at the side of a public street are readily accessible to animals, children, scavengers, snoops, and other members of the public." (20) In rejecting a Fourth Amendment argument against a subpoena seeking public (but since deleted) tweets, one court aptly summarized: "[i]f you post a tweet, just like if you scream it out the window, there is no reasonable expectation of privacy." (21)

      Still, there has been some pushback against the rule of no privacy in public. In Commonwealth v. McCarthy, the Massachusetts Supreme Judicial Court acknowledged that license plates are "knowingly exposed" to the public and that police are free to examine the license plates of cars driving on the street. (22) However, the court also said that a pervasive (in terms of location and time) automatic license plate reader ("ALPR") system would raise Fourth Amendment issues "because the whole of one's movements, even if they are all individually public, are not knowingly exposed in the aggregate." (23) Nevertheless, McCarthy found an ALPR system of "four cameras at fixed locations on the ends of two bridges" insufficiently pervasive to violate the Fourth Amendment. (24)

      Like the Fourth Amendment, the First Amendment has given short shrift to privacy interests in public information. In Cox Broadcasting Corp. v. Cohn, a rape victim sued a television station for broadcasting her name. (25) The Court found that the First Amendment defeated the plaintiff's claim because the station had obtained the plaintiff's name from public court records. The Court explained that "the interests in privacy fade when the information involved already appears on the public record." (26) Florida Star v. B.J.F. reached a similar conclusion when a newspaper published a rape victim's name after obtaining it from a publicly available police report. (27)

      Recently, the Ninth Circuit applied the "no privacy in public" rule to data scraping. In hiQ Labs, Inc. v. LinkedIn Corp., LinkedIn attempted to use the Computer Fraud and Abuse Act ("CFAA") to protect its users' privacy interests from a scraper. (28) hiQ scraped public profiles from LinkedIn's website to produce a data analytic called "Keeper," which "identif[ies] employees at the greatest risk of being recruited away." (29) LinkedIn argued that hiQ's data scraping and "Keeper" analytic endangered LinkedIn users' privacy. Even though hiQ only scraped public profiles, LinkedIn argued that "many members--including members who choose to share their information publicly--do not want their employers to know they may be searching for a new job." (30) Ultimately, the Ninth Circuit found that LinkedIn users had minimal privacy interests in public...
