Author: Parks, Andrew M.

Rising enthusiasm for consumer data protection in the United States has led several states to advance legislation protecting the privacy of their residents' personal information. But even the newly enacted California Privacy Rights Act (CPRA), the most comprehensive data privacy law in the country, leaves a wide-open gap for internet data scrapers to extract, share, and monetize consumers' personal information while circumventing regulation. Allowing scrapers to evade privacy regulations carries potentially disastrous consequences for individuals and society at large.

This Note argues that even publicly available personal information should be protected from bulk collection and misappropriation by data scrapers. California should reform its privacy legislation to align with the European Union's General Data Protection Regulation (GDPR), which requires data scrapers to notify data subjects when they collect their personal information, regardless of whether that information is publicly available. This reform could lay the groundwork for future legislation at the federal level.

TABLE OF CONTENTS

INTRODUCTION
I. DATA SCRAPING AND ITS CURRENT LEGALITY
   A. Scraping: Definition, Usage, and Purposes
   B. The Current Legal Landscape of Data Scraping
      1. Claims Under the Computer Fraud and Abuse Act
      2. Copyright Infringement, Trespass to Chattels, and Breach of Contract Claims
II. THE DATA SCRAPING LOOPHOLE
   A. Publicly Available Personal Information Should Be Protected
      1. Information Made Public Without the Subject's Knowledge or Consent
      2. Information Made Public Voluntarily Should Still Be Protected
      3. The Dangers of Allowing the Scraping of Personal Information in Bulk
   B. Scraping Personal Information Circumvents Current and Proposed Privacy Laws
   C. An Alternative Framework: The European Union's General Data Protection Regulation
III. A PROPOSAL FOR CALIFORNIA: "FAIR COLLECTION"
   A. California Should Adopt GDPR-Style Regulations to Shield Publicly Available Personal Information from Data Scrapers
   B. Addressing First Amendment Concerns
CONCLUSION

INTRODUCTION

In January 2021, a software engineer in New York City scoured dozens of city and state websites attempting to schedule a COVID-19 vaccination for his mother. (1) At that time, there was no uniform system for scheduling vaccination appointments. The city and state appointment systems were completely different, each with its own sign-up protocol. (2) Frustrated with this convoluted system, the engineer decided to develop a solution. In less than two weeks, he launched TurboVax, "a free website that compiles availability from the three main city and state New York vaccine systems and sends the information in real time to Twitter." (3) Because vaccine appointment information was publicly available on the internet, TurboVax could access it using a computer program called a "bot." This bot automatically checked, copied, and republished appointment data in bulk, eliminating the need to check government websites manually for available slots. (4) The process TurboVax used to extract vast amounts of data from the internet is called "scraping." (5)
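The check, copy, and republish cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TurboVax's actual code: the `Slot` fields, the source names, and the `poll_once` function are hypothetical, and each source is a stand-in function rather than a real download from a government website.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Slot:
    """One open appointment (hypothetical fields)."""
    site: str  # vaccination site name as listed by the source
    time: str  # appointment time as displayed by the source


# Hypothetical: a source is any function returning the slots it currently lists.
# A real bot would download and parse a public web page at this point.
Source = Callable[[], Iterable[Slot]]


def poll_once(sources: Dict[str, Source], seen: Set[Tuple[str, Slot]]) -> List[str]:
    """Check every source once, copy its slots, and return alerts for new ones."""
    alerts: List[str] = []
    for name, fetch in sources.items():
        for slot in fetch():
            key = (name, slot)
            if key not in seen:  # republish only slots not announced before
                seen.add(key)
                alerts.append(f"[{name}] {slot.site} at {slot.time}")
    return alerts
```

Run on a timer with the same `seen` set, `poll_once` returns alerts only for newly listed slots, which a bot could then post to a public feed. No person has to revisit the underlying websites by hand, which is what lets a single program monitor several systems continuously.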

It is one thing to scrape the internet for publicly available information that is not linked to any individual, but quite another to scrape personal information. When a LinkedIn user creates a public profile to search for employment, she may well include her phone number, email address, and a photo of her face. Although this information is technically "public," she might reasonably expect it to remain personal to her and within her control. She may, for instance, list her LinkedIn profile publicly while searching for a job but later set it to "private" after securing employment. Yet all her personal data (her name, phone number, email address, and photo) were, at least for some time, made public and therefore susceptible to extraction and reappropriation by scrapers. (6) And this bell cannot be unrung. (7)

Over the past few years, consumer data privacy legislation has surfaced across the United States. The California Consumer Privacy Act (CCPA) (8) and the California Privacy Rights Act (CPRA), (9) for instance, now regulate the collection of consumer personal data and the sharing of such data with third parties. But no currently proposed or enacted privacy statute adequately protects publicly available personal information. (10) Such information is exempted, making it fair game to be scraped, used, shared, or sold.

Many scholars have written about data scraping and its legality under the Computer Fraud and Abuse Act. (11) Others have discussed various consumer data privacy statutes and proposals across the United States and Europe. (12) But few have addressed the privacy implications of scraping publicly available personal information, (13) and no one has proposed a reform to regulate such activity in the United States. This Note does just that.

Part I of this Note defines data scraping, explains its purposes, and summarizes its current legality. Part II argues that publicly available personal information should be protected from data scrapers, analyzes the current landscape of state and federal consumer data privacy legislation, and explains why existing and proposed solutions are inadequate to address this issue. It also describes how publicly available personal information is handled by the European Union's General Data Protection Regulation (GDPR). Part III argues that while passing legislation at the federal level could be desirable, California ought to amend its privacy laws to incorporate GDPR-style protections for publicly available personal information. Specifically, California should regulate the collection of publicly available personal information based on whether the information collected can be anonymized, whether the information is collected in bulk, and whether the information is collected for commercial purposes.


I. DATA SCRAPING AND ITS CURRENT LEGALITY

To understand the privacy implications of data scraping, it is necessary to explain its function and legality. Scraping has many useful applications, and it is often employed by individuals serving the public interest. Unfortunately, scraping can also be used for malicious purposes, and businesses frequently attempt to block or deter parties from scraping their websites. Accordingly, Part I concludes by examining the most common legal claims available to address scraping.

A. Scraping: Definition, Usage, and Purposes

Data scraping is the process of scanning and extracting large amounts of data from one or more websites using a software program often referred to as a "bot," "robot," or "scraper." (14) Scraping is different from "hacking," which involves breaking into another person's "computer, network, servers, or database," (15) typically by cracking a password or exploiting a vulnerability in the website's code. (16) Scrapers, by contrast, extract publicly available data (17) and thus have no need to break into private servers.
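The extraction step at the heart of this definition can be illustrated with a short Python sketch that uses only the standard library's `html.parser` module. It is a simplified illustration, not how any particular scraper works: the `TableScraper` class, the `scrape_table` function, and the sample page are hypothetical, and the page is passed in as a string rather than downloaded. A real bot would first fetch each public page (for example, with `urllib.request.urlopen`) and repeat this step across many pages, which is what makes bulk extraction possible.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableScraper(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None  # row currently being read, if any
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they collect no cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row[-1] += data.strip()


def scrape_table(html: str) -> List[List[str]]:
    """Return the data cells of every table row found in an HTML page."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Given a page that lists names and cities in a table, `scrape_table` returns one list of cell values per data row. Nothing here requires a password or an exploit; the program simply reads what any visitor's browser would receive, which is the distinction from hacking drawn above.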

Scraping has many beneficial purposes. It can be used to preserve websites, conduct research, compare product and price information from various sources, gather contact and social media data for outreach campaigns, track company reputation, and aggregate news and other content on curated websites. (18) Journalists use scraping technology to gather and analyze large volumes of statistical data. (19) Scholars employ it to aid their academic research. (20) Advertisers use it to collect contact details and public posts on social media websites to better market their products to consumers. (21)

Although scraping has beneficial applications, it can also be used for malicious purposes, such as spamming email accounts, causing website crashes, (22) or conducting scams. (23) The company Clearview AI exemplifies a morally questionable use of data scraping technology. (24) Clearview scrapes billions of personal images posted on Facebook and other websites for use in its facial recognition software. (25) It then sells its software to law enforcement agencies, allowing police departments to "compare a face captured on a security camera against [Clearview's] database to reveal possible matches." (26) No user consents to Clearview's collection, and even if an image is later removed from the public site, Clearview keeps a copy. (27) Significantly, cease-and-desist letters from Google, YouTube, Venmo, and LinkedIn have failed to stop Clearview from scraping. (28) Clearview has ignored the letters and maintains that it has a First Amendment right to access publicly available information. (29) Clearview's facial recognition software has been used by thousands of law enforcement agencies, companies, and individuals around the world. (30)

Scraping technology is also deployed problematically in the "mugshot industry." (31) In this industry, private companies use bots to scrape booking photos of arrested persons from publicly accessible law enforcement websites. The companies then display the photos in "mugshot galleries" on their websites. (32) Scraping enables the companies to monetize the mugshots in various ways, such as hosting advertisements on their websites, charging visitors a fee to search their mugshot databases, and, most controversially, charging subjects large fees to have their mugshots removed. (33) Even if an arrested person's criminal record is expunged, their scraped mugshot can still appear in Google search results and be dispersed across dozens of websites. (34)

To prevent scraping, website owners often prohibit the practice in their website's terms of service (35) or implement technological barriers. One such barrier is a "robots.txt" file placed in the website's root directory; this widely used protocol instructs specified bots to ignore certain files when crawling or scraping the website. (36) However, these...
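How a robots.txt file is read can be illustrated with Python's standard `urllib.robotparser` module, which parses the file and answers whether a given bot may fetch a given URL. This is a sketch for illustration only: the file contents, the bot names, the `allowed` helper, and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served from https://example.com/robots.txt
ROBOTS_TXT = """\
User-agent: ExampleScraperBot
Disallow: /profiles/

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str) -> bool:
    """Ask whether the robots.txt rules above permit this bot to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse lines already in memory
    return parser.can_fetch(user_agent, url)


# The named bot is barred from profile pages; other bots fall under "*".
print(allowed("ExampleScraperBot", "https://example.com/profiles/jane"))  # False
print(allowed("OtherBot", "https://example.com/private/data"))           # False
print(allowed("OtherBot", "https://example.com/profiles/jane"))          # True
```

The check runs inside the scraper's own code: the file states the site owner's instructions, but nothing in it stops a bot that never calls `can_fetch` from downloading the pages anyway.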
