The Legality of Web Scraping: A Proposal.

Author: Macapinlac, Tess

TABLE OF CONTENTS

I. INTRODUCTION
II. A BASIC UNDERSTANDING OF WEB SCRAPING AND THE CFAA
    A. Web Scraping
    B. The Computer Fraud and Abuse Act
III. WEB SCRAPING CASES
IV. WEB SCRAPING SHOULD NOT BE UNDER CFAA JURISDICTION
    A. Public Information Is Not Subject to Hacking
    B. Web Scraping Allows Competition in the Marketplace
    C. Proportionality, Vagueness, and Alternative Legal Avenues
V. PROPOSAL: AMENDING THE COMPUTER FRAUD AND ABUSE ACT
    A. Supporters
        1. Scraping Businesses and Academics
        2. Legislators and Policy Groups
        3. Online and Technology Communities
    B. Opponents
        1. Scraped Businesses
        2. Consumer Privacy Advocates
VI. CONCLUSION

I. INTRODUCTION

When people think of hacking, many picture someone using a computer to break into government databases or city records, as in a scene from a television show such as Arrow. (1) The scene often involves hurried typing, furrowed brows, instant results, and, frequently, very little punishment for the hacking. Hacking, which may feel like a modern innovation because of continual improvements in technology, has been part of pop culture for years. Newer television shows like Mr. Robot, Scorpion, and The Blacklist show hacking in a variety of lights and circumstances. (2) Even in the 1996 movie Independence Day, a satellite technician saves the world by hacking into an alien mothership. (3) Before that, Jurassic Park showed a juvenile hacker taking on a UNIX system to reactivate security measures in a dinosaur park gone berserk. (4)

Others may think of hacking as they see it in the news. For instance, the infamous Ashley Madison hack revealed the identities and contact information of the site's users, who frequented the website with the intention of having discreet extramarital affairs. (5) The Home Depot hack is another infamous incident, in which the credit card numbers of almost fifty million customers were revealed. (6) Reports show that over five thousand breaches occurred in 2017, compromising almost eight billion records. (7)

Hackers sitting in a dark room, bent over computers, furiously typing complicated computer code: this is the image that many people associate with the term hacking. (8) Revealed personal information, unleashed secrets, and access to information a person was never supposed to have are all ideas equally associated with hacking. (9) With so much personal data given over to companies and held in electronic formats, (10) people are right to be concerned about hackers and the damage they can do.

However, a lot of this hacking rides on the idea of secrecy. Whether it is information that is given to a company with the condition of confidentiality or unknown information relating to the computer system on an alien spaceship, hacking relies on the idea that the hacker isn't supposed to know or be able to get the information that they are taking. (11) Thus, the idea of hacking publicly available information does not fit into either of these categories. Companies promise to do their best to keep consumer information safe and private. (12) However, if certain information is public, then by definition, all people should have access, and none of it should be a secret. Regardless, a practice known as web scraping is considered hacking under the Computer Fraud and Abuse Act ("CFAA"), codified at 18 U.S.C. § 1030 (2012). (13) Web scraping is the act of pulling data from a website's output and saving it to a file or database. (14)

This Note focuses on the scraping of publicly available information and how this particular act should not be considered illegal under the CFAA. First, this Note will more thoroughly explore the technicalities and benefits of web scraping, as well as the relevant sections of the CFAA. Next, it will examine some of the prominent cases that have used the CFAA to prosecute web scraping. It will then examine why web scraping should not be punishable under the CFAA. It will go on to present a proposed amendment and the thought process that went into its language. Finally, it will consider the potential supporters and opponents of the proposed amendment. At its core, this Note is a proposal to add an amendment to the CFAA that would legalize the web scraping of publicly available websites.


II. A BASIC UNDERSTANDING OF WEB SCRAPING AND THE CFAA

    1. Web Scraping

      The method in question is known as web scraping. A web scraper is a piece of computer code that runs as an automated bot. (15) This bot accesses web pages, finds specific data, extracts it from the web page, and saves it on a computer or similar device. (16) A person can then access the data and use it for a variety of purposes, such as research or business. (17) Web scraping is useful for anyone who needs a large amount of information from a large number of websites; everything this kind of bot does can be done manually, but a scraper does the work faster and more efficiently. (18) Web scraping is not an uncommon practice: bots account for nearly a quarter of all Internet traffic, as businesses, researchers, and others use web scraping for different reasons. (19)
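The steps just described (access a page, find specific data, extract it, and save it) can be illustrated with a minimal sketch in Python. Everything specific in it is an assumption made for illustration: the sample page, the `city` and `rent` fields, and the names `ListingParser`, `scrape`, and `save_rows` are hypothetical and do not come from any scraper discussed in this Note. To keep the sketch self-contained, an inline HTML string stands in for a page that a real bot would download from a public website.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content standing in for a downloaded public web page.
SAMPLE_PAGE = """
<html><body>
  <ul>
    <li class="listing"><span class="city">Boston</span><span class="rent">2400</span></li>
    <li class="listing"><span class="city">Denver</span><span class="rent">1800</span></li>
  </ul>
</body></html>
"""


class ListingParser(HTMLParser):
    """Finds <span class="city"> and <span class="rent"> elements and collects their text."""

    def __init__(self):
        super().__init__()
        self.rows = []        # completed (city, rent) records
        self._field = None    # field currently being read, if any
        self._current = {}    # record being assembled for the current <li>

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("city", "rent"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that appears inside one of the targeted spans.
        if self._field is not None:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li":
            # Store the record only if both fields were found.
            if "city" in self._current and "rent" in self._current:
                self.rows.append((self._current["city"], int(self._current["rent"])))
            self._current = {}


def scrape(page_html):
    """Extract (city, rent) pairs from the HTML text of one page."""
    parser = ListingParser()
    parser.feed(page_html)
    parser.close()
    return parser.rows


def save_rows(rows, out_file):
    """Save the extracted records as CSV to any writable text file object."""
    writer = csv.writer(out_file)
    writer.writerow(["city", "rent"])
    writer.writerows(rows)


rows = scrape(SAMPLE_PAGE)
buffer = io.StringIO()  # in-memory stand-in for a file on disk
save_rows(rows, buffer)
print(buffer.getvalue())
```

A real scraper would repeat the same steps across many pages, which is where the speed advantage over manual copying comes from. The sketch only reads markup that the page already presents to every visitor.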

      Web scraping can be used for a variety of purposes. One of the most common examples is search engines, which use scraping to link users to pertinent webpages. (20) Since search engines play an important role in the online ecosystems for both users and companies alike, the stigma associated with search engine web scraping activities is situational and limited. (21) Academia is another field that may utilize web scraping. For instance, Geoff Boeing and Paul Waddell built their own web scraper to scrape data for their paper concerning rental housing markets. (22)

      Another common form of web scraping takes place in budgeting apps, like Mint. (23) In order to use Mint, a user authorizes the app to access their various bank accounts. (24) The app then scrapes the account information so that users can track their budgeting and spending habits from a single app. (25)

      Even journalists utilize web scraping for a variety of online investigations, to the point where both lawyers and journalists make suggestions on how to do so in an ethical and legal fashion. (26) Journalist Nael Shiab points to his own career as an example of web scraping for journalistic purposes. (27) However, many cases involving web scraping and the CFAA focus on businesses that use web scraping as a part of their business models, rather than search engines or academia. (28)

    2. The Computer Fraud and Abuse Act

      In 1984, Congress passed the Comprehensive Crime Control Act, codified in 18 U.S.C. § 1030, in order to combat the growing threat of computer crime and hacking. (29) Over the next two years, Congress continued to investigate issues presented by computer crimes and how federal statutes could tackle such crimes. (30) To address these issues, Congress held hearings on potential bills focused on computer crimes. In 1986, those hearings culminated in the passage of the Computer Fraud and Abuse Act, which amended various parts of 18 U.S.C. § 1030. (31)

      At the time the CFAA was brought into effect, the government and various financial institutions were the primary entities that used computers and thus were most vulnerable to hackers. (32) As such, the CFAA was designed with classified information and credit or financial information in mind. (33) Eventually, as computers and the Internet became widely used by civilians, definitions in the CFAA were expanded to cover computers that could be involved in interstate commerce, which implicated any computer connected to the Internet. (34)

      In the context of the CFAA, hacking occurs when a person "intentionally accesses a computer without authorization or exceeds authorized access." (35) In this case, the relevant hacking occurs when a person accesses a "protected computer." (36) For these purposes, a "protected computer" is defined to include a computer "which is used in or affecting interstate or foreign commerce or communication." (37) Notably, the user does not have to use the computer for interstate or foreign commerce or communication; rather, the computer simply has to be capable of doing so. (38) Thus, any computer connected to the Internet could be considered a protected computer under the CFAA. (39)

      The CFAA defines "exceeds authorized access" to mean "to access a computer with authorization and to use such access to obtain or alter information in the computer that the accesser is not entitled so to obtain or alter." (40) The CFAA does not define the term "without authorization," but experts interpret the term as referring to a person who is an outsider of the institution, like a hacker, as opposed to an insider, who would have access in the first place. (41)

      The CFAA has long been criticized for its hefty punishments and vague definitions. (42) Those definitions can make a simple act like lying about your age online fall under the definition of hacking. (43) Additionally, a single act can violate different parts of the CFAA, resulting in a compounded sentence. (44) In fact, some CFAA violations carry more severe punishments than an aggravated assault charge. (45)

      The criticism surrounding the CFAA reached a high point during the criminal case against Aaron Swartz. Swartz, known for helping to launch Reddit, broke into an electrical closet at the Massachusetts Institute of Technology, wired his laptop into MIT's system, and proceeded to download academic articles from the online database JSTOR. (46) In the District Court for the District of Massachusetts, Swartz was charged with eleven violations of the CFAA, (47) as well as wire fraud, which could have led to thirty-five years in prison and a million-dollar fine. (48) However, the charges were never resolved as Swartz took his life...
