Google and ye shall be found: privacy, search queries, and the recognition of a qualified privilege.

Author: Werner, Matthew
  1. INTRODUCTION

    Modern technology allows us to record and maintain massive amounts of personal information, increasing the amount of data flowing from the private sector to the Government. (1) The popularity and success of the Internet are due in large part to search engines, which facilitate navigation through vast amounts of information. (2) Search engine use is becoming one of the primary activities on the Internet, a close second to email. (3) Most users, however, do not realize that search engines accumulate massive amounts of data, including personal information such as names, addresses, and social security numbers. (4) Privacy advocates and search industry watchdogs have warned about these caches of information and the threat they pose, expressly considering the vulnerability to "thieves, rogue employees, mishaps or government subpoenas." (5)

    Google's stated mission is to "organize the world's information and make it universally accessible and useful." (6) The lofty goals of this Internet search engine giant and its rivals have recently put them at the forefront of the war over a federal law designed to shield children from online pornography: the Child Online Protection Act (COPA). (7) The Department of Justice has served subpoenas on Yahoo, MSN, America Online, and Google, hoping to show that online filters meant to shield children from harmful content are inadequate. (8) This prompted the submission of millions of users' search records by these services, except Google. (9) Google's resistance to the government's subpoena attracted the attention of privacy advocates. (10)

    This Note highlights the battle over online privacy and advocates for a specific form of protection: the recognition of a federal common law qualified discovery privilege for search queries. It will begin by discussing the rise of the Internet to a virtual library of all human achievement, the role of search engines in discovery, and the invasion of privacy that occurs when these queries are revealed. Next, it will discuss Google and its fight against the Department of Justice, including the privacy implications, followed by an analysis of the Fourth Amendment and its inability to adapt to growing technologies. Finally, it will discuss the creation of the proposed privilege and possible arguments by proponents and opponents alike. This Note will conclude with a discussion of the scope and architecture of the privilege, examining the way it promotes both the goals of discovery and concerns over privacy.

    1. The Problem of Modern Informational Glut

      The benefits of living in the Information Age do not come without a price. Participation requires that we form relationships with a vast number of entities including Internet Service Providers (ISPs), cellular phone companies, credit card companies, and banks with online services. Furthermore, the future of the Internet as a virtual Royal Library of Alexandria requires effective filtering tools to unearth relevant information. (11)

      One fear is that technology lowers the cost of thinking, while increasing the quantity of thinking, which ultimately can drown us in data. For example, in 1850 only four percent of American workers handled information on the job. (12) Now, the phenomenal growth of information available online exceeds our ability to process it and frustrates the Internet's role as a virtual library. (13) Fortunately, along with the advent of the Internet, many were eager to develop new ways to navigate this new information repository. Like the printing press, search engines opened access to information once only available to the elite.

    2. The Detective, the Librarian, and the Attendant

      The development of search engines was necessary to avoid information overload. The first search engine, Archie, was developed in 1990 at McGill University in Montreal. (14) At the time, files were scattered on public anonymous File Transfer Protocol (FTP) servers, and the location of the files would remain secret unless disclosed. (15) Archie provided a searchable database of filenames by downloading the directory listings of all the files located on FTP sites. (16) As access to the Internet left the domain of academia and research organizations, (17) it was necessary to develop a search interface that would empower novices and experts alike. Between 1995 and 2000, several search engines, using different tools and varying degrees of innovation, emerged and were either acquired, integrated, or rendered obsolete. (18)

      If the Internet can be analogized to an information superhighway, then a search engine might be compared to an aging detective constantly following leads and updating his investigation, an aloof librarian meticulously following an electronic Dewey Decimal System and endlessly shelving tomes, or a friendly gas station attendant providing detailed, albeit perhaps faulty, directions. These metaphors may seem rudimentary, but these are the exact functions a search engine performs when it crawls the web (the detective), indexes (the librarian), and searches (the gas station attendant).

      Web crawling is achieved by a spider, crawler, robot, or bot. (19) In general, a spider starts with "seeds," a list of Uniform Resource Locators (URLs) to visit. (20) A spider visits URLs and identifies every link that it sees, adding each to the crawl frontier. It will repeat this process every month or two to ensure accuracy and timeliness. For example, the Google spider looks at words within a webpage and where the words are found. (21) During a subsequent user search, words in the title, subtitle, Meta tags, and other positions of importance are noted for special consideration. (22) Meta tags allow the owner of a page to specify keywords for the search engine spider to index.
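
      The seed-and-frontier process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how any particular search engine's spider is built: the tiny link graph `TOY_WEB`, the `example` URLs, and the function names are hypothetical, and no real network requests are made.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://d.example/"],
    "http://c.example/": ["http://a.example/"],
    "http://d.example/": [],
}


def toy_fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return TOY_WEB.get(url, [])


def crawl(seeds, fetch_links, max_pages=100):
    """Visit each URL once, adding every newly seen link to the crawl frontier."""
    frontier = deque(seeds)  # URLs waiting to be visited
    seen = set(seeds)        # URLs already queued, so none is visited twice
    visited = []             # order in which pages were actually visited
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl(["http://a.example/"], toy_fetch_links)
print(order)
```

      Starting from a single seed, the sketch reaches all four hypothetical pages, which is the sense in which a spider "follows leads" from one link to the next.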

      Once the spider completes its task, the search engine begins indexing. (23) Indexes are analogous to a giant book--each page in the book is a copy of a web page found by the spider. With the exception of Meta tags, the indexed content will reflect the site's content. (24) The goal of indexing is to allow the public to find information as quickly as possible. The contents of the web crawler's repository are analyzed to determine how they should be indexed and to derive more information from them. (25) A highly efficient data structure is then generated. (26) Numerous strategies are employed in building these data structures, including term frequency/inverse document frequency, (27) creation of hash tables separate from the index itself, (28) and link analysis. (29)
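
      Two of the strategies named above, hash tables and term frequency/inverse document frequency, can be illustrated with a toy index. The sketch below is a simplified assumption of how such a structure might look, using one common textbook form of the weighting; the three sample pages, their identifiers, and the function names are hypothetical. A Python dictionary, which is itself a hash table, maps each word to the pages that contain it.

```python
import math
from collections import Counter, defaultdict

# Hypothetical crawled pages: identifier -> page text.
PAGES = {
    "p1": "privacy law and search privacy",
    "p2": "search engine law",
    "p3": "search engine index",
}


def build_index(pages):
    """Map each term to {page: number of occurrences} (an inverted index)."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][page_id] = count
    return index


def tf_idf(index, pages, term, page_id):
    """Weight a term highly if it is frequent on this page but rare overall."""
    postings = index.get(term, {})
    if page_id not in postings:
        return 0.0
    tf = postings[page_id] / len(pages[page_id].split())  # term frequency
    idf = math.log(len(pages) / len(postings))            # inverse document frequency
    return tf * idf


index = build_index(PAGES)
print(sorted(index["engine"]))
print(tf_idf(index, PAGES, "search", "p1"))
print(tf_idf(index, PAGES, "privacy", "p1"))
```

      In this toy collection the word "search" appears on every page and therefore receives a weight of zero, while "privacy," which appears on only one page, receives a positive weight there.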

      A search takes place when a user enters a query into the search engine. A query can be a single word or many words. Complex queries use Boolean operators (30) to refine the search and quickly arrive at more relevant returns. Search engines will then sift through billions of indexed documents in less than a second. The search starts at the root index and at every step, terms and related web pages are either pursued or removed from consideration. (31)
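
      A Boolean query can be pictured as set arithmetic over an index: each operator either keeps candidate pages in consideration or removes them. The following sketch assumes a hypothetical four-word index (`INDEX`) and hypothetical helper names; production search engines use far more elaborate structures.

```python
# Hypothetical index: term -> set of pages containing that term.
INDEX = {
    "privacy": {"p1", "p4"},
    "search": {"p1", "p2", "p3"},
    "engine": {"p2", "p3"},
    "cookie": {"p4"},
}


def lookup(term):
    """Return a copy of the pages containing the term (empty if unknown)."""
    return set(INDEX.get(term.lower(), set()))


def boolean_and(*terms):
    """Pages containing every term: each additional term narrows the result."""
    if not terms:
        return set()
    result = lookup(terms[0])
    for term in terms[1:]:
        result &= lookup(term)
    return result


def boolean_or(*terms):
    """Pages containing at least one of the terms."""
    result = set()
    for term in terms:
        result |= lookup(term)
    return result


def boolean_and_not(include, exclude):
    """Pages containing `include` but not `exclude`."""
    return lookup(include) - lookup(exclude)


print(sorted(boolean_and("search", "engine")))
print(sorted(boolean_or("privacy", "cookie")))
print(sorted(boolean_and_not("search", "engine")))
```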

    3. Cookies and the Dangers of Disclosure

      Most people surf the Internet with little concern over who will be able to obtain information about domain names visited, (32) emails sent and received, (33) and search queries entered. (34) The World Wide Web has grown into a grand marketplace, not only for merchandise, (35) but also for ideas. (36) As a consequence of this exploitation, the Internet has grown into a vast filing cabinet for personal information on its users. (37) Information about a user can be gathered through voluntary disclosure, such as filling out website registration forms; however, many users are unaware that websites are able to collect information through cookie files. (38) Cookies are files used to track a user's travels around a website. (39) The cookie file allows a website to assign each user a unique identifier so the user may be identified in subsequent visits. (40) Website owners can track which sites users have visited and identify their interests, allowing the owners to better target advertisers' efforts. (41)
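
      The identifier mechanism described above can be sketched with Python's standard `http.cookies` and `uuid` modules. This is a simplified assumption of how a site might recognize a returning visitor, not a description of any particular website: the cookie name `visitor_id`, the `VISITS` log, and the function `handle_request` are hypothetical.

```python
import uuid
from http.cookies import SimpleCookie

# Hypothetical server-side log: visitor identifier -> pages requested.
VISITS = {}


def handle_request(path, cookie_header=""):
    """Return (visitor_id, cookie to set or None) and record the page visit."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "visitor_id" in cookie:
        # Returning visitor: the browser sent back the identifier set earlier.
        visitor_id = cookie["visitor_id"].value
        set_cookie = None
    else:
        # First visit: assign a new random identifier and ask the browser to store it.
        visitor_id = uuid.uuid4().hex
        outgoing = SimpleCookie()
        outgoing["visitor_id"] = visitor_id
        set_cookie = outgoing["visitor_id"].OutputString()
    VISITS.setdefault(visitor_id, []).append(path)
    return visitor_id, set_cookie


first_id, issued = handle_request("/home")
second_id, _ = handle_request("/search", cookie_header=issued)
print(first_id == second_id)
print(VISITS[first_id])
```

      Although the identifier itself is a meaningless random string, every page requested under it accumulates in a single record, which is what makes the resulting history attributable to one user.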

      Search engines have the highest incentive to collect user data. In today's networked environment, they act as gatekeepers to information and have almost eclipsed email as the primary online activity. (42) Combine this abundance of use with the advertiser-driven business model employed by search engines to generate revenue, (43) and the result is an overwhelming economic incentive to store increasing amounts of data. (44) The final factor in this incentive equation is innovation in digital data storage, which has dramatically reduced the cost to search engine operators of storing personal user data. (45) The sum is a centralized, identifiable database of millions of users' online activities, from the websites they have visited to the people they have contacted. Very few safeguards exist to prevent government entities from obtaining this information. (46)

      The magnitude and depth of information gathered is evidenced by a recent faux pas by America Online (AOL). (47) AOL, as a gesture to researchers, released a list of web search inquiries of 658,000 subscribers entered over a three-month period onto a website intended for academic research. (48) The records were supposedly "sanitized" by substituting numeric IDs for the subscribers' real names; (49) however, AOL acknowledged that the search queries might contain personally identifiable data. The threat of search query data misuse is often downplayed by users who do not believe anyone could or would reconstruct their search history. (50) This recent breach of privacy, a "screw up" as AOL representatives called it, proves just how wrong this common belief might be and just how much identifiable information is in our search histories. (51)

      Many users may believe that reading seemingly indecipherable search logs would be a tedious process; however, the searches are surprisingly personal and reveal the lifestyles, quirks, habits, and idiosyncrasies of the people they track. Lauren...
