James Grimmelmann: My thanks for their comments to Jack Balkin, Yochai Benkler, Shyam Balganesh, Aislinn Black, Michael Carroll, Eric Goldman, Anne Huang, Dan Hunter, David Johnson, Thomas Lee, Beth Noveck, Frank Pasquale, Guy Pessach, Chris Riley, Steven Wu, Tal Zarsky, and the participants in the workshops where I presented earlier versions of this Article. After June 1, 2008, this Article is available for reuse under the Creative Commons Attribution 3.0 United States license, http://creativecommons.org/licenses/by/3.0/us/. All otherwise-undated web sites in footnotes were last visited on August 28, 2007.
Search engines are the new linchpins of the Internet.1 A large and growing fraction of the Internet's immense volume of traffic flows through them. They are librarians, who bring order to the chaotic online accumulation of information. They are messengers, who bring writers and readers together. They are critics, who elevate content to prominence or consign it to obscurity. They are inventors, who devise new technologies and business models to remake the Internet. And they are spies, who are asked to carry out investigations with dispatch and discretion.2
Lawyers and the law have taken notice of search engines. Governments around the world are casting an increasingly skeptical eye on search engines, questioning whether their actions have always been in the interests of society. More and more parties are presenting themselves at the courthouse door with plausible stories of how they have been injured by search engines. Only a few foresighted legal scholars have recognized the growing importance of search engines.3
This Article provides a roadmap to the legal issues posed by search. It indicates what questions we must consider when thinking about search engines, and it details the interconnections among those questions. It does not endorse any particular normative framework for search. Nor does it recommend who should regulate search.4 Instead, it provides an analytic foundation to distinguish informed decisionmaking from random flailing.
The diverse questions of law it discusses form a coherent set because each affects the same few information flows. The essence of a search engine is that it combines its own knowledge of available content with user queries to provide recommendations to its users; the doctrines and policy values this Article discusses relate directly to this core process. Other law affects search engines: Google's well-publicized Initial Public Offering ("IPO"), for example, raised substantial issues of securities law,5 and search engines have been sued for employment discrimination.6 But these other issues can be resolved on their own merits, in isolation. In contrast, the concerns discussed in this Article must be balanced with one another because each relates to the same few information flows. Pushing on one affects the others.
Part I explains how modern search engines function and describes the business environment within which they operate. Search engine operations can be understood in terms of the information flows among four principal actors: (1) search engines themselves, (2) their users, (3) information providers, and (4) third parties with interests in particular content flows (such as copyright holders and censorious governments). There are, in turn, four significant information flows: (1) the indexing by which a search engine learns what content providers are making available, (2) user queries to the search engine for information about particular topics, (3) the results returned by the search engine to users, and finally, (4) the content that providers send to users who have found them through searching. Because so many major search engines are funded through advertising, this Part also includes a brief survey of how search engine advertising works and the distinctive fraud problems confronting search engines and their advertisers.
Part II, the heart of the Article, presents a descriptive analysis of the legal struggles over search, showing how questions of search policy, many of which have long been latent in different fields of Internet law, are increasingly confronting lawyers, courts, and regulators. It describes those struggles in terms of the legitimate interests that each of these actors brings to debates over search. Users want high-quality results without too great a sacrifice of privacy. Content providers want favorable placement in search results without paying more than their fair share of the costs of supporting search and without facing unfair competition from search engines. Third parties want to prevent unauthorized distribution of copyrighted content, to preserve their own privacy, to protect their reputations, and to preserve what they see as "user virtue." And finally, search engines want to preserve their ability to innovate, to protect themselves from fraud, and to ensure that the search market remains open to competition. Each entry in this list of interests has its own associated legal theories; this systematic taxonomy allows us to recognize how any given legal theory affects the search ecology.
Part III then shows, with five examples, how taking a broad view of search yields otherwise-unavailable insights into pressing controversies. This is not to say that the end result must be a body of search-specific law,7 but only that failing to consider the larger forces at work in search is antithetical to sensible policymaking. First, the broad, systematic view illustrates how various claims in search engine disputes can serve as functional substitutes for each other. Second, it shows that the degree of transparency of the search process is a highly contested variable, with some concerns pressing for greater transparency and others pressing for less. Third, it illustrates that user privacy is a deeply knotty problem and that preserving reasonable user expectations will involve difficult trade-offs with other interests, including some of the users' own. Fourth, it shows that we require a theory of search engine speech; the most sophisticated theory of search-engine-results-as-speech so far articulated by a court is too simplistic. And fifth, it illustrates the richness of debates over search engines' relationships to providers' trademarks.
Finally, a brief Conclusion takes note of some of the many open issues facing search engine law and scholarship.
Every major Internet application today is a search engine, contains a search engine, or depends on a search engine.8 Because search is increasingly indistinguishable from other applications, we shouldn't expect our definition of "search engine" to differentiate clearly between those things that are search services and those that aren't. Instead, let's start with a definition that accurately describes the core, paradigm cases: a search engine is a service that helps its users locate content on the Internet. That's what Google, Yahoo!, MSN Live, and Ask.com do: help people find stuff online. So do their smaller competitors, from AltaVista to Zoohoo.
As anyone who's played with Google's ever-expanding list of search services can attest, search engines help users find more than just web pages. Google Scholar searches journal articles; Yahoo! Local searches businesses near the user; the Internet Movie Database searches lists of film casts and crews; and Like.com searches online sales of clothes and jewelry by color, shape, and pattern. Thus, it's better to say that search engines help users find "content" than to say "pages" or "sites."9
All of these search engines help users find publicly accessible content, but others work with specialized sets of content not available to the public. Thus, LexisNexis, for a fee, allows users to search a large proprietary database of legal and news documents. Similarly, peer-to-peer file sharing systems such as Gnutella and Grokster allow users to search content that typically is accessible only through the peer-to-peer service itself.
Go far enough along these axes (away from the web and away from publicly accessible content) and you will reach things that are only marginally recognizable as search engines, according to the definition above. Google Desktop, for example, is one of several competing tools for users to search their own computers. Not every legal issue affecting search applies to these borderline cases. But some issues carry over even here: Google Desktop has raised privacy concerns resembling those that apply to plain-vanilla Google Web Search.10 The point is that to the extent a technology resembles the paradigm case of public web search (and a great many technologies do), it raises many of the issues described below. Anyone working with that technology needs to think about how those issues fit together. The taxonomy that follows provides a framework for thinking both about search engines and about the large...