The public comment process is one of the hallmarks of the American administrative state. (1) As the informal notice-and-comment rulemaking procedure has grown into one of the most important national policy-making venues, the public comment process has become a forum for both organized interest groups and ordinary individuals to engage in public deliberation and political debate. (2) In recent years, as both the ease of participation and interest in rulemaking have grown, there has been an explosion of public participation, and agencies now receive millions of comments from the public each year concerning proposed agency actions. (3) These comments are voluntarily generated by individuals and organizations representing a vast diversity of interests--from large industrial trade associations representing businesses with billions of dollars at stake to individual citizens who have an interest in a particular regulatory outcome.
There is a substantial academic literature that studies, critiques, defends, and proposes reforms to the notice-and-comment process and informal rulemaking. (4) The advent of electronic rulemaking (commonly referred to as "e-rulemaking"), which involves the use of digital technologies by agencies to distribute information and collect comments, initiated a wave of scholarly commentary on this new phenomenon. (5) Opinions on the potential value of e-rulemaking vary from reasonably optimistic to highly skeptical, with some scholars remaining hopeful that new technologies can facilitate more inclusive and participatory decisionmaking, while others point to limited successes so far and question whether any increase in participation that has occurred has added anything of value. (6)
In this Article, we argue that at least some of the disappointment and pessimism surrounding e-rulemaking is premature and reflects inadequate attention to the potential to deploy digital technologies not only to solicit comments, but to analyze and understand them as well. Recent advances in machine learning and natural language processing have made powerful text analysis tools more broadly available. Both commercial enterprises and academic researchers have recently begun to put these tools to use in a variety of settings, from tracking employee morale based on email communications to testing the relationship between online blogging and political opinions. (7) Computational text analysis of public comments, however, is relatively rare, leaving largely untapped a substantial resource for both scholars and policymakers. (8) This Article explores how these new tools can be used by researchers, agencies, and oversight institutions to understand and improve agency decisionmaking.
From a positive, descriptive perspective, public comments are a valuable source of data that can be used to empirically examine how bureaucratic institutions interact with the public. As a form of political participation that is unique to the bureaucratic setting, commenting behavior is an interesting and important phenomenon in its own right and provides information on how agencies and their actions shape, and are shaped by, the publicly expressed views of individuals and groups. In recent years, a small number of political scientists and others interested in bureaucratic behavior have begun to take advantage of public comments to study agencies. (9)
In this Article, we contribute to this nascent field by conducting the first large-scale sentiment analysis of public comments to examine how word choices in nearly three million public comments are related to measures of agency ideology. Sentiment analysis has become a widespread tool used in a variety of settings to estimate how texts reflect the attitudes of their authors. Applying a basic, replicable procedure of sentiment analysis to public comments received for all nonminor rulemakings over the course of the Obama administration, we find that agencies with more moderate ideological leanings tend to receive comments that contain more positive language. This analysis indicates, as a threshold matter, that political characteristics of agencies are correlated with comment characteristics. Future work can build on this insight to inform subsequent research into the relationship between agencies' behavior and the public comments they receive.
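To make the procedure concrete, the following sketch illustrates the general logic of dictionary-based sentiment scoring of the kind described above: a comment's score is the net count of positive and negative words, normalized by length. The word lists and sample comments here are hypothetical placeholders; actual analyses rely on validated sentiment lexicons containing thousands of scored terms, and this sketch is not the Article's exact procedure.

```python
import re

# Hypothetical mini-lexicon for illustration only. Real analyses use
# published, validated dictionaries with thousands of scored words.
POSITIVE = {"support", "benefit", "thank", "good", "improve"}
NEGATIVE = {"oppose", "harm", "burden", "bad", "unfair"}

def sentiment_score(comment: str) -> float:
    """Net sentiment: (positive tokens - negative tokens) / total tokens."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

# Invented example comments, not drawn from any actual docket.
comments = [
    "I support this rule; it will improve public health.",
    "This rule is an unfair burden and I oppose it.",
]
scores = [sentiment_score(c) for c in comments]
```

Scaled to millions of comments, scores of this kind can then be averaged by agency and compared against agency-level covariates such as ideology estimates.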
Moving from the descriptive to the normative, we examine how agencies and agency oversight institutions can use computational text analysis of public comments to improve agency decisionmaking and accountability. In the era of mass commenting, agencies face both a "needle-in-the-haystack" problem (i.e., identifying the most substantive comments) and a "forest-for-the-trees" problem (i.e., extracting overall trends or themes in large, unstructured collections of documents). To examine the usefulness of text analysis techniques to address these challenges, we carry out a case study of the comments received by the Environmental Protection Agency (EPA) in response to its proposed rule to limit greenhouse gas emissions from the electricity-generating sector, the Clean Power Plan. We find that, although not perfect, existing techniques already have value for agencies and can be further refined to improve their current performance.
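As one illustration of how simple text analysis can begin to address both problems, the sketch below collapses near-identical form letters into a single fingerprint by normalizing case, punctuation, and whitespace before hashing. High-count fingerprints reveal mass campaigns in the aggregate (the forest), while low-count fingerprints flag unique comments as candidates for close reading (the needle). The normalization rule and the sample comments are hypothetical; this is not the procedure used in the Clean Power Plan case study.

```python
import hashlib
import re
from collections import Counter

def fingerprint(text: str) -> str:
    """Normalize case, punctuation, and whitespace so near-identical
    form-letter variants collapse to the same hash."""
    norm = re.sub(r"[^a-z0-9 ]", "", text.lower())
    norm = re.sub(r"\s+", " ", norm).strip()
    return hashlib.sha256(norm.encode()).hexdigest()

# Invented comments: two form-letter variants plus one unique comment.
comments = [
    "Please withdraw this rule.",
    "please   withdraw this rule!",
    "The statutory text does not authorize generation shifting.",
]
groups = Counter(fingerprint(c) for c in comments)
# groups now maps each distinct message to its campaign size.
```

In practice an agency would combine this kind of exact-duplicate grouping with fuzzier similarity measures, since campaign organizers often encourage commenters to lightly personalize a template.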
The Clean Power Plan case study illustrates how text analysis can help agencies, but there is an open question of whether agencies will put these tools to use. Doctrinally, existing interpretation of agencies' duties under the Administrative Procedure Act (APA) will likely provide sufficient incentive for agencies to take reasonable steps to address the haystack problem--agencies may soon deploy more sophisticated text analysis tools for this purpose. There is currently very little incentive, however, for agencies to address the forest problem. We will argue that this is a challenge that is worth taking up, and will discuss steps that could be taken by oversight institutions, including courts and the White House, to encourage agencies to do so.
The remainder of the Article will proceed as follows. Part I examines the practice of public commenting and how it has changed in recent years. We first address the various normative goals served by the public comment process, which include gathering technocratic and political information and serving expressive and procedural justice functions. We then turn to difficulties created by the recent participation explosion. For public comments to serve their social function, someone needs to read them. Reading a few comments presents no particular technical difficulty, but for many important rulemakings, agencies are presented with tens of thousands of comments--and sometimes upward of a million. These situations present a severe resource and cognitive challenge. That challenge is in part a product of technology, but just as technology has created new problems, technological innovation can offer new solutions. When agencies receive a flood of comments, they are in essence facing a "big data" problem. (10) The tools of data science--and in particular the sophisticated techniques developed in recent years to analyze large textual datasets--are designed to respond to exactly these big data challenges. Part I concludes with an introduction to the computational text analysis tools that can help translate the dilemmas posed by the era of mass public commenting into an opportunity to enhance understanding of, and participation in, the rulemaking process.
Part II focuses on the potential contribution of computational text analysis to better understand agencies and their relationships to the public. We first provide a brief overview of the substantial literature in law and political science that examines bureaucratic politics and agency decisionmaking. We then introduce the natural language processing technique referred to as sentiment analysis, which is a tool used to extract the attitudes of authors from the word choices within a text. A variety of new sentiment analysis techniques developed in recent years have found uses in both commercial settings and academic research. We apply a relatively straightforward form of sentiment analysis to nearly three million comments received by federal agencies during the Obama administration to test the relationship between agency ideology and comment sentiment, finding a significant relationship: agencies that occupy the center are more likely to receive comments with relatively more positive sentiment; agencies closer to either ideological pole tend to receive comments with more negative sentiment. This analysis is a useful contribution to the existing literature on bureaucratic politics and illustrates the potential of computational text analysis of public comments to inform scholarship in this field.
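The ideology-sentiment relationship described above can be illustrated schematically. The sketch below correlates each agency's ideological extremity (its absolute distance from the center of an ideology scale) with its mean comment sentiment; a negative correlation corresponds to the pattern that more extreme agencies draw more negative comments. All agency labels and numbers are invented for illustration and are not the Article's data or estimates.

```python
import statistics

# Hypothetical agency-level data: (ideology score, mean comment
# sentiment). Negative ideology = liberal, positive = conservative.
agencies = {
    "A": (-1.2, -0.10),
    "B": (-0.3,  0.05),
    "C": ( 0.1,  0.08),
    "D": ( 0.9, -0.04),
}

# Test the U-shape by correlating sentiment with ideological
# *extremity* rather than with raw ideology.
extremity = [abs(ideo) for ideo, _ in agencies.values()]
sentiment = [sent for _, sent in agencies.values()]

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(extremity, sentiment)  # negative r: extremity tracks negativity
```

A full analysis would of course use regression with controls rather than a bivariate correlation, but the structure of the test is the same.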
Part III discusses the value of computational text analysis in improving public participation in the regulatory process. We begin by characterizing two challenges that agencies face when responding to a very large set of public comments: the haystack problem and the forest problem. The haystack problem occurs when comments of high substantive value are hidden within a very large set of documents of lower substantive value, creating the risk that agencies will fail to locate and appropriately consider high-value comments. The forest problem occurs when agencies are unable to identify aggregate patterns within a large collection of comments because they treat each comment in isolation rather than drawing connections between them. We then test whether current text analysis techniques can be of use, using EPA's controversial and high-comment-volume climate change rule as a case study. We first examine how a basic text analysis tool can separate more substantive comments from less substantive comments and discuss how agencies, given their greater internal knowledge, could improve on our basic approach. We then deploy a text...