The language of Supreme Court briefs: a large-scale quantitative investigation.

Author: Coleman, Brady
  1. BACKGROUND AND INTRODUCTION

    In 2003, we initiated a long-term project to investigate empirically the language used in United States Supreme Court briefs. (1) The exploratory stage was open-ended, largely without any particular results initially sought or predicted. We wanted to collect and categorize as much linguistic data from Supreme Court briefs as possible, and to analyze such data as thoroughly as we could--and let the results lead to possible topics for publication, rather than vice versa. Indeed, at times we hoped (admittedly with quite a bit of skepticism) that we might find statistically significant differences between the linguistic styles of winning and losing briefs, and be able to offer practitioners profitable advice based on such information. But even without any unrealistic Holy Grail outcomes, we were nonetheless confident that such a study could provide useful advice to legal practitioners and educators, as well as possibly interesting outcomes for scholars of legal advocacy or linguistics. Our first publication, in the American Journal of Trial Advocacy, (2) was based on a less complete database and was narrower in scope, because it focused on the language of only one short component of the brief: the question presented. Still, this earlier article did find interesting relationships between linguistic and other variables (time, party, and the like) in Supreme Court briefs, and concluded with advice for Court advocates. (3)

    The scope of the current article is more extensive. Our database consists of nearly every brief on the merits presented to the Court for the thirty-five years between 1969 and 2004. (4) We initially downloaded about 9,000 briefs, and then chopped them up for analysis into about a quarter of a million separate brief components such as the Table of Contents, Table of Authorities, Summary of Argument, and the like. To clean up and analyze the briefs, we wrote eight original Perl programs for this project. We decided to download every brief, rather than a smaller number based on an appropriate statistical sampling, for two reasons. First, downloading every brief allowed us to sidestep any sampling concerns in the first place. More importantly, we were curious about how style might vary depending on a large number of legal issues, and even with a full set of briefs over thirty-five years, some legal issues appear rarely (or not at all); a sample would have left us with too few briefs on such issues to analyze.

    1. Other Empirical Studies of the Language of Legal Advocacy

      Our project is certainly not the first to use quantitative methods to investigate the language of written (or oral) legal advocacy. The first published work that applied computational linguistics to analyze the language of judicial briefs focused on the University of Michigan affirmative action litigation, as decided by the Supreme Court in Gratz v. Bollinger (5) and Grutter v. Bollinger. (6) Over a hundred amicus briefs had been filed in these companion cases, so the authors had a healthy corpus of advocacy language for analysis. Using programs that counted the appearance of key words in each brief, they were able to show that quantitative methods alone could successfully predict the policy positions being advocated; statistically significant differences were found between the language of amicus briefs supporting the respondent and the language of those supporting the petitioner. (7) In other work, scholars have usefully polled large numbers of active judges to ascertain what stylistic factors in appellate briefs are most favored and disfavored by decisionmakers. (8) Empirical work has also found positive relationships between attorney qualifications and success in oral arguments before the Supreme Court of Canada. (9)

      Only recently has the United States Supreme Court allowed the release of oral argument transcripts that identify Justices by name in recording the questions they pose to advocates. Previously, each questioner had been identified generically in the transcripts (as "Justice"). With such information, we are attempting (in a separate study) to determine whether the language used in oral argument questions can be quantified to predict later judicial votes. This study was initiated based on the relatively uncontroversial assumption that judicial attitudes towards litigant positions (as revealed through linguistic variables) are often well established before oral arguments are held, and so the use of language by each Justice during oral argument questioning should reveal psychological biases. (10) But even before such transcripts were available, Supreme Court scholars had been using more labor-intensive methods to empirically compare the language used in oral arguments to other variables. (11) In sum, the current project might be viewed as a natural follow-up to existing work: large-scale empirical studies of the language of legal advocacy have just become achievable at low cost; until recently, our project would have been extremely time-consuming, very expensive, or both. We benefited from access to comparatively large databases on Westlaw (importantly, provided free to legal academics), combined with computer programs we created to clean up, categorize, and analyze the briefs, automating the processing of millions of bits of data. (12)

    2. Databases and Methodology

      1. Databases

        We began by downloading the briefs and tagging them with unique identification numbers so that we could precisely link sets of variables to each other. Our database included two categories of variables: those generated with our own software, and those imported from The Supreme Court Database, better known as ALLCOURT. (13) The ALLCOURT database, funded by the National Science Foundation, contains final vote data from the beginning of the Warren Court (in 1953) and is updated periodically to include the current Court's last complete term.
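        To make the tagging step concrete, here is a minimal sketch (in Python, though the project's actual programs were written in Perl) of how downloaded briefs might be assigned unique identification numbers; the directory layout and file names are hypothetical.

```python
import csv
from pathlib import Path

# Hypothetical layout: one plain-text file per downloaded brief.
BRIEF_DIR = Path("briefs")            # e.g., briefs/1973_petitioner_00412.txt
MANIFEST = Path("brief_manifest.csv")

def build_manifest() -> None:
    """Assign each brief a unique ID and record the pairing in a CSV manifest,
    so that linguistic variables computed later can be linked back precisely."""
    with MANIFEST.open("w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["brief_id", "filename"])
        for brief_id, path in enumerate(sorted(BRIEF_DIR.glob("*.txt")), start=1):
            writer.writerow([brief_id, path.name])

if __name__ == "__main__":
    build_manifest()
```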

        The variables coded by the database are so specific that they allow Supreme Court scholars to study a great array of empirical issues (the relative unanimity of Court decisions under different Chief Justices; the positive or negative votes of a particular Justice when the petitioner is a consumer, a creditor, or a criminal defendant; the relative success of respondents in civil rights cases; and so on). Published work that has made use of the database (predicting, for example, outcomes of decisions based on earlier voting patterns of Justices) generally supports a view of the Supreme Court decisionmaking process as ideological rather than legal. (14) In any case, we recognized that the database could also be very useful for a linguistic--rather than only a political science or legal--investigation, because it would allow us to compare variations in language (including readability) to the variables that the ALLCOURT database had teased out of decades of Supreme Court opinions (with significant effort and expenditure). Since the freely downloadable ALLCOURT database had already coded Supreme Court opinions as involving certain legal issues, with certain outcomes, and certain histories, we could link our database to it and then ask questions about the relationship of readability to these procedural, substantive, outcome, and other relevant variables. Thus, to offer one illustration, because the ALLCOURT database includes a code for the vote ultimately taken in each case, linking to ALLCOURT allowed us to automatically generate data comparing the readability of briefs to the vote count of the decision. Figure 1 illustrates graphically the links between different sets of variables for analysis in our MS-EXCEL relational database management system.

        [FIGURE 1 OMITTED]
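        The linkage just described works like an ordinary relational join. The sketch below matches per-brief readability scores to vote data on a shared case identifier; the file names and column names are hypothetical stand-ins, not ALLCOURT's actual variable names (and the language is again Python rather than the project's Perl).

```python
import csv

def load_table(path: str, key: str) -> dict:
    """Read a CSV file into a dict of rows keyed by the given column."""
    with open(path, newline="") as f:
        return {row[key]: row for row in csv.DictReader(f)}

# Hypothetical inputs: our linguistic variables and the imported vote data,
# both keyed by a shared case identifier.
readability = load_table("brief_readability.csv", "case_id")  # case_id, flesch_score
votes = load_table("allcourt_votes.csv", "case_id")           # case_id, maj_votes, min_votes

# Join the two tables: the readability of each brief next to the final vote split.
for case_id, ling in readability.items():
    if case_id in votes:
        v = votes[case_id]
        print(case_id, ling["flesch_score"], f'{v["maj_votes"]}-{v["min_votes"]}')
```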

      2. Methodology

        After downloading the briefs, we faced two chief methodological challenges to prepare them for our own (not the ALLCOURT) database. First, we realized that any investigation into linguistic style that was going to automatically work with the millions of words in our database (to derive average sentence length, number of letters per word, and so on) would not be accurate if it attempted to include the many citations to authority that appear in Supreme Court briefs. How would our automated program count abbreviations for statutes, cases, and the like? As complete words? As something else? And how would the punctuation in string citations, for example, be interpreted for purposes of determining where sentences began and ended? Rather than struggle with this problem, particularly because it seemed irrelevant to the question of readability itself, we decided upon an automated process to eliminate the citations. But in addition to eliminating the citations, we wanted to keep a place marker for where the citations had been, so we ultimately developed a program to replace every citation with the term "scite." Because "scite" has five letters (which is also consistently the average length of words in briefs), including it does not significantly influence our results.
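        The following is a minimal sketch of the two steps just described: replacing citations with the five-letter placeholder "scite," then computing simple style statistics on the cleaned text. It is written in Python rather than the project's Perl, and the citation pattern is a deliberately crude stand-in; real briefs would require far richer patterns (statutes, string cites, id., supra, and so on).

```python
import re

# Deliberately crude, illustrative citation pattern: a volume number, a
# reporter abbreviation, and a page number, e.g. "410 U.S. 113". The
# project's actual programs would have needed far richer patterns.
CITATION = re.compile(r"\b\d+\s+[A-Z][A-Za-z.]*\s+\d+\b")

def scrub_citations(text: str) -> str:
    """Replace every citation-like span with the placeholder 'scite'."""
    return CITATION.sub("scite", text)

def style_stats(text: str) -> dict:
    """Average letters per word and words per sentence, after scrubbing."""
    clean = scrub_citations(text)
    words = re.findall(r"[A-Za-z']+", clean)
    sentences = [s for s in re.split(r"[.!?]+", clean) if s.strip()]
    return {
        "avg_word_letters": sum(len(w) for w in words) / len(words),
        "avg_sentence_words": len(words) / len(sentences),
    }

sample = "The statute controls here. See 410 U.S. 113 (1973). We respectfully disagree."
print(style_stats(sample))
```

        Because "scite" is counted as an ordinary five-letter word, the placeholder leaves the averages essentially undisturbed, which is the property the paragraph above relies on.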

        The second major challenge was to develop a program to automatically separate the briefs into different components for analysis. We hypothesized that even if certain linguistic quirks were not apparent in the Argument sections of briefs, they might be revealed in more idiosyncratic patterns found in the Statement of Facts component, to provide one illustration. Or maybe information about the average length of the Summary of Argument section (which is not specified in the Supreme Court's rules) would prove useful to advocates writing such sections, to offer another illustration.
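        A splitter of this kind can be sketched simply if one assumes, as the hypothetical sketch below does, that each component begins with a standard heading sitting alone on a line; the project's actual Perl programs necessarily handled far messier variation in real briefs.

```python
import re

# Standard brief components, in the order they typically appear; "SUMMARY OF
# ARGUMENT" is listed before "ARGUMENT" so the longer heading matches first.
HEADINGS = [
    "TABLE OF CONTENTS",
    "TABLE OF AUTHORITIES",
    "QUESTION PRESENTED",
    "STATEMENT OF FACTS",
    "SUMMARY OF ARGUMENT",
    "ARGUMENT",
    "CONCLUSION",
]
HEADING_RE = re.compile("^(" + "|".join(HEADINGS) + r")\s*$", re.MULTILINE)

def split_components(brief_text: str) -> dict:
    """Split a brief into components keyed by the heading that opens each."""
    parts = HEADING_RE.split(brief_text)
    # re.split with one capture group yields:
    # [preamble, heading1, body1, heading2, body2, ...]
    components = {"PREAMBLE": parts[0]}
    for heading, body in zip(parts[1::2], parts[2::2]):
        components[heading] = body
    return components
```

        With the brief divided this way, each component (or its per-brief average length, as suggested above for the Summary of Argument) can be analyzed on its own.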

    3. Limitations

      Many, if not most, modern Supreme Court briefs are the product of more than one authorial style. Several attorneys in a firm or government organization typically work on a brief to be submitted to the high court, and each attorney might draft different parts of the document, or at least make editorial changes to a brief that was initially drafted...
