CLEANING CORPORATE GOVERNANCE.

Author: Frankenreiter, Jens

INTRODUCTION
I. THE STATE OF PLAY IN EMPIRICAL CORPORATE GOVERNANCE RESEARCH
  A. Demand
  B. Supply
II. RECLAIMING CORPORATE GOVERNANCE
  A. Charter Texts
  B. Data Labels
  C. Reassessing What We Know (or Thought We Knew) about Corporate Governance
  D. Aggregating G-Index Errors
  E. The Arbitrage Value of Good Governance Revisited
III. CORPORATE GOVERNANCE AS "BIG DATA"
  A. Document-Level Trends
  B. Latent Semantic Content
  C. Supervised Learning Tools
IV. IMPLICATIONS AND THE ROAD AHEAD
CONCLUSION
APPENDICES
  APPENDIX A: RESEARCH ASSISTANT ACKNOWLEDGEMENTS
  APPENDIX B: DATA COLLECTION AND CLEANING PROTOCOLS

INTRODUCTION

Corporate governance lapses are blamed for some of the most ignominious business catastrophes in recent history, from Enron's epic collapse, (1) to Wells Fargo's $3 billion fine, (2) to the implosions of WeWork (3) and Theranos. (4) And in the wake of each debacle, legions of empirically minded researchers soon followed, (5) marshaling mountains of quantitative data to unpack lessons about where governance failed and how it can be improved. (6) Their collective efforts have met with a ravenous reception: empirical corporate governance research now dominates the law and finance landscape, (7) routinely informing government policy, (8) real-world practice, (9) and vigorous academic debate. (10) By any reasonable accounting, the topic is a major success story in the interdisciplinary study of law.

And yet a potentially fatal flaw has long lurked just beneath this seemingly resplendent facade: shallow data. Many of the preeminent contributions in empirical corporate governance depend commonly (and critically) on a surprisingly slender stockpile of datasets whose provenance is frustratingly obscure. But virtually no one has seriously attempted to gauge the integrity of these pivotal inputs. (11)

Until now. In this Article, we unveil a new resource that allows researchers--for the first time--to investigate the fidelity of foundational corporate governance metrics. And the results aren't pretty. We demonstrate that several of the most heavily relied upon governance datasets suffer from inaccuracies so extensive as to call into question some of the landmark insights of the field.

The resource we unveil is anchored by a first-of-its-kind textual corpus representing over a quarter-century's worth of corporate charters for S&P 1500 listed issuers. (12) We hand label (13) a significant subset of these full-text documents for characteristics that feature prominently in the governance literature. And, rectifying a longstanding deficit in the field, we make the corpus publicly available as open source, in the hope that it will catalyze and improve future research. Collectively, we refer to our raw corpus and labels as the "Cleaning Corporate Governance" (CCG) database. The database provides researchers with an unprecedented capability to analyze the composition and structure of the very textual heart of corporate governance--certificates of incorporation--across firms, industries, and jurisdictions, and over time.

But it is substantially more than that. The CCG also allows researchers--for the first time--to reassess foundational insights from law and finance. We use it, for example, to show that the ingredients of the most renowned corporate governance index in the field, the "G-Index," (14) are riddled with inaccuracies, resulting in an estimated error rate exceeding 80%--a rate that gets worse over time. And these inaccuracies are not simply garden-variety statistical anomalies. Rather, we demonstrate that they unsettle even one of the most famous results in the field: that systematically investing in firms with "good governance" delivers returns that significantly eclipse the market. (15) When reanalyzed with corrected data, this result changes appreciably. To the extent any part of it survives, it does so in a materially attenuated form.

The value of the CCG is not limited to reassessing prior results in the corporate governance literature, however. It also helps lay the foundation for the next chapter of corporate governance research at a critical moment, when we stand at the crossroads of several new and exciting directions the field might pursue. Machine learning and computational text analysis, for example, are becoming increasingly prominent in many areas of legal scholarship (16) but have yet to gain a significant foothold in corporate governance. (17) The CCG is ideal for these methodologies, and we deploy several of them here. In particular, we use them to corroborate our error correction efforts and to shed light on a host of deeper governance questions--including whether legal origins matter and how governance evolves during periods of disruption like the Financial Crisis. The emergent scholarly literature on "common ownership" can also benefit from the CCG. (18) While this literature raises troubling questions about whether large passive investors are conduits for anticompetitive behavior, its proponents still struggle to pin down the precise mechanism through which passive ownership translates into conscious parallelism. (19) The CCG provides an intriguing tool for smoking out such a mechanism (if one exists) by dusting for fingerprints left at the scene of the crime, as manifested in stockholder rights and governance structures in our corpus. Similarly, the CCG can help reveal how governance shapes (and is shaped by) the very purpose of the corporation itself, particularly as scholars and policy makers take the concept of stakeholder governance more seriously. (20) Preexisting governance metrics--which tend to focus exclusively on shareholder interests--have little to say about this topic, but the CCG is a ready resource for generating new measures that bear directly on non-shareholder constituencies.

More broadly, this Article exposes two systemic issues that should concern empirical researchers of all stripes. The first is that corporate governance research has a critical need for lawyers and lawyerly judgment. We conjecture that a principal reason that data errors have propagated for so long in this field is that lawyers were exiled (or relegated themselves) to the back seat of the data aggregation project. In their absence, non-lawyers were left to do much of the work, proceeding--as best they could--to dispense judgments about the effects of formal legal documents, statutes, case law, and the like. While perhaps a commendable first effort, such casual empiricism no longer suffices. Lawyers can and must play a more central role in empirical corporate governance research, reclaiming the function for which they are professionally trained.

Second, our enterprise underscores the seemingly banal observation that data availability matters. A lot. Another likely reason for poor data quality in this area is that corporate governance documents are surprisingly difficult to collect, organize, and analyze. Many notable jurisdictions (such as Delaware) actively throttle public access to their rich documentary trove, tossing in exorbitant access fees for good measure. (21) Federal regulators (such as the SEC) provide several governance documents for free, but only in highly disorganized form. (22) And the few private enterprises that have attempted to organize them also protect their creations aggressively with paywalls, user restrictions, and ominous litigation threats. (23) Although the CCG partially frees the next generation of corporate governance scholars from these restraints, we nonetheless join with others (in law and elsewhere) in calling for better and less restrictive public access to public documents. (24)

The remainder of this Article proceeds as follows. Part I assesses the most important empirical corporate governance studies to date, and the role of the most critical datasets within them. We also observe that because of the prohibitive challenges in obtaining underlying textual data, most researchers have relied on commercial third-party sources. Part II describes our research design and data collection protocols, providing a descriptive snapshot of the size, reach, and scope of the CCG. We then demonstrate that corporate charters are highly dynamic documents, amended with increasing frequency. (25) Yet they have also progressively become more "lawyered," growing longer, more technical, and less readable than their forebears of a quarter century ago. More provocatively, this Part uses the CCG to document the alarming inaccuracy of prominent corporate governance indices, showing that even one of the best-known results in the field attenuates considerably in the presence of cleaned data. Part III explores important future uses of the CCG, including its ability to generate novel insights about the state and evolution of corporate charters. Among other things, we illustrate how the database lends itself to a wide variety of emergent computational and machine learning techniques, spotlighting several applications. Part IV discusses the broader implications of our study, situating it within the larger enterprise of empirical legal studies. A final section concludes. (26)

  I. THE STATE OF PLAY IN EMPIRICAL CORPORATE GOVERNANCE RESEARCH

    This Article puts forward, for the first time, a clean, open-source, researchable corpus of corporate charters--the documentary DNA of corporate governance. But before describing the CCG database itself, it is worth underscoring why this data resource matters. While there are many moving parts, two forces predominate: supply and demand. We discuss each below, followed by a discussion of the practical constraints that face researchers who endeavor to collect raw corporate governance documents.

    A. Demand

      The field of law and finance is, in relative terms, extremely young. Until about twenty-five years ago, finance and business law researchers typically sailed on scholarly ships that...
