Text of Trade Agreements (ToTA)—A Structured Corpus for the Text‐as‐Data Analysis of Preferential Trade Agreements

Author: Dmitriy Skougarevskiy, Wolfgang Alschner, Julia Seiermann
Date: 01 September 2018
DOI: http://doi.org/10.1111/jels.12189
Published date: 01 September 2018
Journal of Empirical Legal Studies
Volume 15, Issue 3, 648–666, September 2018
Wolfgang Alschner, Julia Seiermann, and Dmitriy Skougarevskiy*
With multilateral negotiations at the World Trade Organization (WTO) in deadlock, rulemaking on international economic governance has shifted to preferential trade agreements (PTAs). To facilitate the scholarly investigation of the fast-growing universe of PTAs, this article introduces the Text of Trade Agreements (ToTA) corpus, a machine-readable and structured full-text corpus of 448 WTO-notified trade agreements stored in a GitHub repository. The article (1) provides a summary analysis of the ToTA corpus, (2) illustrates how text-as-data techniques can be used to investigate PTA design using ToTA, including through an interactive website accompanying this research, and (3) concludes with an overview of research applications of this PTA text corpus in economics, political science, and law. The current codebook is attached as an appendix. The dataset, codebook, and code, as updated, are available at https://github.com/mappingtreaties/tota.
I. INTRODUCTION
The multilateral rules governing international trade today stem from the early 1990s, when the World Trade Organization (WTO) was created. Since then, the world has seen unprecedented changes in international trade driven by fast-paced technological innovations, including the spread of the Internet. These innovations enabled the emergence of global value chains, which spread production processes once bundled together in a single factory across different countries or even continents, and have more recently given rise to a significant increase in digital trade (Baldwin 2016). With the WTO's Doha Round negotiations in continuous deadlock since 2001, multilateral rules have not kept pace with these developments and are at risk of becoming outdated. States have instead turned to preferential trade agreements (PTAs) to govern commerce in the 21st century. By October 2017, WTO member states had notified close to 450 PTAs to the WTO. With PTAs becoming the laboratories of modern trade rulemaking, their importance is likely to increase further.

*Direct correspondence to Dmitriy Skougarevskiy, Lead Researcher, Institute for the Rule of Law, European University at St. Petersburg, Gagarinskaya ul 6/1, 191187 St. Petersburg, Russia; email: dskougrevskiy@eu.spb.ru. Alschner is Assistant Professor in the Common Law Section, University of Ottawa; Seiermann is Associate Economic Affairs Officer at the U.N. Conference on Trade and Development (UNCTAD).
The views expressed in this article are those of the authors and do not necessarily reflect the views of the United Nations or its officials or member states. We thank Veronika Zhirnova and Kseniia Tumasova for research assistance and gratefully acknowledge the funding support of the Swiss National Science Foundation (SNSF) project “Convergence Versus Divergence? Text-as-Data and Network Analysis of International Economic Law Treaties and Tribunals” (Grant Number 162379).
The study of PTA content has evolved alongside their growing importance. When PTAs were fewer in number and primarily concerned with tariff reductions, researchers accounted for their effects based on whether or not a PTA existed, or through typologies that mapped PTAs by their level of economic integration, from a simple free trade agreement to a deep economic union (Baier et al. 2014). Yet, as PTAs became more complex, covering not only trade in goods but also trade in services, and extending to new subject areas such as e-commerce, investment protection, or competition policy, more sophisticated means of capturing varying PTA content were developed to account for this increasing heterogeneity. The Design of Trade Agreements (DESTA) project hand-coded PTAs across 100 content dimensions (Dür et al. 2014). Ruta et al. (2017) coded PTAs for the inclusion of 52 policy areas and their legal enforceability, based on a classification initially proposed by Horn et al. (2010). Similarly, Kohl et al. (2016) mapped agreements along 17 trade-related policy domains and accounted for their enforceability.
These initiatives have significantly improved our understanding of the content of the PTA universe. However, hand-coded databases can be complemented and augmented by another rich source of information: the actual text of PTAs. Whereas hand-coded data allow researchers to investigate selected features of interest, corpus analytics treats treaty texts as data and can provide both a holistic bird's-eye view of PTA content and the ability to zoom in and extract information at the word, sentence, or clause level. The availability of a text corpus also enables analytical methods that cannot be applied to hand-coded data. One example is textual similarity, which, much like plagiarism detection, measures how much of one PTA text is identical to another and can be used to study questions related to the diffusion of treaty design and regulatory consistency. Other applications include machine-learning techniques, which leverage the high dimensionality of textual data for prediction and classification tasks (for a detailed discussion, see Gentzkow et al. 2017). Although some studies have used text-as-data methods, in particular textual similarity, to investigate subsets of the PTA universe (Allee & Lugg 2016; Allee et al. 2017a), the lack of a textual infrastructure has so far hampered large-scale computational studies of PTA texts.
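One common way to operationalize textual similarity is to split each text into overlapping character sequences ("shingles") and compare the resulting sets. The Python sketch below illustrates this under our own assumptions: the function names, the 5-character shingle length, and the lowercase/whitespace normalization are illustrative choices, not a description of how ToTA or the cited studies compute similarity. The Jaccard index gives a symmetric measure of overlap between two texts, while containment gives the asymmetric share of one text's shingles that also occur in another, which is closer to the plagiarism-detection intuition of "how much of A is found in B."

```python
import re


def shingles(text: str, k: int = 5) -> set[str]:
    """Return the set of overlapping k-character shingles of a normalized text.

    Assumption: normalization = lowercase + collapse whitespace runs to one space.
    """
    norm = re.sub(r"\s+", " ", text.lower()).strip()
    if not norm:
        return set()
    if len(norm) < k:
        # Texts shorter than k are treated as a single shingle.
        return {norm}
    return {norm[i:i + k] for i in range(len(norm) - k + 1)}


def jaccard(a: str, b: str, k: int = 5) -> float:
    """Symmetric similarity |A & B| / |A | B| over the two shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def containment(a: str, b: str, k: int = 5) -> float:
    """Asymmetric share of a's shingles that also appear in b."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa) if sa else 0.0


if __name__ == "__main__":
    # Made-up toy clauses, not quoted from any actual PTA.
    clause_a = "Each Party shall accord national treatment to the goods of the other Party."
    clause_b = "Each Party shall accord national treatment to the services of the other Party."
    print(f"Jaccard:     {jaccard(clause_a, clause_b):.3f}")
    print(f"Containment: {containment(clause_a, clause_b):.3f}")
```

Identical texts score 1.0 on both measures and texts sharing no shingles score 0.0. In practice, the choice of k and of normalization steps (e.g., stripping article numbering) affects the scores, so such parameters would need to be justified for any substantive comparison of treaty texts.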
The Text of Trade Agreements (ToTA) corpus introduced in this article seeks to fill this gap. It makes a machine-readable and structured full-text corpus of PTAs notified to the WTO publicly available in a GitHub repository to serve as common infrastructure for a wide range of future research. Aside from facilitating traditional content analysis through manual coding, it provides a much-needed basis for text-as-data investigations of PTAs. This article introduces the ToTA corpus and explains the underlying data and structure that enable targeted content analysis. It also showcases a set of computational techniques that have already been deployed on the corpus and made available on a parallel interactive website to illustrate the types of analyses ToTA makes possible. Finally, the article concludes by identifying a set of research questions