New Estimates of Over 500 Years of Historic GDP and Population Data

Published date: 01 April 2022
Authors: Jonathan N. Markowitz, Therese Anders, Christopher J. Fariss, Miriam Barnum
DOI: 10.1177/00220027211054432
Subject matter: Data Set Feature
Journal of Conflict Resolution
2022, Vol. 66(3) 553–591
© The Author(s) 2022
Christopher J. Fariss¹, Therese Anders², Jonathan N. Markowitz², and Miriam Barnum²
Abstract
Gross domestic product (GDP), GDP per capita, and population are central to the
study of politics and economics broadly, and conflict processes in particular. Despite
the prominence of these variables in empirical research, existing data lack historical
coverage and are assumed to be measured without error. We develop a latent variable
modeling framework that expands data coverage (1500 AD–2018 AD) and, by making
use of multiple indicators for each variable, provides a principled framework to
estimate uncertainty for values for all country-year variables relative to one another.
Expanded temporal coverage of estimates provides new insights about the relationship
between development and democracy, conflict, repression, and health. We also
demonstrate how to incorporate uncertainty in observational models. Results show
that the relationship between repression and development is weaker than models that
do not incorporate uncertainty suggest. Future extensions of the latent variable model
can address other forms of systematic measurement error with new data, new
measurement theory, or both.
Keywords
gross domestic product, population, GDP per capita, latent variables, measurement,
construct validity
¹ Department of Political Science, University of Michigan, Ann Arbor, MI, USA
² School of International Relations, University of Southern California, Los Angeles, CA, USA
Corresponding Author:
Christopher J. Fariss, Department of Political Science, University of Michigan, 426 Thompson Street, Ann
Arbor, MI 48109-1382, USA.
Email: cjf0006@gmail.com
Introduction
Gross domestic product (GDP), GDP per capita (GDPPC), and population play a vital
role in empirical social science. Moreover, they are key variables in the study of
international and domestic conflict processes in particular. Despite the prominence of
these variables, existing data used to operationalize them suffer from three problems:
(1) measurement coverage (missingness), (2) measurement uncertainty, and (3)
measurement bias. First, there is a lack of economic cross-country data coverage prior
to 1950 (Gleditsch 2002), or data are available with very limited temporal coverage
even though many other datasets of interest cover variables beginning in the 1800s.
This leaves researchers unable to consistently estimate the relationship between key
variables such as economic development and democratization, conflict, repression, or
health prior to this date. Second, measurement error arises because of imprecision or
disagreement in available data, which can mask real relationships or exacerbate false
ones. Though scholars are aware that data suffer from measurement error, existing
estimates of these variables provide only point estimates and offer no method for
quantifying uncertainty (measurement error). Third, existing models for estimating
GDP, population, and GDP per capita offer no way to correct for any systematic bias in
the data generating process. As previous research has demonstrated, failing to correct
for measurement bias leads scholars to draw incorrect inferences about the relationships
in their data. We develop a latent variable model of GDP, GDP per capita, and
population that provides remedies for each of these issues.
First, the latent variable model generates estimates that extend cross-national data coverage
back in time by several centuries, to 1500. These estimates are important not only for
scholars assessing existing explanations of outcomes in earlier periods of history, but also
for scholars interested in comparing inferences generated from data across time periods.
Given that most of the major interstate wars occurred before 1950, these data will be
particularly useful to scholars studying the causes of war, as well as the debate about its
relative decline over time (e.g., Fazal 2014; Lacina, Gleditsch and Russett 2006). Until
now, researchers have been unable to estimate these long-term relationships.
For example, existing research argues that economic development reduces the risk
of conflict both by contributing to the democratization process of states in the
international system and by making war more costly (Hegre 2000). As Souva and Prins
state, democratic regimes are about 37% less likely to initiate fatal militarized disputes
given an average level of GDP per capita (Souva and Prins 2006, 194). However, other
research argues that the relationship with economic development is curvilinear, for both
democratization (Treisman 2020) and conflict (Boehmer and Sobek 2005; Gartzke
2012). To date, the relationships between economic development and democratization,
repression, or conflict have not been estimated with complete cross-national data prior
to 1950. Scholars have instead relied on proxy measures such as energy consumption
per capita (e.g., Markowitz, McMahon and Fariss 2019), shipping and rail costs (e.g.,
Lake 2009; Markowitz and Fariss 2013), or simple linear interpolation of GDP per
capita (e.g., Treisman 2020) to account for long-term economic variation. The new
estimates we present in this article allow researchers to evaluate whether these
empirical relationships are limited to the post-1950 period, or whether they generalize to
earlier time periods (using more precise data with better coverage). We show that many
correlational patterns between GDP per capita and these other variables vary
considerably over time, which means that relationships for the post-1950 period do not
necessarily generalize to periods prior to 1950. Critically, this also suggests that these
relationships might change in the future.
Second, our new latent variable model provides estimates of the relative level of
uncertainty for each country-year unit by accounting for variation in the level of coverage
within and disagreement between component indicators. This is useful because it allows
researchers to evaluate whether measurement error in the data (expressed as the level of
uncertainty with which each country-year unit is accurately measured) is large enough to
alter the size, or even the direction, of the effect of a given explanatory variable.
For illustration, we demonstrate that measurement uncertainty may be large enough to
change the estimated magnitude of the effect of development (GDP per capita) on
repression. If models do not incorporate uncertainty, then the researcher cannot rule out
the possibility that the statistical associations of GDP, GDP per capita, or population with
some other variable are a false-negative result (Type 2 error or attenuation bias), or the
related possibility that the relationships of these variables are a false-positive result (Type
1 error). In models with multiple indicators (i.e., multiple-variable regression), bias due to
measurement error is not always attenuating if un-modeled, higher-order interactions
exist. Incorporating measurement uncertainty of variables addresses these
difficult-to-model issues and can be further explored with non-parametric regression
techniques. In sum, incorporating uncertainty into a regression model provides evidence
that effect sizes are probabilistically distinguishable from zero even when we cannot
measure a right-hand side variable perfectly (an assumption of standard regression models).
Though some existing models that include these variables may have under-estimated an
effect size, others may have over-estimated an effect size or even incorrectly reversed the
direction of
the effect. The implications are potentially profound given the wide usage of these
variables across the social sciences.
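The logic of carrying measurement uncertainty into a regression can be illustrated with a small simulation. The sketch below is not the procedure used in this article. It assumes that each unit has an estimated mean and standard deviation for the explanatory variable, repeatedly draws plausible values from a normal distribution with those parameters, fits a regression to each draw, and pools the results with Rubin's combining rules. The function names (`ols_slope`, `rubin_combine`, `slope_with_uncertainty`) and all simulated values are hypothetical and chosen only for illustration.

```python
import numpy as np


def ols_slope(x, y):
    """OLS slope of y on x (with an intercept) and the slope's sampling variance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)      # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance of the coefficients
    return beta[1], cov[1, 1]


def rubin_combine(estimates, variances):
    """Pool per-draw estimates: mean estimate, within plus inflated between variance."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    within = variances.mean()
    between = estimates.var(ddof=1)
    return estimates.mean(), within + (1 + 1 / m) * between


def slope_with_uncertainty(x_mean, x_sd, y, n_draws=200, seed=0):
    """Refit the regression on repeated draws of x and pool the slopes."""
    rng = np.random.default_rng(seed)
    fits = [ols_slope(rng.normal(x_mean, x_sd), y) for _ in range(n_draws)]
    slopes, variances = zip(*fits)
    return rubin_combine(slopes, variances)


# Simulated data (assumption: true slope of -0.5, arbitrary noise levels).
rng = np.random.default_rng(42)
n = 2000
x_true = rng.normal(8.0, 1.0, n)               # stand-in for a latent covariate
y = 1.0 - 0.5 * x_true + rng.normal(0.0, 1.0, n)
x_sd = rng.uniform(0.3, 1.0, n)                # unit-specific measurement uncertainty
x_mean = x_true + rng.normal(0.0, x_sd)        # noisy point estimates of the covariate

naive_slope, naive_var = ols_slope(x_mean, y)
pooled_slope, pooled_var = slope_with_uncertainty(x_mean, x_sd, y)
print("naive :", naive_slope, naive_var ** 0.5)
print("pooled:", pooled_slope, pooled_var ** 0.5)
```

In this simple setup the naive slope is already attenuated relative to the simulated true slope, and the pooled slope is weaker still, because each draw adds noise to the covariate. The pooled variance also includes a between-draw component that a single regression on point estimates ignores. Correcting attenuation itself would require a model of the measurement process rather than this draw-and-pool step.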
Third, the measurement model provides a framework for incorporating theoretical
knowledge about the data generating process for information used in the measurement
of each variable that can correct for potential bias in the existing data. We describe how
the latent variable model we develop in this article can be further developed to address
measurement bias and point to other areas of scholarship that have successfully built
and extended latent variable models to address measurement bias.
The new estimates provide ample opportunities for future scholarship to re-visit
existing debates and investigate new research questions relating to war, peace,
economic development, and health and well-being, as well as the information necessary to
reduce bias due to measurement uncertainty in GDP, GDP per capita, or population
variables. The key output from the model is predicted intervals of the original source
variables in the original unit-of-measurement, in addition to the relative level of
uncertainty for each country-year estimate in the form of a standard deviation. The level