Evaluating the Quality of Changes in Voter Registration Databases

Published date: 01 November 2020
Subject Matter: Articles
American Politics Research
2020, Vol. 48(6) 670–676
© The Author(s) 2019
DOI: 10.1177/1532673X19870512
Voter files are an important resource for political research and
are also crucial to the integrity of election administration,
because they determine who is able to vote. The files change
constantly, but not all changes are intentional or welcome.
External intrusions into voter files became a salient issue in the
2016 presidential election (Sanger, 2018). Media and federal officials
reported that foreign actors attempted to access voter data in
various states and that they may have tried to alter registration
data to manipulate election outcomes or undermine public
trust (Fandos & Wines, 2018; Perlroth, Wines, & Rosenberg,
2017). Quality can also deteriorate from within, however,
because voter files are large, dynamic, and complex.
For example, in the 2018 June primary election in California,
records for as many as 77,000 voters in the state’s system
were duplicated inadvertently by the Department of Motor
Vehicles; in the 2018 primary election in Los Angeles County,
118,000 voters were left off precinct rosters due to a merge
error (Myers, 2018; Reyes & Smith, 2018).
While election officials work tirelessly to guard against
cyberattack and human error, there are calls for independent
auditing of voter files and, more generally, for improving their
quality (Alvarez, Ansolabehere, & Stewart, 2005; Alvarez, Jonas,
Winkler, & Wright, 2009; Ansolabehere & Hersh, 2010).
Past studies of voter data quality have been static in their
focus, as in Ansolabehere and Hersh (2014).1 However, as we
have seen, data quality can change sharply over time, presenting
problems both to election administrators and to scholars
using the data. In this article, we present two methods for
evaluating the internal validity of voter registration data as it
changes over time. These methods increase assurance of voter file
quality and give election scholars a novel source of data on
election administration practices.
Developed in collaboration with the Orange County Registrar of
Voters (OCROV), our first approach matches voter file snapshots
taken at different points in time and quantifies the changes to the
file. As voter files are dynamic, some rate of change is expected, due
to new registrations, residential mobility, deceased voters,
and changes in personal information. The resulting time series
of changes can then be screened with statistical anomaly detection
to flag changes that depart from the expected rate. While this
approach can provide notification of a sudden deterioration of
database quality, we argue that the generated audit data can also be
of scholarly interest, serving as an important source of information
on election administration and a rare window into this important,
and often overlooked, component of the democratic process. Our
second approach is a principled, automated duplicate detection
scheme that produces a list of potential duplicates while
minimizing cases where an election official might accidentally
delete a valid, nonduplicate voter. Combined with voter file
information on
how the registration data were generated, we show that we
1California Institute of Technology, Pasadena, USA
Corresponding Author:
Seo-young Silvia Kim, California Institute of Technology, HSS, Caltech,
1200 East California Blvd., Pasadena, CA 91125-0002, USA.
Email: sskim@caltech.edu
Part of Special Symposium on Election Sciences
Seo-young Silvia Kim1, Spencer Schneider1,
and R. Michael Alvarez1
The administration of elections depends crucially upon the quality and integrity of voter registration databases. In
addition, political scientists are increasingly using these databases in their research. However, these databases are
dynamic and may be subject to external manipulation and unintentional errors. In this article, using data from Orange
County, California, we develop two methods for evaluating the quality of voter registration data as it changes over
time: (a) generating audit data by repeated record linkage across periodic snapshots of a given database and monitoring
it for sudden anomalous changes and (b) identifying duplicates via efficient, automated duplicate detection and
tracking new duplicates and deduplication efforts over time. We show that the generated data can serve not only to
evaluate voter file quality and election integrity but also as a novel source of data on election administration practices.
Keywords: voter registration data, record linkage, election integrity, election administration, data quality
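The two methods summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. Each snapshot is taken to be a dictionary keyed by a stable voter ID, which replaces the record linkage step with an exact ID match. The anomaly screen uses a median-based modified z-score as a stand-in for whatever statistical anomaly detection is actually applied. The duplicate screen groups records by last name and date of birth, then compares first names with a string similarity score. The function names, the field names `first`, `last`, and `dob`, and both thresholds are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from statistics import median


def diff_snapshots(old, new):
    """Count records added, dropped, and changed between two snapshots.

    Assumption: both snapshots are dicts keyed by a stable voter ID.
    """
    shared = old.keys() & new.keys()
    return {
        "added": len(new.keys() - old.keys()),
        "dropped": len(old.keys() - new.keys()),
        "changed": sum(1 for vid in shared if old[vid] != new[vid]),
    }


def flag_anomalies(counts, threshold=3.5):
    """Return indices of counts whose modified z-score exceeds threshold.

    Assumption: a median/MAD-based score stands in for the anomaly detector.
    """
    if not counts:
        return []
    med = median(counts)
    mad = median([abs(c - med) for c in counts])
    if mad == 0:
        # No spread around the median: flag anything that differs from it.
        return [i for i, c in enumerate(counts) if c != med]
    return [
        i for i, c in enumerate(counts)
        if 0.6745 * abs(c - med) / mad > threshold
    ]


def potential_duplicates(records, min_similarity=0.85):
    """List candidate duplicate ID pairs for human review (never auto-delete).

    Assumption: records sharing last name and date of birth are compared,
    and a pair is flagged when the first names are sufficiently similar.
    """
    blocks = defaultdict(list)
    for vid, rec in records.items():
        blocks[(rec["last"].strip().lower(), rec["dob"])].append(vid)

    pairs = []
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            first_a = records[a]["first"].strip().lower()
            first_b = records[b]["first"].strip().lower()
            if SequenceMatcher(None, first_a, first_b).ratio() >= min_similarity:
                pairs.append((a, b))
    return pairs
```

In use, `diff_snapshots` would be called on each consecutive pair of snapshots, and one of the resulting count series (for example, the number of changed records per period) would be passed to `flag_anomalies`. `potential_duplicates` returns candidates only, leaving the decision to remove a record with the election official, in keeping with the goal of not deleting valid voters.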

