The USPTO Patent Examination Research Dataset: A window on patent processing

Authors: Stuart J.H. Graham, Alan C. Marco, Richard Miller
Published: 01 September 2018
Received: 15 March 2017 | Revised: 10 May 2018 | Accepted: 10 May 2018
DOI: 10.1111/jems.12263

ORIGINAL ARTICLE
Stuart J.H. Graham,1 Alan C. Marco,2 Richard Miller3

1 Scheller College of Business, Georgia Institute of Technology, Atlanta, GA, USA
2 School of Public Policy, Georgia Institute of Technology, Atlanta, GA, USA (Email: amarco@gatech.edu)
3 United States Patent and Trademark Office, Office of Chief Economist, Alexandria, VA, USA (Email: richard.miller@uspto.gov)

Correspondence
Stuart J.H. Graham, Scheller College of Business, Georgia Institute of Technology, 800 W. Peachtree St. NW, Atlanta, GA 30308, USA. Email: Stuart.Graham@scheller.gatech.edu

The views expressed are those of the individual authors and do not necessarily reflect official positions of the Office of Chief Economist or the U.S. Patent and Trademark Office. Both Graham and Marco were employees of USPTO during the writing of this manuscript. We thank Robert Kimble for data parsing and coding.
Abstract
This article describes the “USPTO Patent Examination Research Dataset” (PatEx) and explores possible selection issues and the representativeness of the nearly 9.2 million US patent application records it contains. We find that data are sparse for years before 1981, and that serious selection issues affect records on applications filed prior to 2001 due to nonpublication in the United States. Following implementation of a policy change in November 2000, both coverage and representativeness of the PatEx data improve substantially. We uncover specific areas that are prone to selectivity issues by generating statistical evidence across application characteristics such as application type, age, ownership type, domestic or foreign origin, patent family status, and technology class, among others. Although our exploration suggests to researchers several categories of specific concern, our findings overall show that the PatEx data are generally representative of the population of patent applications filed in the United States after November 2000 across observable characteristics.
KEYWORDS
innovation research, patent examination, patents
JEL CLASSIFICATION:
O3, C1, C5, H1, K2, L1, Y1
1 INTRODUCTION
The recognition that technology and innovation are important drivers of economic growth (Solow, 1956) has spawned a large
body of research in economics and management using patent data. Although early path-breaking work relied on relatively sim-
ple patent counts (Schmookler, 1966), subsequent studies used increasingly detailed patent information, generating insights by
delving into the fees patentees paid (Schankerman & Pakes, 1986), the references patents made to one another (Trajtenberg,
1990), and the technology classes patents covered (Lerner, 1994). More recently, researchers have become even more sophisti-
cated, by focusing on how these data were generated in the US Patent and Trademark Office (USPTO, or Office) during patent
examination. Thompson and Fox-Kean (2005) used data and insights from the examination process to show that the methodol-
ogy used in a prior, influential study on knowledge spillovers (Jaffe, Trajtenberg, & Henderson, 1993) produces spurious results.
While improved matching methods drove Thompson and Fox-Kean's main results, the authors explained that failing to account
for how technology classes and citations are assigned to patents by the USPTO introduces considerable bias. In another study,
Alcacer and Gittelman (2006) explored how references are assigned by USPTO examiners to patents. While scholars commonly
used patent-to-patent references to represent knowledge flows among firms (Almeida, 1996; Rosenkopf & Nerkar, 2001; Singh,
2004), Alcacer and Gittelman showed that a large share of references were “examiner added,” did not reflect interfirm knowledge
transfers, and produced biased results and overinflated significance levels. By scrutinizing how patent information is generated
554 © 2018 Wiley Periodicals, Inc. J Econ Manage Strat. 2018;27:554–578. wileyonlinelibrary.com/journal/jems
inside the USPTO, during the examination process prior to patent grant, scholars have continued to enrich our understanding,
uncover possible sources of bias, and promote more robust scholarship.
Following in this tradition, this article provides a research guide to the “USPTO Patent Examination Research Dataset” (PatEx or Dataset).1 PatEx is a research-ready relational data set made public by the USPTO Office of Chief Economist (OCE), providing a wealth of microlevel administrative data on US patents, patent applications, and their examination histories, dating from 1910 to the present.2 Although the PatEx data are updated and made available by the Office each year, this article describes records for US applications filed through December 31, 2014, covering about 5.2 million granted patents, 9.2 million applications, and over 275 million coded examination records related to those patents and applications.
The PatEx data are drawn from records in the Public Patent Application Information Retrieval (Public PAIR) system, an information source created by USPTO in the wake of the US Congress passing the American Inventors Protection Act (AIPA), which changed the default rule for publishing US patent applications to 18 months after first filing date, starting in late 2000.3 Public PAIR was designed to allow interested parties (e.g., competitors of the patent applicant, other inventors) to access information about individual published US applications, but was available only on a case-by-case basis, creating barriers for empirical researchers interested in collecting data for analysis. The implications of AIPA are relevant for users of PatEx because the law dramatically increased the share of previously unpublished patent applications that became observable after November 2000, yet also included an opt-out provision; proper use of PatEx therefore requires that researchers be aware of the selectivity issues inherent in these data.
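In practice, one common way to mitigate the pre-AIPA selection problem described above is to restrict analysis to applications filed on or after the statute's effective date of November 29, 2000. The sketch below illustrates that filter on a tiny, made-up slice of application-level data; the column names (`appl_id`, `filing_date`) are assumptions for illustration and should be checked against the actual PatEx file documentation.

```python
import pandas as pd
from io import StringIO

# Hypothetical slice of a PatEx application-level file; the column
# names here are illustrative assumptions, not the confirmed schema.
csv = StringIO(
    "appl_id,filing_date\n"
    "10000001,1998-06-15\n"
    "10000002,2001-03-02\n"
    "10000003,2005-11-30\n"
)
apps = pd.read_csv(csv, parse_dates=["filing_date"])

# AIPA's 18-month publication default applies to applications filed
# on or after November 29, 2000, so restricting to that window avoids
# the pre-2001 nonpublication selection problem discussed in the text.
cutoff = pd.Timestamp("2000-11-29")
post_aipa = apps[apps["filing_date"] >= cutoff]

print(len(post_aipa))  # → 2 (the 1998 filing is excluded)
```

Note that even within the post-AIPA window, applications whose owners invoked the opt-out provision remain unobservable, so a date filter alone does not remove all selectivity.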
Our goals in writing this article are twofold. First, to provide PatEx users with a detailed understanding of the database, and
better insights into the meaning and significance of the various data elements. Second, to deliver an analysis for researchers
of what, if any, selection bias has been introduced by the rules that govern when (and indeed whether) the USPTO may make
records available to the public. To explore possible selectivity issues in the PatEx data, we compare these data to another
information source available only internally at the USPTO, the Patent Application Locating and Monitoring (PALM) Sys-
tem. Because PALM is the primary system used by USPTO examiners to process patent applications, it includes compre-
hensive information on the population of filed US patent applications. PALM thus provides us with a data source against
which we can compare the PatEx data to examine possible biases inherited from the Public PAIR source from which PatEx was
built.
Fortunately, as USPTO associates during the drafting of this article, we had access to PALM, and were therefore privileged to
obtain information not normally available to researchers, and also internal USPTO expertise that allowed us to understand, and
better document, our discoveries. Moreover, policies supported greater transparency and deeper engagement with researchers,
and the public at large (Graham & Hancock, 2014). Our aims also complement what the history of patent-data use in economics
and management scholarship has taught: that research benefits from a deeper understanding of the foundations of these data,
and awareness of potential sources of bias.
The article is organized as follows. Section 2 reviews the relevant prior literature that demonstrates the research utility of these data. Section 3 provides a brief description of the PatEx data files and the USPTO information sources used to construct PatEx, and reviews the rules dictating how USPTO records become public, and how these selection mechanisms have changed over time. Section 4 provides analysis of PatEx coverage, and generates statistical evidence regarding the scale and scope of PatEx selection issues. Because publishing US patent applications became common only after November 2000, we focus particularly on issues arising from the nonpublication of US applications, both before and after this date. Section 5 investigates selection issues as they relate to modeling the examination process with the transactions history information available in PatEx. Section 6 summarizes our findings and provides some concluding thoughts.
2 PRIOR RESEARCH
Although patent data have been employed in economics and management research for decades, only recently have scholars delved deeply into the foundations of these data, and inquired into how these administrative data are produced inside the USPTO. Traditionally, although patent data were used in sophisticated empirical models, few researchers peered into the “black box” of patent examination. This began to change in the 2000s when scholarship—including Thompson and Fox-Kean (2005) and Alcacer and Gittelman (2006), reviewed in the introduction—began to explore and draw insights from patent examination. Cockburn, Kortum, and Stern (2003) conducted a detailed economic analysis of USPTO examiner attributes, relating these to patent characteristics and litigation outcomes. They reported heterogeneity and substantial variation among examiners and examination along
observables, such as tenure, workloads, and (importantly, given the way patent references are used in the literature) subsequent
citation by other patents. Gans, Hsu, and Stern (2008) used variation in delay in USPTO examination to study how uncertainty
