Reliance on science: Worldwide front‐page patent citations to scientific articles

DOI: http://doi.org/10.1002/smj.3145
Date: 01 September 2020
Published date: 01 September 2020
Author: Matt Marx, Aaron Fuegi
RESEARCH ARTICLE
Matt Marx¹ | Aaron Fuegi²
¹ Boston University Questrom School of Business, Boston, Massachusetts
² Boston University Information Services & Technology, Boston, Massachusetts
Correspondence
Matt Marx, Boston University Questrom
School of Business, 595 Commonwealth
Avenue, Boston, MA 02215, room 643A.
Email: mattmarx@bu.edu
Funding information
National Science Foundation, Grant/
Award Number: 1735669
Abstract
Research summary: To what extent do firms rely on
basic science in their R&D efforts? Several scholars
have sought to answer this and related questions, but
progress has been impeded by the difficulty of matching
unstructured references in patents to published papers.
We introduce an open-access dataset of references from
the front pages of patents granted worldwide to scientific
papers published since 1800. Each patent-paper
linkage is assigned a confidence score, which is characterized
in a random sample by false negatives versus
false positives. All matches are available for download
at http://relianceonscience.org. We outline several avenues
for strategy research enabled by these new data.
Managerial summary: To what extent do firms rely
on basic science in their R&D efforts? Several scholars
have sought to answer this and related questions, but
progress has been impeded by the difficulty of matching
unstructured references in patents to published papers.
We introduce an open-access dataset of references from
the front pages of patents granted worldwide to scientific
papers published since 1800. Each patent-paper
linkage is assigned a confidence score, and we check a
random sample of these confidence scores by hand in
order to estimate both coverage (i.e., of the matches we
should have found, what percentage did we find) and
accuracy (i.e., of the matches we found, what percentage
are correct). We outline several avenues for strategy
research enabled by these new data.
KEYWORDS
basic science, dataset article, patent citations, research &
development
Received: 8 March 2019 Revised: 10 January 2020 Accepted: 14 January 2020 Published on: 16 April 2020
DOI: 10.1002/smj.3145
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and
reproduction in any medium, provided the original work is properly cited.
© 2020 The Authors. Strategic Management Journal published by John Wiley & Sons, Ltd.
Strat. Mgmt. J. 2020;41:1572–1594. wileyonlinelibrary.com/journal/smj
1 | INTRODUCTION
This paper details the construction of a publicly available set of citations from U.S. patents
(1947–2018) and non-U.S. patents (1782–2018) to scientific articles (1800–2018). We establish
approximately 22 million patent citations to science. The patent-paper linkages, as well as
selected metadata on the articles (whether cited or not), and the source code are publicly
available at http://relianceonscience.org.
Patent citations to science (hereafter, PCS) are of interest to strategy researchers who seek to
understand innovation in firms: the nature of research and development, how inventors and
scientists search for commercializable basic science, and the process by which university inventions
are exploited by firms. Despite these advantages, PCS have only sometimes been used in strategy
research, for at least two reasons. First, PCS are difficult to work with given that they appear in
patent records as unstructured text strings. Thus researchers must either match patents and
scientific articles by hand (for small samples) or (for large samples) build algorithms that are
possibly error prone. Second, even when research teams have invested the effort to link patents and
scientific articles at scale, they have typically done so using proprietary databases such as Scopus
or the Web of Science. Thus the matched PCS cannot be shared with other research teams, who
must license the databases for themselves and/or develop algorithms from scratch.
As other research teams have (Gaetani & Li Bergolis, 2015; Fleming, Greene, Li, Marx, &
Yao, 2019), we link data from the U.S. Patent & Trademark Office to a broad set of scientific
articles not limited by industry or field. Our linkages involve not only proprietary article databases,
which cannot be shared, but also a newly available, open-source database from Microsoft (Sinha
et al., 2015) which permits us to post the resulting PCS for public use. Based on third-party
assessment, we estimate that our algorithm can capture up to 93% of patent citations to science
with an accuracy rate of 99% or higher. We believe this to be the longest panel of patent-to-paper
citations (spanning more than seven decades) that is publicly available and is accompanied by
rigorous performance metrics. We also provide matches from worldwide patents to PubMed.
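The two performance figures in the preceding paragraph correspond to what the abstract calls coverage (of the matches that should have been found, the share that were found) and accuracy (of the matches found, the share that are correct). The sketch below shows how such rates could be computed from a hand-checked sample. It is a minimal illustration, not the authors' evaluation code: the class name `MatchCounts`, the function names, and the counts are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchCounts:
    """Hand-labelled outcomes for a sample of candidate patent-paper links.

    Hypothetical container introduced for illustration only.
    """
    true_positives: int   # links the matcher found that are correct
    false_positives: int  # links the matcher found that are wrong
    false_negatives: int  # real links the matcher missed


def coverage(counts: MatchCounts) -> float:
    """Share of real links that the matcher found (often called recall)."""
    real_links = counts.true_positives + counts.false_negatives
    if real_links == 0:
        raise ValueError("sample contains no real links")
    return counts.true_positives / real_links


def accuracy(counts: MatchCounts) -> float:
    """Share of found links that are correct (often called precision)."""
    found_links = counts.true_positives + counts.false_positives
    if found_links == 0:
        raise ValueError("matcher returned no links")
    return counts.true_positives / found_links


# Illustrative counts only; these are NOT figures from the paper.
sample = MatchCounts(true_positives=90, false_positives=2, false_negatives=10)
print(f"coverage = {coverage(sample):.3f}")
print(f"accuracy = {accuracy(sample):.3f}")
```

Raising the confidence-score threshold for accepting a link would typically trade coverage for accuracy, which is why the two rates are reported together.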
The paper is organized as follows. We begin by motivating the use of PCS in strategy research and
review prior approaches. Second, we detail our patent-paper linking algorithm. Third, we describe
both the private and publicly available data products as well as our methods for assessing their
efficacy. We conclude by sketching research avenues opened up by the broad availability of PCS.
2 | MOTIVATION
Innovation is a key source of sustainable differentiation for firms and thus a longtime focus of
strategy researchers. The lottery-based nature of research & development (R&D) has long