Strategic Management Journal
Strat. Mgmt. J., 37: 66–85 (2016)
Published online EarlyView in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/smj.2463
Received 29 August 2013; Final revision received 23 January 2015
USING ITEM RESPONSE THEORY TO IMPROVE
MEASUREMENT IN STRATEGIC MANAGEMENT
RESEARCH: AN APPLICATION TO CORPORATE
SOCIAL RESPONSIBILITY
ROBERT J. CARROLL,1 DAVID M. PRIMO,2 and BRIAN K. RICHTER3*
1 Department of Political Science, Florida State University, Tallahassee, Florida, U.S.A.
2 Department of Political Science and Simon Business School, University of Rochester, Rochester, New York, U.S.A.
3 Business, Government, & Society Department, McCombs School of Business, University of Texas, Austin, Texas, U.S.A.
Research summary: This article uses item response theory (IRT) to advance strategic management research, focusing on an application to corporate social responsibility (CSR). IRT explicitly models firms' and individuals' observable actions in order to measure unobserved, latent characteristics. IRT models have helped researchers improve measures in numerous disciplines. To demonstrate their potential in strategic management, we show how the method improves on a key measure of corporate social responsibility and corporate social performance (CSP), the KLD Index, by creating what we term D-SOCIAL-KLD scores, and associated estimates of their accuracy, from the underlying data. We show, for instance, that firms such as Apple may not be as "good" as previously thought, while firms such as Walmart may perform better than typically believed. We also show that the D-SOCIAL-KLD measure outperforms the KLD Index and factor analysis in predicting new CSR-related activity.

Managerial summary: Corporate social responsibility (CSR) continues to grow in importance among the press, political activists, managers, analysts, and investors, yet measurement techniques have not kept up. We show that the most common approach for measuring CSR, adding up observable traits, is fundamentally flawed, even if these traits accurately capture CSR-related behavior. We introduce an improved measurement technique that treats these traits as test questions that are differentially weighted, so that "hard" CSR activities affect a company's score more than "easy" CSR activities. This approach produces a measure that offers a more reliable comparison of firms than standard measures. Our approach has a number of additional advantages, including differentiating firms that receive identical scores on an additive scale and accounting for how CSR-related behavior has evolved over time. Anybody who cares about CSR should consider using our measure (available at www.socialscores.org) as the basis for analyzing firms' CSR. Copyright © 2015 John Wiley & Sons, Ltd.
Keywords: measurement; item response theory; Bayesian estimation; corporate social responsibility; corporate social performance

*Correspondence to: Brian K. Richter, Mailing: 2110 Speedway, B6500, CBA 5.250, Austin, TX 78712-0177. E-mail: brian.richter@mccombs.utexas.edu

INTRODUCTION

The core challenge to measurement in strategic management contexts is that, unlike in the physical sciences, the firm-level and individual-level characteristics we would like to measure are often inherently impossible to observe directly (Godfrey and Hill, 1995). For example, how can we determine, in an objective manner, how well-governed (e.g., Aguilera and Jackson, 2003; Daily, Dalton, and Cannella, 2003; Shleifer and Vishny, 1997), entrepreneurial (e.g., Covin and Slevin, 1991; Lumpkin and Dess, 1996), or socially responsible (e.g., Carroll, 1979) a given firm really is? The challenge is so great that poor measurement has been called "one of the most serious threats to strategic management research" (Boyd, Gove, and Hitt, 2005). This article shows how researchers can use item response theory (IRT) modeling to improve measurement; we demonstrate the usefulness of IRT with an application to corporate social responsibility/performance (CSR/CSP).
Researchers often construct measures built from multiple observable proxies using either additive indices or data reduction techniques such as factor analysis (Boyd et al., 2005). These approaches have several benefits compared to the use of a single proxy; for instance, they make use of more information and reduce measurement error that might arise from one noisy signal. However, they also have serious drawbacks. The implicit assumption underlying the construction of additive indices, for instance, is that each observable is an equally good proxy of the underlying attribute we hope to measure. This, of course, is a strong assumption that is difficult to justify theoretically, yet additive indices are used in a variety of contexts, including CSR/CSP (the focus of this article) and the "G-index," which is used to measure the quality of corporate governance (Aguilera and Desender, 2012). While an improvement over additive indices, scales based on data reduction techniques like factor analysis (a prominent example being the firm-level "entrepreneurial orientation (EO) scale"; Lyon, Lumpkin, and Dess, 2000) are not as flexible as the IRT approach we introduce in this article.
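To see concretely why equal weighting is restrictive, consider a small numerical sketch of our own (the firms, items, and weights below are hypothetical, not drawn from the KLD data): two firms earn identical additive-index scores even though one exhibits only "easy" CSR traits and the other only "hard" ones, while any differential weighting, such as the item parameters an IRT model estimates, separates them.

```python
import numpy as np

# Hypothetical example: four binary CSR indicators, ordered from "easy"
# (commonly observed) to "hard" (rarely observed).
firm_a = np.array([1, 1, 0, 0])  # exhibits only the two "easy" traits
firm_b = np.array([0, 0, 1, 1])  # exhibits only the two "hard" traits

# An additive index weights every item equally, so the two firms are
# indistinguishable: both score 2.
print(firm_a.sum(), firm_b.sum())          # -> 2 2

# Differential weights (made up here; an IRT model estimates analogous item
# parameters from the data) immediately separate the two firms.
weights = np.array([0.25, 0.5, 1.0, 1.5])
print(firm_a @ weights, firm_b @ weights)  # -> 0.75 2.5
```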
IRT MODELS
Item response theory (IRT) models can improve on existing "state of the art" measurement techniques by generating measures of latent characteristics based on a richer, theory-driven understanding of how these characteristics are reflected in proxies. In doing so, IRT models enable the researcher to assess important questions. Are differences between individuals and firms in traditional measures of latent characteristics real or due to systematic measurement error (which can be estimated for IRT-based measures)? How do individual firms and groups of firms change over time? Are some items in an index better/worse at distinguishing among firms, and if so, by how much?

The data inputted into an IRT model for estimation of latent traits may be a set of responses to a series of questions or a set of other observed measures, such as whether various behaviors occurred or did not occur.1 Extrapolating from an education setting, these observables can be thought of as answers to test questions, following Thurstone (1925), who had the insight that students of varying ability levels respond differently to various test questions, which themselves vary in how well they measure ability (Bock, 1997). Hence, IRT models simultaneously assess both the test questions and the test takers.
We focus here on a basic two-parameter model for binary (e.g., yes-no; absent-present; 0–1; correct-incorrect) data. IRT models can also accommodate ordinal responses (e.g., a rating on a scale of 1–5) and additional parameters. In the article's conclusion, we discuss how management researchers can take advantage of this flexibility.
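Before turning to the binary model, a brief aside on the ordinal case: one standard extension is Samejima's graded response model, in which ordered cutpoints replace a single difficulty parameter. The sketch below uses illustrative values for the discrimination and cutpoints; the article does not commit to this particular ordinal specification.

```python
import numpy as np
from scipy.special import expit  # logistic function

# Graded response model sketch for one ordinal item with categories 1-4.
# beta (discrimination) and kappa (ordered cutpoints) are illustrative values.
beta = 1.2
kappa = np.array([-1.0, 0.0, 1.5])

def category_probs(rho):
    """Return P(y = k) for k = 1..4 given latent trait rho."""
    # Cumulative curves P(y >= k) for k = 2, 3, 4, padded so that
    # P(y >= 1) = 1 and P(y >= 5) = 0.
    cum = np.concatenate(([1.0], expit(beta * (rho - kappa)), [0.0]))
    return cum[:-1] - cum[1:]  # adjacent differences give category probabilities

print(category_probs(0.5))  # four probabilities that sum to 1
```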
The basic model takes the following form: $\Pr(y_{i,j} = 1 \mid \rho_i, \alpha_j, \beta_j) = F(\alpha_j + \beta_j \rho_i)$. The $i$ subscript refers to individual respondents, while the $j$ subscript refers to the items used to assess those respondents. $F(\cdot)$ is typically the logistic or standard normal function, making this formula similar to a logit or probit model when working with binary data (Hoetker, 2007); a key difference between applications of those techniques and IRT models, however, is that in IRT there is typically no independent variable with observed data (i.e., $x_i$); rather, it is replaced by the $\rho_i$ term representing ability (or another latent trait) that the researcher wishes to estimate. The outputs of a basic two-parameter model are estimates of the latent trait for each individual in the dataset ($\rho_i$), along with estimates for how difficult each item is ($\alpha_j$) and how well each item discriminates among individuals ($\beta_j$). Using a test analogy, $\alpha_j$ addresses the question "Holding ability fixed, how likely is a student to get question $j$ correct?", and $\beta_j$ addresses the question "How well does question $j$ help distinguish between students of different ability levels?"; in other words, do individuals with high ability and low ability (i.e., high and low $\rho_i$s) differ in the probability they will get a question correct?
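To make the mechanics concrete, here is a minimal simulation sketch under stated assumptions: a logistic link for $F$, made-up parameter ranges, and, for simplicity, maximum-likelihood recovery of a single firm's latent trait with the item parameters treated as known. (The full approach, as the keywords indicate, estimates all parameters jointly via Bayesian methods; this sketch only illustrates $\Pr(y_{i,j}=1)=F(\alpha_j+\beta_j\rho_i)$.)

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit  # logistic function F

rng = np.random.default_rng(0)
n_firms, n_items = 500, 12

rho = rng.standard_normal(n_firms)         # latent trait rho_i for each firm
alpha = rng.uniform(-1.5, 1.5, n_items)    # item "difficulty" alpha_j
beta = rng.uniform(0.5, 2.0, n_items)      # item discrimination beta_j

# Pr(y_ij = 1) = F(alpha_j + beta_j * rho_i); draw the binary item matrix.
p = expit(alpha[None, :] + beta[None, :] * rho[:, None])
y = rng.binomial(1, p)

def neg_log_lik(rho_i, y_i):
    """Negative log-likelihood of one firm's responses, items held fixed."""
    p_i = expit(alpha + beta * rho_i)
    return -np.sum(y_i * np.log(p_i) + (1 - y_i) * np.log(1 - p_i))

# Recover firm 0's latent trait by maximum likelihood.
rho_hat = minimize_scalar(neg_log_lik, args=(y[0],), bounds=(-4, 4),
                          method="bounded").x
print(f"true rho: {rho[0]:.2f}, estimated rho: {rho_hat:.2f}")
```

With only 12 items the estimate is noisy; adding items, or pooling information across firms as a joint Bayesian fit does, tightens it, which is one reason an IRT-based approach can report an accuracy estimate alongside each score.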
IRT models have deep roots in psychology (Lord and Novick, 1968; Rasch, 1960; Reise and Waller,

1 The discussion in this section draws from Johnson and Albert (1999) and Fox (2010).