Leveraging Big Data to Develop Supply Chain Management Theory: The Case of Panel Data

DOI: http://doi.org/10.1111/jbl.12188
Authors: Daniel C. Ganster, Stanley E. Griffis, Jason W. Miller
Published date: 01 September 2018
Jason W. Miller¹, Daniel C. Ganster², and Stanley E. Griffis¹
¹ Michigan State University
² Colorado State University
Increased data availability is poised to shape both business practice and supply chain management (SCM) research. This article addresses an
issue that can arise when trying to use big data to answer academic research questions. This issue is that distilled data often have a panel
structure whereby repeated measurements are available on one or more variables for a substantial number of subjects. Thus, to fully leverage
the richness of big data for academic research, SCM scholars need an understanding regarding the different types of research questions
answerable with panel data. In this article, we devise a framework detailing different types of research questions SCM scholars can answer with panel
data. This framework provides a basis to categorize how SCM scholars have examined the services supply chain setting of health care with
public data regarding hospital-level patient satisfaction. We extend prior research by testing a series of three questions not yet examined in this
area by fitting a series of structured latent curve models to seven years of hospital-level patient satisfaction data for nearly 4,000 hospitals. The
discussion highlights theoretical and methodological challenges SCM scholars are likely to encounter as they use panel data in their research.
Keywords: big data; panel data; repeated measures; structured latent curve models
INTRODUCTION
One of the most important changes in the business landscape
is the increased availability of data and, consequently, the
ability to incorporate these data when making decisions (Lohr
2010; McAfee and Brynjolfsson 2012; Waller and Fawcett
2013a; Schoenherr and Speier-Pero 2015). Although the term
“big data” has various definitions (Arunachalam et al. 2018;
Tonidandel et al. 2018), most converge on the themes put
forward by Sanders (2014) that big data refer to very large
quantities of data such that, in their raw form, traditional data
handling approaches such as spreadsheets are no longer
feasible. The defining characteristics of big data that distinguish
them from data in the past are their volume, velocity, and
variety (Sanders 2016; Arunachalam et al. 2018). Companies'
supply chain operations generate a tremendous amount of such
data in the form of global positioning system coordinates for
delivery vehicles, customer transaction data, and throughput
information (Sanders 2014). Consequently, the topic of big
data has increasingly drawn interest from supply chain
management (SCM) scholars (Waller and Fawcett 2013a,b; Sanders
2014; Schoenherr and Speier-Pero 2015; Arunachalam et al.
2018).
One of the challenges SCM scholars face when trying to transition
to using big data for research is that there is a substantial gap
between traditional methodological training, which tended to focus
on qualitative and survey-based techniques, and the skills necessary
to work with big data. In particular, one challenge many
SCM scholars face is that once big data are distilled¹ into a format
more suitable for testing theoretical predictions, these data
may have a repeated measures² panel structure, as in Massimino
et al. (2017). By panel structure, we mean data that consist of a
substantial number of subjects³ measured on one or more variables
across several occasions (Baltagi 2008). Although panel data are
not big data per se, for SCM scholars unfamiliar with using panel
data, these data have characteristics associated with true big data
in that (1) volume (e.g., number of records) is substantially greater
than that encountered in most survey studies; (2) velocity (e.g., the
frequency at which measures are taken) is greater than in surveys, which
by and large are cross-sectional; and (3) variety may be greater
than seen with surveys (e.g., more reliance on proxies is
necessary). Thus, focusing on how to leverage panel data for research
can provide a first step for scholars interested in developing the
skills to use big data while further strengthening researchers'
applied use of big data, given that the distillation process may
transform true big data into a panel structure to answer theoretical
questions. As such, the remainder of this article focuses on
issues pertaining to the use of panel data by SCM scholars, with
the understanding that this can directly and indirectly strengthen
the use of big data for research purposes.
¹ The distillation of big data into a form usable for research may
be performed by the authors or by other entities. Two studies
in which the authors performed this process themselves are
Massimino et al. (2017) and Mukherjee and Sinha (2017). In
other instances, entities generating the data may compile these
data into a usable form, such as the Federal Motor Carrier Safety
Administration distilling data from millions of motor carrier
inspections into carrier-level aggregates that are disseminated to
the public (The Volpe Center 2016). In the case of the Toxics
Release Inventory data, third parties such as the Environmental
Defense Fund compiled vast quantities of data regarding chemical
emissions to disseminate them in a more user-friendly fashion
(Fung and O'Rourke 2000).
² We define repeated measures data to mean that two or more
observations of the same metric(s) are taken from the same
subject(s) with the measurement occasions separated by time
(Crowder and Hand 1990). We avoid describing such data as
“longitudinal” because it is possible for scholars to have a
cross-sectional study design whereby the independent and dependent
variables are measured at different time points. Such a design is
inherently longitudinal in that there is a temporal separation
between the measures, but the data are not of the repeated
measures type.
³ We adopt nomenclature from Cudeck and Harring (2007) and
use “subject” as a generic term to refer to the unit of analysis for
which repeated observations are obtained.
Corresponding author:
Jason W. Miller, Department of Supply Chain Management, Eli
Broad College of Business, Michigan State University, 632 Bogue
Street, N370, East Lansing, MI 48824, USA; E-mail:
mill2831@broad.msu.edu
Journal of Business Logistics, 2018, 39(3): 182-202 doi: 10.1111/jbl.12188
© 2018 Council of Supply Chain Management Professionals
Although increased data availability is welcome, having access
to large quantities of data does not, ipso facto, result in scientific
advancement (Davis 2010). Data can only extend scientific
knowledge to the extent they are used to test hypotheses with
sound theoretical foundations (George et al. 2014; Davis 2015;
Barley 2016). To this end, a key impediment to SCM scholars
most fruitfully leveraging panel data is the absence of a clear,
didactic discussion regarding the array of research questions
panel data can address. In particular, we believe the divergent
perspectives regarding how to analyze panel data that exist, on
the one hand, in sociology (Halaby 2004) and economics
(Baltagi 2008) and, on the other hand, in biostatistics (Fitzmaurice et al.
2011) and quantitative psychology (Nesselroade 1991), could
stymie progress.⁴ Our concern is that SCM scholars trained in
either research tradition may not appreciate the value, and thus
legitimacy, of answering questions central to the other.
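To make the contrast between the two traditions more concrete, consider a minimal Python sketch. It assumes a toy panel of two hypothetical subjects (A and B) with made-up scores, and the helper names `within_transform` and `ols_slope` are ours, introduced purely for illustration. The first helper mirrors the within-subject demeaning that underlies subject fixed effects in the econometric tradition; the second computes a separate least-squares slope for each subject, which is only a crude stand-in for the subject-specific trajectories that growth curve models estimate jointly.

```python
from statistics import mean

# Hypothetical toy panel (illustrative values only):
# subject -> list of (occasion, score) pairs.
panel = {
    "A": [(0, 70.0), (1, 72.0), (2, 74.0)],
    "B": [(0, 60.0), (1, 61.0), (2, 62.0)],
}

def within_transform(obs):
    """Subtract the subject's own mean from each score.

    This demeaning step removes stable between-subject differences,
    which is the core idea behind subject fixed effects.
    """
    subject_mean = mean(score for _, score in obs)
    return [(occasion, score - subject_mean) for occasion, score in obs]

def ols_slope(obs):
    """Least-squares slope of score on occasion for a single subject.

    A simplification: growth curve models estimate trajectories for
    all subjects jointly rather than one subject at a time.
    """
    t_bar = mean(t for t, _ in obs)
    y_bar = mean(y for _, y in obs)
    numerator = sum((t - t_bar) * (y - y_bar) for t, y in obs)
    denominator = sum((t - t_bar) ** 2 for t, _ in obs)
    return numerator / denominator

for subject, obs in panel.items():
    print(subject, within_transform(obs), ols_slope(obs))
```

In this toy example, demeaning discards the level difference between A and B, whereas the per-subject slopes keep the focus on how quickly each subject changes. The two views answer different questions about the same data.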
The purpose of this article is to provide an overview of how
SCM scholars can leverage panel data to devise and test theories
that push the boundaries of our existing SCM knowledge, while
still articulating implications for practitioners. To pursue this goal,
we first devise a framework detailing different types of research
questions that can be answered with panel data. Our framework
integrates concepts from econometrics (Baltagi 2008), biostatistics
(Crowder and Hand 1990; Fitzmaurice et al. 2011), and quantitative
psychology (Ram et al. 2005; Grimm et al. 2016) into a unified
framework. We use this as a springboard to illustrate how
SCM scholars have examined the services supply chain setting of
health care with public data regarding hospital-level patient
satisfaction. We show that past studies have begun to address some
types of research questions with panel data, but that many other
types of questions remain. Following this review of services
supply chain health care research, we test a series of three questions
not yet examined in this area with panel data by fitting a series of
structured latent curve models (SLCMs) (Browne and Du Toit
1991; Blozis 2004) to seven years of hospital-level patient
satisfaction data for nearly 4,000 hospitals in the United States. In keeping
with our theme, the discussion highlights theoretical and
methodological challenges SCM scholars are likely to encounter as they
use panel data in their research.
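The panel structure underlying such an analysis can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions: the hospital identifiers, occasion indices, and satisfaction scores are hypothetical (none are taken from the actual data), and the function names are ours rather than part of any library. It shows records stored in long format (one row per subject per occasion), regrouped by subject, checked for balance, and summarized as a simple first-to-last change per subject.

```python
from collections import defaultdict

# Hypothetical long-format records: (hospital_id, occasion, score).
# All identifiers and values are made up for illustration.
records = [
    ("H001", 1, 68.0), ("H001", 2, 70.0), ("H001", 3, 71.0),
    ("H002", 1, 75.0), ("H002", 2, 74.0), ("H002", 3, 77.0),
    ("H003", 1, 62.0), ("H003", 3, 66.0),  # occasion 2 is missing
]

def to_wide(rows):
    """Group long-format rows into {subject: {occasion: value}}."""
    wide = defaultdict(dict)
    for subject, occasion, value in rows:
        wide[subject][occasion] = value
    return dict(wide)

def is_balanced(wide):
    """Return True when every subject is observed on every occasion."""
    occasions = {occ for series in wide.values() for occ in series}
    return all(set(series) == occasions for series in wide.values())

def total_change(series):
    """Last observed value minus first observed value for one subject."""
    first, last = min(series), max(series)
    return series[last] - series[first]

wide = to_wide(records)
print(len(wide), "subjects; balanced:", is_balanced(wide))
for hospital, series in sorted(wide.items()):
    print(hospital, total_change(series))
```

Checking balance early matters because unbalanced panels (subjects missing on some occasions) are common in practice and affect which models can be fitted without further handling of missing data.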
This article contributes to the SCM literature in several
ways. First, it provides an integrated framework explicating the
types of questions that can be asked with panel data by
synthesizing different methodological research traditions. Second,
it extends prior services supply chain investigations in the
health care sector by studying new questions concerning
hospital-level improvement in patient satisfaction. This responds to
Ketokivi and McIntosh's (2017) call for SCM scholars to
place more emphasis on studying how subjects' performance
changes over time. Third, our discussion section clarifies the
role that the passage of time plays when devising theories
with longitudinal character, a topic that has received limited
attention (Ancona et al. 2001). Fourth, this research highlights
some of the challenges SCM scholars can expect to face when
using panel data.
The remainder of this article is structured in three sections. The next section
develops a framework for different types of questions
addressable using panel data. The penultimate section discusses SCM
literature in the health care sector, specifically studies that have
examined hospital-level patient satisfaction, and extends this
literature by fitting SLCMs to panel data on hospital-level
patient satisfaction to answer unaddressed questions. The final
section describes theoretical and methodological challenges
with using panel data, notes limitations, and suggests directions
for future research.
LEVERAGING PANEL DATA TO BUILD SCM THEORY
We begin by establishing the boundaries for our examination.
First, we assume theoretical constructs⁵ serving as outcomes are
continuous, as opposed to categorical. We make this assumption
because categorical constructs must be operationalized and
analyzed using different techniques (Collins and Lanza 2010).
Second, we assume that the empirical operationalizations⁶ of
theoretical constructs within the same study have an acceptable
degree of measurement invariance so that comparing these
⁴ These divergences concern both research questions studied
and metaphysical issues more than they concern the use of
different terminology. An example of the former is that
biostatisticians and quantitative psychologists focus much attention on
studying change over time using growth curve models
(Fitzmaurice et al. 2011; Little 2013), whereas economists give little
attention to these models; Wooldridge's (2010) text on panel
data does not discuss such models. A metaphysical difference is
that many quantitative psychologists limit their discussion of
parameter bias because they do not believe in “true” models
(Cudeck and Henly 1991, 2003; MacCallum 2003). Economists
and sociologists tend to strongly emphasize issues of parameter
bias resulting from omitted variables and consequently place
greater emphasis on the use of instrumental variables and subject
fixed effects when fitting panel models (Halaby 2004;
Wooldridge 2010).
⁵ This does not mean that observed measures in these databases
need to be continuous, as one can operationalize a continuous
construct using multiple binary observed indicators (Newsom
2015).
⁶ Constructs can be operationalized using either single measures
(i.e., as observed scores) or multi-item latent variables.
In the case of single measures, testing for measurement
invariance is impossible (Little 2013).