Thirsty in an Ocean of Data? Pitfalls and Practical Strategies When Partnering With Industry on Big Data Supply Chain Research
Author | Rod Franklin, A. Michael Knemeyer, Kevin B. Smyth, Keely L. Croxton |
DOI | http://doi.org/10.1111/jbl.12187 |
Published date | 01 September 2018 |
Kevin B. Smyth¹, Keely L. Croxton¹, Rod Franklin², and A. Michael Knemeyer¹
¹ The Ohio State University
² Kühne Logistics University
Increased volume, velocity, and variety of data provide new opportunities for businesses to take advantage of data science techniques,
predictive analytics, and big data. However, firms are struggling to make use of their disjointed and unintegrated data streams. Despite this,
academics with the analytic tools and training to pursue such research often face difficulty gaining access to corporate data. We explore the
divergent goals of practitioners and academics and how the gap that exists between the communities can be overcome to derive mutual value
from big data. We describe a practical roadmap for collaboration between academics and practitioners pursuing big data research. Then we
detail a case example of how, by following this roadmap, researchers can provide insight to a firm on a specific supply chain problem while
developing a replicable template for effective analysis of big data. In our case study, we demonstrate the value of effectively pairing manage-
ment theory with big data exploration, describe unique challenges involved in big data research, and develop a novel and replicable hierarchical
regression-based process for analyzing big data.
Keywords: big data; data science; practitioner engagement; governance form
INTRODUCTION
Many of today’s supply chain managers feel like the ancient
mariner in the famous poem by Samuel Taylor Coleridge. But
rather than being surrounded by undrinkable water, supply chain
managers are more likely to be heard saying “Data, data,
everywhere, nor any information to use.” An estimated 1,000
exabytes (10^21 bytes) of data are generated each year (Yin and Kaynak
2015), and U.S. firms store on average more than 200 terabytes,
with a 40% expected annual compounded growth in that figure
(McKinsey Global Institute 2011). Often referred to as “big data”
(McAfee and Brynjolfsson 2012), this rapid expansion of data
creation and storage represents a potential source of competitive
advantage (Waller and Fawcett 2013b) and has businesses scram-
bling to extract value from it (Sanders 2016). Although many
definitions of big data are in common use, we specify big data
as an increase in volume, variety, and velocity of information
such that traditional analytic methods are stretched beyond their
limits (Megahed and Jones-Farmer 2015). This definition captures
the multidimensionality of the concept, as “bigness” of data
is relative to the constantly expanding capacity to store and pro-
cess information (Guarnieri 2016), as well as to the intended util-
ity (McKinsey Global Institute 2011).
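To make the growth figure concrete, the following minimal sketch compounds the 200-terabyte average at 40% per year. The 200 TB starting point and the 40% rate come from the text above; the five-year horizon and the function name are illustrative assumptions, not figures from the cited report.

```python
def projected_storage_tb(start_tb: float, annual_growth: float, years: int) -> float:
    """Compound start_tb at the rate annual_growth for the given number of years."""
    return start_tb * (1.0 + annual_growth) ** years


# 200 TB and 40% are taken from the text; the horizon is a hypothetical choice.
for year in range(0, 6):
    print(year, round(projected_storage_tb(200.0, 0.40, year), 1))
```

At a 40% compounded rate, stored volume nearly doubles every two years (1.4 squared is 1.96), which illustrates why the "bigness" of data is a moving target relative to storage and processing capacity.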
Despite its massive potential, firms struggle to derive meaningful
value from big data. Wieland et al. (2016, 207) note that compa-
nies are finding themselves in a situation of “big data, but small
math,” and exploitation of this growing data resource requires more
than accumulation and storage. A complementary issue is that businesses
are cautious about the level of access they grant to sensitive internal
data, which drives trepidation in enlisting the help of outside experts
who may be a valuable resource for translating this data into
actionable information. This is especially true if the value of col-
laboration is unclear to decision makers within the firm. Despite
growing interest and multiple calls by editors to pursue practically
and theoretically relevant big data research (Waller and Fawcett
2013a,b), academics with the analytic tools and training to pursue
such research often face difficulty gaining access to corporate data.
This manifestation of the gap between research and practitioner
communities (Hutt and Walker 2015) stems from logical,
temporal, and incentive-based differences in how the two parties
approach problems in their work. Management practitioners are
concerned with urgent practical problems, expect results quickly,
and are more often rewarded based on short-term measurable
goals (Hutt 2008), whereas academics have tended to prefer
research on more comprehensive (although oftentimes less urgent)
underlying causes for observed phenomena. This methodical
approach inevitably delays results, as research entails an exami-
nation not only of the practical phenomena but also of the exist-
ing body of knowledge pertinent to the subject and all possible
contributing factors. Researchers are often rewarded more for this
epistemological “truth” than for the managerial relevance of their
work (Bartunek and Rynes 2014).
Successfully bridging the research–practice gap requires practi-
tioners and academics to find common ground. Van de Ven
(2007), Avenier and Cajaiba (2012), and Hutt and Walker (2015)
all propose an iterative two-way dialog between the parties to
forge mutually beneficial research questions. This dialog, often
initiated by and facilitated through an ongoing corporate research
partnership program such as those that exist at many
research-oriented business schools (Hutt 2008), aims to identify recurring
challenges and opportunities of practical relevance that may also
contribute to an increased understanding of more generalizable
management phenomena. Researchers in applied fields seeking to
gain access to “big” data sets to advance epistemological
understanding of business phenomena must engage in this dialog to
identify and balance the obligations they hold to both practice
and the academy (Cotteleer and Wan 2016).
Corresponding author:
Kevin B. Smyth, Department of Marketing and Logistics, Fisher
College of Business, The Ohio State University, 2100 Neil Ave,
Columbus, OH 43210, USA; E-mail: smyth.43@osu.edu
Journal of Business Logistics, 2018, 39(3): 203–219 doi: 10.1111/jbl.12187
© 2018 Council of Supply Chain Management Professionals
With this perspective in mind, the goals of this study are as
follows: (1) to describe a practical roadmap for collaboration
between researchers and practitioners pursuing big data research
and (2) to detail a case example of how, by following this
roadmap, researchers can provide insight to a firm on a specific
supply chain problem while developing a replicable template for
effective analysis of big data. The case example highlights our
experience working on a big data study to find the sources of
replenishment forecast deviation and bias in the quick service
restaurant industry. Key contributions of the current article are to
propose a process of conducting research that is of mutual value
to practitioners and researchers, demonstrate the value of pairing
a priori theorizing with big data exploration as proposed in Wal-
ler and Fawcett (2013a), describe unique challenges involved in
big data research, and develop a novel and replicable hierarchical
regression-based process for analyzing big data.
The remainder of the study is organized as follows: We describe
the origin and context of our big data case study, elucidate unique
technical challenges in collection, manipulation, and traditional
methods of exploration of big data, propose an alternative novel
hierarchical regression-based approach to explore big data, and
finally present a process for future practitioner-academic research
collaborations derived from our case study experience.
RELATIONSHIP DEVELOPMENT
Before describing the proposed facilitating process for big data
research engagement of academics and business practitioners, we
outline our specific experience. The process is organized into
subprocesses of relationship identification, research motivation,
project management, and findings validation. It should be noted
that the description of the findings validation subprocess will
occur after we describe the proposed facilitating process.
Relationship identification
The origins of this study stem from an ongoing relationship with
members of a practitioner–academic research group at a large
Midwestern university. Through the group, the primary fourth-
party logistics (4PL) provider for a major international quick ser-
vice restaurant enlisted our assistance in helping address multiple
supply chain issues their major restaurant customer was
experiencing. The 4PL service provider felt comfortable approaching
the team for assistance because of its experience with academic
researchers, gained through its participation in the
practitioner–academic research group and other academic
relationships that the organization had developed. The level of
experience a firm has working with academics critically impacts
how researchers generate initial interest and foster a strong ongo-
ing relationship for collaborative research.
Research motivation
Following the guidance of Hutt and Walker (2015), our research
team engaged in a bilateral collaborative dialog to translate the
practitioner’s challenges into mutually beneficial research
questions. One of the issues identified was that the major
restaurant firm was experiencing significant order deviation by
individual restaurant outlets from its centrally developed replenishment
forecasts, causing costly inefficiencies and exaggerated responses
in multiple levels of the firm’s supply chain.
After this initial problem identification, we visited the head-
quarters of the 4PL provider to understand the work the firm
conducted for the restaurant chain. Through interviews with man-
agers and analysts across multiple divisions in the 4PL, as well
as personnel from one of the five third-party logistics (3PL) com-
panies that provide distribution services for the restaurant chain,
we began to understand the connections between various inter-
ests that affect the focal issue of replenishment forecast
deviation.
The restaurant firm in our research operates almost 15,000
retail outlets domestically, of which more than 80% are con-
tracted to franchise companies. The firm utilizes the aforemen-
tioned 4PL to centrally develop sales forecasts and plan
replenishment for all restaurant outlets. All involved firms—the
restaurant firm, its 4PL provider, its 3PL providers, and the fran-
chise companies—utilize a management information system
(MIS) curated by the 4PL, thus providing a centralized source
for big data across the involved companies. Each entity’s rele-
vant MIS data are visible to at least the adjacent link in the sup-
ply chain. Figure 1 illustrates the relevant data flows within and
between firms.
Driven by data or by theory?
These data that permit exploration of causes of replenishment
forecast deviation clearly represent big data, because for almost
15,000 distributed outlets and 8,100 stock units, there are more
than 120 million potential daily transactions to evaluate. With
daily frequency, these data also have a velocity consistent with
incipient definitions of big data (McAfee and Brynjolfsson 2012;
Kitchin and McArdle 2016). While the variety of information
forms may not span to unstructured data, drawing from multiple
databases for a useful sample and mixing numeric and non-
numeric data are common features of big data analysis (Megahed
and Jones-Farmer 2015).
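The volume claim above can be checked with simple arithmetic. The sketch below multiplies the outlet and stock-unit counts reported in the text; treating "almost 15,000" as exactly 15,000 is an assumption that gives an upper bound, and the variable and function names are illustrative only.

```python
outlets = 15_000      # "almost 15,000" outlets, rounded up as an upper bound (assumption)
stock_units = 8_100   # stock units reported in the text


def potential_daily_records(n_outlets: int, n_items: int) -> int:
    """Upper bound on distinct outlet-item combinations observable in one day."""
    return n_outlets * n_items


daily = potential_daily_records(outlets, stock_units)
print(f"{daily:,} potential outlet-item records per day")  # 121,500,000
```

The product, 121.5 million, is consistent with the "more than 120 million potential daily transactions" stated in the text.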
The generation of useful knowledge from this ocean of data is
increasingly achieved through a synthesis of data science,¹
predictive analytics,² and big data, referred to as DPB (Waller and
Fawcett 2013b). While use of DPB has expanded rapidly in
industry and in practitioner literature, it is still a matter of intense
debate for academics. Several generic approaches exist for big
data mining or exploratory pattern recognition that can identify
correlative relationships and aid immensely in prediction (Hand
et al. 2001; Han et al. 2011; Kuhn and Johnson 2013). However,
they lack the means to explain the causal mechanisms of their
predictions and appear to challenge the paradigm of a priori
theorizing.
¹ Data science is defined here as the study of the generalizable
extraction of knowledge from data (Dhar 2013).
² We define predictive analytics as a broad class of statistical or
analytic techniques used to develop predictions of otherwise
unknown future events or behavior (Nyce 2007).