Inference with Dependent Data in Accounting and Finance Applications

DOI: http://doi.org/10.1111/1475-679X.12219
Author: CHRISTIAN HANSEN, TIMOTHY CONLEY, SILVIA GONÇALVES
Published date: 01 September 2018
Journal of Accounting Research
Vol. 56 No. 4 September 2018
Printed in U.S.A.
TIMOTHY CONLEY,
SILVIA GONÇALVES,
AND CHRISTIAN HANSEN
Received 8 November 2017; accepted 17 April 2018
ABSTRACT
We review developments in conducting inference for model parameters in
the presence of intertemporal and cross-sectional dependence with an em-
phasis on panel data applications. We review the use of heteroskedasticity
and autocorrelation consistent (HAC) standard error estimators, which in-
clude the standard clustered and multiway clustered estimators, and discuss
alternative sample-splitting inference procedures, such as the Fama–MacBeth
procedure, within this context. We outline pros and cons of the different
procedures. We then illustrate the properties of the discussed procedures
within a simulation experiment designed to mimic the type of firm-level panel
data that might be encountered in accounting and finance applications. Our
conclusion, based on theoretical properties and simulation performance, is
that sample-splitting procedures with suitably chosen splits are the most likely
to deliver robust inferential statements with approximately correct coverage
properties in the types of large, heterogeneous panels many researchers are
likely to face.
JEL codes: C12; C23
Keywords: hypothesis testing; confidence intervals; robust standard error
estimation; spatial dependence; bootstrap; fixed-effects
University of Western Ontario; McGill; University of Chicago Booth School of Business.
Accepted by Christian Leuz. We would like to thank Rodrigo Verdi for providing us with
the data we used to construct our simulation experiment and JAR editors for helpful
comments. We thank Aldo Sandoval Hernandez for providing excellent research assistance. This
material is based on work supported by the National Science Foundation under Grant No.
1558636, the University of Chicago Booth School of Business, and the Social Sciences and
Humanities Research Council of Canada. An online appendix to this paper can be downloaded at
http://research.chicagobooth.edu/arc/journal-of-accounting-research/online-supplements.
© University of Chicago on behalf of the Accounting Research Center, 2018
1. Introduction
Empirical research in accounting and finance often uses panel data on
firms, sectors, or regions over time. It is routine for such data to be interre-
lated. In particular, unobservable factors (“shocks”) are typically important
in determining outcomes and seem likely to be related across observations.
Time series shocks affecting an individual firm or geographic region are
often taken to be serially correlated. Similarly, shocks at a single point in
time affecting different observations may be correlated with each other.
As examples, supply shocks may jointly impact all firms in industries with
similar technology, and shocks to interest rate expectations may jointly im-
pact firms with similar exposure to interest rate risk. Moreover, such shocks
are likely to be correlated across firms at different time periods. For exam-
ple, firms with similar investment opportunities will tend to make similar
choices and experience correlated shocks. Furthermore, with multiperiod
investments, such firms will routinely exhibit correlation across nearby time
periods, not just contemporaneously.[1]
It is well known that researchers need to account for the presence of
dependent unobservables when conducting statistical inference for model
parameters. For example, a 5% level test formed from a t-statistic with stan-
dard error that is estimated assuming independence across observations
can have size—the probability of rejecting a true null hypothesis—very far
from 5% when the data are in fact dependent. This potential for distortions
to inferential statements from failing to properly account for dependence
has long been recognized in the time series literature. In the empirical eco-
nomics literature, Bertrand, Duflo, and Mullainathan [2004] highlighted
this point in the context of panel data with cross-sectional independence
and intertemporal correlation; and, casual empiricism based on papers
appearing after Bertrand, Duflo, and Mullainathan [2004] suggests that
applied researchers dealing with panel data are now acutely aware of the
potential for distortions from failing to account for dependence when con-
ducting inference. Indeed, the vast majority of applied work with panel data
in accounting, finance, and economics uses inference procedures that are
robust to some form of correlation across observations.
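
The size distortion described above can be illustrated with a small Monte Carlo sketch. It is not taken from the paper: it assumes a single time series in which both the regressor and the shock follow independent AR(1) processes, the true slope is zero, and the standard error is computed as if observations were independent. The function names, sample size, and replication count are hypothetical choices made for illustration.

```python
import math
import random


def ar1(rng, n, rho):
    """Simulate a stationary AR(1) series with unit innovation variance."""
    x = [rng.gauss(0.0, 1.0) / math.sqrt(1.0 - rho**2)]
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, 1.0))
    return x


def naive_rejection_rate(rho, n=200, reps=500, seed=0):
    """Share of replications in which a nominal 5% t-test rejects the true
    null of a zero slope, using a standard error that assumes independence."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = ar1(rng, n, rho)
        y = ar1(rng, n, rho)  # y is pure shock, so the true slope on x is zero
        xbar, ybar = sum(x) / n, sum(y) / n
        xc = [xi - xbar for xi in x]
        yc = [yi - ybar for yi in y]
        sxx = sum(v * v for v in xc)
        beta = sum(a * b for a, b in zip(xc, yc)) / sxx
        rss = sum((b - beta * a) ** 2 for a, b in zip(xc, yc))
        se = math.sqrt(rss / (n - 2) / sxx)  # valid only under independence
        if abs(beta / se) > 1.96:
            rejections += 1
    return rejections / reps


print("rho = 0.0:", naive_rejection_rate(0.0))
print("rho = 0.8:", naive_rejection_rate(0.8))
```

With no serial correlation (rho = 0.0) the rejection rate should sit near the nominal 5%, while with strong serial correlation in both series (rho = 0.8) it should come out well above 5%, even though the null hypothesis is true in both cases.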
[1] We highlight that this intuitive structure leads to correlation across different firms in
different time periods and would render the common practice of using two-way clustering by
firm and time inappropriate, as this two-way clustering structure imposes that different firms
in different time periods are uncorrelated.
The importance of adequately accounting for dependence, both inter-
temporal and cross sectional, has led to the development of a variety
of statistical procedures that aim to deliver valid inferential statements
about parameters of interest when data may be dependent and heterogeneous.
These methods include the use of clustered standard error estimators,
sample-splitting procedures such as the Fama–MacBeth procedure,
and bootstrap procedures. While the menu of available methods offers re-
searchers many high-quality options, the methods are not equivalent and
involve substantive choices. For example, when using clustered standard
errors, the obvious decision that must be made is at what level to form clus-
ters. There are also less obvious choices such as what critical values to use
and what fixed effects structure to maintain that have important impacts
on the quality of inference.
The goal of this review is to offer a heuristic overview of leading inferen-
tial approaches with dependent data and a practical guide to some of the
tradeoffs between methods and choices that must be made. In addition to
reviewing different broad classes of methods, we talk about practical issues
that are common to all approaches and often ignored in the literature. Two
particularly important issues are the choice of group structure, for exam-
ple, on what level(s) to cluster for one- or multiway clustering, and how the
choice of the group structure interacts with fixed effects structures.
The practical recommendations we make are based on a simulation study
that was designed to allow an evaluation of alternative procedures in a typ-
ical accounting or finance application. Importantly, our simulation model
is not based on a stylized model with simple dependence structure. Rather,
we base the simulation on the empirical analysis in Balakrishnan, Core, and
Verdi [2014] and use a data-generating process (DGP) that is heteroge-
neous, allows for dependence along multiple dimensions, and is designed
to approximate the correlation structure that is present in the data. Our
simulation results should therefore be empirically relevant for accounting
and corporate finance applications.
1.1 MAIN RESULTS
We evaluate alternative inference procedures in the context of a simple
panel-data regression estimated by ordinary least squares (OLS) under
strict exogeneity of regressors so there are no finite sample bias
concerns. To summarize the main points from our simulation, we conclude that
sample-splitting strategies (e.g., Fama–MacBeth) with a small number (e.g.,
5–10) of groups that each consist of many observations are likely to yield the
most reliable inferential statements in many accounting and finance appli-
cations. The use of a small number of large groups allows one to accommo-
date very rich dependence structures and the use of sample-splitting allows
one to accommodate very heterogeneous data. Having many observations
per group is also important for sample-splitting estimators as they require
estimation of model parameters within each group, and these group-level
estimates may be very unstable if the groups have few observations.
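
A minimal sketch of a Fama–MacBeth-style sample-splitting procedure follows. It is an illustration under stated assumptions, not the authors' implementation: the model is a univariate regression with an intercept, the group estimates are averaged with equal weights, and the test compares the resulting t-statistic with a Student's t critical value on G - 1 degrees of freedom, where G is the number of groups. The names `ols_slope`, `split_sample_inference`, and `T_CRIT_975` are hypothetical.

```python
import math
import statistics

# Two-sided 5% critical values of Student's t, keyed by degrees of freedom.
# Only 5 to 10 groups (4 to 9 degrees of freedom) are covered in this sketch.
T_CRIT_975 = {4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x with an intercept."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - xbar) ** 2 for a in x)
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx


def split_sample_inference(groups, beta_null=0.0):
    """groups is a list of (x, y) pairs, one pair of sequences per group.

    Returns (estimate, standard error, t-statistic, reject at the 5% level).
    """
    betas = [ols_slope(x, y) for x, y in groups]  # one estimate per group
    g = len(betas)
    beta_bar = statistics.fmean(betas)
    # The spread of the group estimates is the only source of the standard error.
    se = statistics.stdev(betas) / math.sqrt(g)
    t_stat = (beta_bar - beta_null) / se
    return beta_bar, se, t_stat, abs(t_stat) > T_CRIT_975[g - 1]


# Usage: five groups whose within-group slopes range from 1.8 to 2.2.
x = [0.0, 1.0, 2.0, 3.0]
groups = [(x, [1.0 + b * xi for xi in x]) for b in (1.8, 1.9, 2.0, 2.1, 2.2)]
print(split_sample_inference(groups))
print(split_sample_inference(groups, beta_null=2.0))
```

Because the standard error is built only from variation across the group-level estimates, dependence within a group does not need to be modeled, which is one way to read the point above about accommodating rich dependence structures. The same construction also shows why each group needs many observations: every entry of `betas` comes from a separate regression, and noisy group estimates feed directly into both the average and its standard error.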
