Taming the Factor Zoo: A Test of New Factors

Author: GUANHAO FENG, DACHENG XIU, STEFANO GIGLIO
Date: 01 June 2020
DOI: http://doi.org/10.1111/jofi.12883
Published date: 01 June 2020
THE JOURNAL OF FINANCE VOL. LXXV, NO. 3 JUNE 2020
GUANHAO FENG, STEFANO GIGLIO, and DACHENG XIU
ABSTRACT
We propose a model selection method to systematically evaluate the contribution to
asset pricing of any new factor, above and beyond what a high-dimensional set of
existing factors explains. Our methodology accounts for model selection mistakes that
produce a bias due to omitted variables, unlike standard approaches that assume per-
fect variable selection. We apply our procedure to a set of factors recently discovered
in the literature. While most of these new factors are shown to be redundant relative
to the existing factors, a few have statistically significant explanatory power beyond
the hundreds of factors proposed in the past.
The search for factors that explain the cross section of expected stock
returns has produced hundreds of potential candidates, as noted by Cochrane
(2011) and more recently by Harvey, Liu, and Zhu (2015), McLean and Pontiff
(2016), and Hou, Xue, and Zhang (2017). A fundamental task facing the asset
pricing field today is to bring more discipline to the proliferation of factors. In
particular, a question that remains open is: how to judge whether a new factor
adds explanatory power for asset pricing, relative to the hundreds of factors
the literature has so far produced?
This paper provides a framework for systematically evaluating the contribu-
tion of individual factors relative to existing factors as well as for conducting
appropriate statistical inference in this high-dimensional setting. More
specifically, we provide a methodology for estimating and testing the marginal
importance of any factor g_t in pricing the cross section of expected returns beyond
what can be explained by a high-dimensional set of potential factors h_t, where
g_t and h_t can be tradable or nontradable factors. We assume that the true
asset pricing model is approximately low-dimensional. However, in addition to
relevant asset pricing factors, g_t and h_t include both redundant factors that
add no explanatory power to the model and useless ones that have no
explanatory power at all. We select the relevant factors from h_t and conduct
proper inference on the contribution of g_t above and beyond those factors. Our
methodology can be thought of as a conservative test for new factors, which
benchmarks them against a large-dimensional set of existing factors.
Guanhao Feng is from City University of Hong Kong College of Business. Stefano Giglio is
from Yale School of Management, NBER, and CEPR. Dacheng Xiu is from University of Chicago
Booth School of Business. We appreciate insightful comments from Alex Belloni, John Campbell,
John Cochrane, Chris Hansen, Lars Hansen, Bryan Kelly, Stefan Nagel, and Chen Xue. We are
also grateful for helpful comments from seminar and conference participants at the City
University of Hong Kong, Peking University, Renmin University, University of British Columbia,
Luxembourg School of Finance, AQR, Morgan Stanley, Two Sigma, 2018 Annual Meetings of the
American Finance Association, the 2016 Financial Engineering and Risk Management
Symposium in Guangzhou, 2017 EcoStat Conference at Hong Kong University of Science and Technology,
and University of Oregon Summer Finance Conference. We acknowledge research support by the
Fama-Miller Center for Research in Finance at Chicago Booth. The authors have read The Journal
of Finance disclosure policy and have no conflicts of interest to disclose.
Correspondence: Stefano Giglio, Department of Finance, Yale School of Management, NBER,
and CEPR, 165 Whitney Avenue, New Haven, CT 06520; e-mail: stefano.giglio@yale.edu.
© 2020 the American Finance Association
When h_t consists of a small number of factors, testing whether g_t is useful in
explaining asset prices while controlling for the factors in h_t is straightforward:
it simply requires estimating the loadings of the stochastic discount factor
(SDF) on g_t and h_t, and testing whether the loading of g_t is different from zero
(see Cochrane (2009)). This exercise tells us not only whether g_t is useful for
pricing the cross section, but also how shocks to g_t affect marginal utility, which
has a direct economic interpretation.
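As a hedged illustration of this low-dimensional test, one common way to write it uses a linear SDF. The symbols m_t, r_t, \lambda_g, and \lambda_h below, and the normalization of the SDF mean to one, are assumptions introduced here for exposition; they are not notation taken from the text above.

```latex
% Linear SDF with loadings \lambda_g and \lambda_h (assumed notation)
m_t = 1 - \lambda_g^{\top}\,(g_t - \mathrm{E}[g_t]) - \lambda_h^{\top}\,(h_t - \mathrm{E}[h_t])

% For a vector of excess returns r_t, the pricing condition E[m_t r_t] = 0 implies
\mathrm{E}[r_t] = \mathrm{Cov}(r_t, g_t)\,\lambda_g + \mathrm{Cov}(r_t, h_t)\,\lambda_h

% Test of whether g_t contributes beyond h_t
H_0 : \lambda_g = 0 \quad \text{against} \quad H_1 : \lambda_g \neq 0
```

Under this sketch, \lambda_g is the coefficient on Cov(r_t, g_t) in a cross-sectional regression of average returns on the covariances of returns with all factors.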
When h_t consists of potentially hundreds of factors, standard statistical
methods to estimate and test the SDF loadings become infeasible or result in poor
estimates and invalid inference because of the curse of dimensionality. Although
variable selection techniques (e.g., the least absolute shrinkage and selection
operator [LASSO]) can be useful in selecting the correct variables under
certain conditions and thereby reduce the dimensionality of h_t, relying on this
result produces very poor approximations to the finite-sample distributions of
the estimators unless appropriate econometric methods are used to explicitly
account for model selection mistakes (see Chernozhukov et al. (2015)). This
means that, for example, simply applying a model selection tool like LASSO to
a large set of factors and checking whether a particular factor g_t is significant
(or even just checking if it gets selected) is not a reliable way to determine
whether g_t is one of the true factors.
The methodology we propose in this paper marries these new econometric
methods (in particular, the double-selection LASSO method of Belloni,
Chernozhukov, and Hansen (2014b)) with two-pass regressions such as
Fama-MacBeth to evaluate the contribution of a factor to explaining asset prices in
a high-dimensional setting. Without relying on prior knowledge about which
factors to include as controls among a large number of factors in h_t, our
procedure selects the factors that are useful either in explaining the cross section of
expected returns or in mitigating the omitted variable bias problem due to
potential model selection mistakes. We show that including both types of factors
as controls is essential to conduct reliable inference on the SDF loading of g_t.
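A minimal sketch of the double-selection idea, assuming the cross-sectional inputs are average returns and the covariances of returns with each factor, might look as follows. The function names (lasso_cd, double_selection), the simulated data, and the penalty values are all hypothetical. The second selection step, which regresses the exposure to g_t on the exposures to h_t, follows the generic double-selection recipe and is an assumption about how the omitted-variable safeguard is implemented. The LASSO is a plain coordinate-descent routine written for this illustration, and the sketch omits the standard errors needed for inference.

```python
import numpy as np

def lasso_cd(X, y, penalty, n_iter=200):
    """LASSO by coordinate descent: (1/2n)||y - a - Xb||^2 + penalty * ||b||_1."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)              # centering absorbs the intercept a
    resid = y - y.mean()                 # residual when b = 0
    col_ss = (Xc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # covariance of column j with the partial residual (column j added back)
            rho = Xc[:, j] @ resid / n + col_ss[j] * b[j]
            new_bj = np.sign(rho) * max(abs(rho) - penalty, 0.0) / col_ss[j]
            resid += Xc[:, j] * (b[j] - new_bj)
            b[j] = new_bj
    return b

def double_selection(avg_ret, cov_g, cov_h, pen1, pen2):
    """Return the post-selection OLS loading on cov_g and the kept control indices."""
    # Selection 1: controls that help explain average returns.
    sel1 = np.flatnonzero(lasso_cd(cov_h, avg_ret, pen1))
    # Selection 2 (assumed form): controls related to the exposure to g_t,
    # whose omission would bias the loading on g_t.
    sel2 = np.flatnonzero(lasso_cd(cov_h, cov_g, pen2))
    keep = np.union1d(sel1, sel2)
    # Post-selection OLS of average returns on cov_g and the union of controls.
    X = np.column_stack([np.ones(len(avg_ret)), cov_g, cov_h[:, keep]])
    coef, *_ = np.linalg.lstsq(X, avg_ret, rcond=None)
    return coef[1], keep

# Simulated cross section (hypothetical numbers, not the paper's data).
rng = np.random.default_rng(0)
n_assets, n_controls = 200, 50
cov_h = rng.normal(size=(n_assets, n_controls))
cov_g = 0.8 * cov_h[:, 0] + 0.5 * rng.normal(size=n_assets)
true_lambda_g = 0.5
avg_ret = (true_lambda_g * cov_g + 0.2 * cov_h[:, 0] + 1.0 * cov_h[:, 1]
           + 0.1 * rng.normal(size=n_assets))

lambda_g_hat, kept = double_selection(avg_ret, cov_g, cov_h, pen1=0.1, pen2=0.1)
print("estimated loading on g:", lambda_g_hat, "kept controls:", kept)
```

Taking the union of the two selected sets is the design choice that matters here: a control missed by the first LASSO can still enter through the second if it is strongly related to the exposure to g_t.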
We apply our methodology to a large set of factors proposed in the last
30 years. In particular, we collect and construct a large factor data library
containing 150 risk factors. This factor zoo contains many potentially redun-
dant factors, and thus is an ideal data set to conduct our empirical analysis.
As an example, consider the seasonality factor of Heston and Sadka (2008).
This factor has a statistically significant alpha with respect to the Fama-
French three-factor model (t-statistic = 2.06) in our sample. Thus, if evaluated
against this benchmark model, one would conclude that seasonality is a useful
factor. But seasonality turns out to be highly correlated with momentum (for
instance, it has a correlation of 0.63 with the Carhart momentum factor).
Moreover, if one evaluates it against a model that includes momentum (like
the Fama-French four-factor model), the alpha becomes small and statistically
insignificant (t-statistic = −0.87). This example highlights the importance of
the benchmark in evaluating new factors. Most papers in the literature that
aim to produce new factors choose the benchmark model somewhat arbitrarily,
subject to potential data-mining bias. Our procedure systematically constructs
the best low-dimensional benchmark to evaluate new factors using the entire
factor zoo.
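The sensitivity of an alpha to the benchmark can be illustrated with a small simulation. The sketch below uses made-up data, not the seasonality or momentum series from the text, so it does not reproduce the t-statistics quoted above; the names alpha_tstat, mkt, mom, and new are hypothetical, and the t-statistic uses classical OLS standard errors rather than any robust correction.

```python
import numpy as np

def alpha_tstat(y, X):
    """OLS of y on a constant and X; return the intercept and its classical t-stat."""
    T = len(y)
    Z = np.column_stack([np.ones(T), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    sigma2 = resid @ resid / (T - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return coef[0], coef[0] / np.sqrt(cov[0, 0])

# Simulated monthly factor returns (hypothetical parameters).
rng = np.random.default_rng(0)
T = 600
mkt = rng.normal(0.5, 4.0, T)              # benchmark factor
mom = rng.normal(1.5, 4.0, T)              # a priced factor left out of the small benchmark
new = 0.7 * mom + rng.normal(0.0, 2.0, T)  # "new" factor that mostly repackages mom

alpha_small, t_small = alpha_tstat(new, mkt)
alpha_big, t_big = alpha_tstat(new, np.column_stack([mkt, mom]))
print("alpha t-stat without mom:", t_small)
print("alpha t-stat with mom:   ", t_big)
```

By construction the new factor earns its premium only through mom, so its alpha should look significant against the small benchmark and shrink toward zero once mom is included.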
We perform several empirical exercises that illustrate the use of our pro-
cedure in the data. First, we start by evaluating the marginal contribution of
factors proposed over the last five years (2012 to 2016) to the large set of factors
proposed before then. The new factors include, among others, the two new fac-
tors introduced by Fama and French (2015) and Hou, Xue, and Zhang (2015),
and the intermediary-based factors of He, Kelly, and Manela (2017). Note that
our test is conservative; it requires that a new factor g_t contribute to the cross
section relative to the entire universe of existing factors h_t. Given the large
dimensionality of the factors produced in the literature, one might wonder
whether, in practice, any additional factor could ever make a significant contri-
bution. We show that several of the newly proposed factors (e.g., profitability)
indeed have significant marginal explanatory power for expected returns.
Second, we conduct a recursive exercise in which factors are tested as they
are introduced against previously proposed factors. This exercise shows that
our procedure would deem most factors as redundant or spurious, finding sig-
nificance for a small number of factors. Over time, our procedure would screen
out many factors at the time of their introduction, thus helping address the
proliferation of factors. Going forward, our test can be used to make inference
about new factors that will be introduced in the future.
Third, we explore an alternative application of our procedure in which some
factors are determined ex ante to be part of the benchmark h_t, and the
remaining factors are individually tested and added recursively (similar in spirit
to forward stepwise selection), expanding the set of “preselected” factors in
the benchmark at each iteration until no remaining factors contribute to the
expected return variation.
Finally, we study the robustness of our procedure from different angles. We
show that our results are robust to using alternative methods to reduce the
dimensionality of ht, such as Elastic Net and principal component analysis
(PCA), as well as using the stepwise procedure to select the benchmark. We
also show that the results are robust to alternative portfolio constructions.
Most importantly, we explore robustness with respect to the tuning parameters.
Like all machine learning methods, our procedure involves the choice of tuning
parameters (in particular, two tuning parameters, one for each selection step).
In our main analysis, we choose them by cross-validation (CV). We show that
our empirical findings are robust to varying the tuning parameters in the
neighborhood of the values chosen by the CV procedure.
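To make the role of the tuning parameter concrete, the following sketch chooses a LASSO penalty by K-fold cross-validation on simulated data. The fold count, the penalty grid, the data, and the names lasso_cd and cv_penalty are all hypothetical; in a two-step selection procedure, a search of this kind would be run once for each step.

```python
import numpy as np

def lasso_cd(X, y, penalty, n_iter=100):
    """LASSO by coordinate descent: (1/2n)||y - a - Xb||^2 + penalty * ||b||_1."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    resid = y - y.mean()
    col_ss = (Xc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            rho = Xc[:, j] @ resid / n + col_ss[j] * b[j]
            new_bj = np.sign(rho) * max(abs(rho) - penalty, 0.0) / col_ss[j]
            resid += Xc[:, j] * (b[j] - new_bj)
            b[j] = new_bj
    return b

def cv_penalty(X, y, grid, k=5, seed=0):
    """Return the penalty in grid with the lowest average out-of-fold squared error."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    errors = []
    for pen in grid:
        total = 0.0
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            b = lasso_cd(X[train_idx], y[train_idx], pen)
            # Recover the intercept implied by centering on the training fold.
            a = y[train_idx].mean() - X[train_idx].mean(axis=0) @ b
            total += np.mean((y[test_idx] - a - X[test_idx] @ b) ** 2)
        errors.append(total / k)
    errors = np.array(errors)
    return grid[int(np.argmin(errors))], errors

# Simulated regression with two relevant columns out of thirty.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))
y = X[:, 0] - 0.5 * X[:, 1] + 0.3 * rng.normal(size=200)

grid = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 2.0])
best_pen, cv_err = cv_penalty(X, y, grid)
print("chosen penalty:", best_pen)
print("CV errors by penalty:", cv_err)
```

A robustness check in the spirit of the text would rerun the downstream estimation for penalties near best_pen and compare the results.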
