Sparse Signals in the Cross‐Section of Returns

Author: ALEX CHINCO, ADAM D. CLARK‐JOSEPH, MAO YE
Date: 01 February 2019
Published date: 01 February 2019
DOI: http://doi.org/10.1111/jofi.12733
THE JOURNAL OF FINANCE VOL. LXXIV, NO. 1 FEBRUARY 2019
Sparse Signals in the Cross-Section of Returns
ALEX CHINCO, ADAM D. CLARK-JOSEPH, and MAO YE
ABSTRACT
This paper applies the Least Absolute Shrinkage and Selection Operator (LASSO)
to make rolling one-minute-ahead return forecasts using the entire cross-section of
lagged returns as candidate predictors. The LASSO increases both out-of-sample fit
and forecast-implied Sharpe ratios. This out-of-sample success comes from identifying
predictors that are unexpected, short-lived, and sparse. Although the LASSO uses a
statistical rule rather than economic intuition to identify predictors, the predictors
it identifies are nevertheless associated with economically meaningful events: the
LASSO tends to identify as predictors stocks with news about fundamentals.
FINANCIAL ECONOMISTS HAVE BEEN LOOKING for variables that predict future stock
returns for as long as there have been financial economists. For example, Banz
(1981) uses market cap to predict future returns, Jegadeesh and Titman (1993)
use lagged returns, and Cohen and Frazzini (2008) use customer earnings
surprises. To find these sorts of variables, researchers have to solve two distinct
problems: identification and estimation. First, they have to identify a subset of
candidate predictors; then, they have to estimate the quality of these predictors.

Alex Chinco, Adam D. Clark-Joseph, and Mao Ye are all at Gies College of Business, University
of Illinois at Urbana–Champaign. Mao Ye is also with the NBER. We have received many helpful
comments and suggestions from John Campbell; Victor DeMiguel; Xavier Gabaix; Andrew Karolyi;
Bryan Kelly; Maureen O’Hara; Vassilis Papavassiliou; Ioanid Rosu; Thomas Ruchti; Gideon Saar;
Allan Timmermann; Heather Tookes; Sunil Wahal; and Brian Weller; as well as from seminar
participants at the University of Illinois Urbana-Champaign, the 11th Annual Central Bank
Conference on the Microstructure of Financial Markets, the 2016 AFA Annual Meetings, and the NBER
EFFE SI. Hao Xu, Ruixuan Zhou, and Rukai Lou provided excellent research assistance. This
research is supported by National Science Foundation grant #1352936, which is joint with the Office
of Financial Research at the U.S. Department of the Treasury. This work also uses the Extreme
Science and Engineering Discovery Environment (XSEDE), which is supported by National Science
Foundation grant #OCI-1053575. We thank David O’Neal of the Pittsburgh Supercomputer Center
for his assistance with supercomputing, which was made possible through the XSEDE Extended
Collaborative Support Service (ECSS) program. We have read the Journal of Finance’s disclosure
policy and have no conflicts of interest to disclose.

In the past, researchers typically reached for a different set of tools when
working on each of these problems, using their intuition to identify predictors
and statistics to estimate quality. This two-pronged approach works well
when you are only looking for steady long-lived predictors. For example, Cabot
Oil & Gas’s lagged return predicted its future return at the one-minute horizon
throughout October 2010. Because it formed such a steady long-lived predictive
relationship, a researcher could intuit this variable and then estimate its quality
with an ordinary least squares (OLS) regression
$$r_{t-\ell} = \hat{\alpha} + \hat{\beta} \cdot x_{t-(\ell+1)} + \varepsilon_{t-\ell}, \qquad \ell \in \{0, \ldots, L-1\}, \tag{1}$$
where $r_t$ is Cabot’s minute-$t$ return, $\hat{\alpha}$ is its mean return, $L$ is the length of
the estimation window, $x_{t-1}$ is Cabot’s lagged return standardized to have zero
mean and unit variance during the estimation window, and $\hat{\beta}$ is the associated
OLS coefficient.
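
As a minimal sketch of the regression in equation (1), the snippet below fits a single stock's return on its own one-minute lag, standardizing the lag within the estimation window. The function name `fit_equation_1` and the use of NumPy are illustrative assumptions, not the authors' code. Because the standardized regressor has zero mean, the fitted intercept equals the window's mean return, which matches the description of the intercept given above.

```python
import numpy as np

def fit_equation_1(returns):
    """OLS of r_t on its standardized one-minute lag (hypothetical helper).

    `returns` holds one stock's minute-by-minute returns over the estimation
    window plus one extra leading minute that supplies the first lag.
    """
    r = np.asarray(returns, dtype=float)
    y = r[1:]                      # r_t for each minute in the window
    lag = r[:-1]                   # the matching lagged return r_{t-1}
    spread = lag.std()
    if spread == 0.0:
        raise ValueError("lagged return is constant; it cannot be standardized")
    x = (lag - lag.mean()) / spread    # zero mean, unit variance in the window
    beta_hat = (x @ y) / (x @ x)       # OLS slope; valid because x is centered
    alpha_hat = y.mean()               # OLS intercept when x has zero mean
    return alpha_hat, beta_hat
```

Since the regressor has unit variance, `beta_hat` is measured in return units per one standard deviation of the lagged return.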
But, modern financial markets are big, fast, and complex. Predictability now
exists at scales that are not easy for a researcher to intuit. For instance, the
lagged returns of Family Dollar Corp. were a significant predictor for 20% of the
oil-and-gas industry—including Cabot Oil & Gas—during a 50-minute stretch
on October 6, 2010. Can a researcher really fish this particular variable out of
the sea of spurious predictors using only his intuition? Of course not.
And, without a clear idea of which candidate predictors to test, a researcher
cannot use the OLS regression in equation (1) to estimate the amount of cross-
stock predictability. There were 2,191 NYSE-listed stocks in October 2010. So,
using an OLS regression to estimate the relationship between Cabot’s current
return and the lagged returns of every one of these candidate predictors would
require at least 2,191 observations—or, nearly six trading days! A researcher
cannot wait six days to identify a signal that lasts less than an hour.
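
The "nearly six trading days" figure can be checked with one line of arithmetic, assuming one observation per minute and a regular trading session of 390 minutes (9:30 a.m. to 4:00 p.m.); the session length is an assumption added here, not a number stated in the text.

```latex
\[
\frac{2{,}191 \text{ observations}}{390 \text{ minutes per trading day}} \approx 5.6 \text{ trading days}
\]
```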
With this type of problem in mind, we apply the Least Absolute Shrink-
age and Selection Operator (LASSO) rather than intuition to identify unex-
pected short-lived predictors such as the lagged returns of Family Dollar Corp.
We find that using the LASSO increases both out-of-sample fit and forecast-
implied Sharpe ratios, and we show that this out-of-sample success comes from
identifying predictors that are unexpected, short-lived, and sparse. Finally,
we document that these predictors are often the lagged returns of stocks with
news about fundamentals. In other words, although the unexpected short-lived
predictors that the LASSO identifies are not easy to intuit, they are still eco-
nomically meaningful.
The LASSO. We begin our analysis by describing both how the LASSO works
and why we use it. We are motivated by a simple observation about the limit
of human intuition. As researchers, we cannot use our intuition to identify
predictors that are sufficiently unexpected and short-lived—at some point, our
brains just do not work fast enough. So, to incorporate these sorts of signals
into our return forecasts, we need some other way to solve the identification
problem described above. And, bringing an additional assumption to bear on
the data-generating process for returns is one way to do this.
Our approach is to assume that there are only a handful of important predictors
at any one point in time—that is, to bet on sparsity. Morally speaking, if
only $S \ll 2{,}191$ predictors are important for forecasting Cabot’s returns, then
you should need only a few more than $S$ observations to identify and estimate
this sparse set of predictors. A researcher can leverage this assumption
by using the LASSO. This is a penalized-regression procedure that sets all
OLS coefficients weaker than its penalty parameter, λ, to be exactly 0. Be-
cause it does not have to worry about estimating these weak coefficients, the
LASSO can estimate the remaining strong coefficients using far fewer obser-
vations. So, if there are only a handful of important predictors at any one
point in time, then we can use the LASSO to incorporate unexpected short-
lived signals, such as the lagged return of Family Dollar Corp., into our re-
turn forecasts in a way that would not be possible using an unpenalized OLS
regression.
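
To make the thresholding behavior concrete, here is a minimal sketch of a LASSO fit by cyclic coordinate descent. It is one standard way to solve the penalized regression, not necessarily the procedure the authors use, and it assumes the objective (1/(2n))·‖y − a − Xb‖² + λ·‖b‖₁ with every predictor standardized to zero mean and unit variance. The names `soft_threshold` and `lasso_coordinate_descent` are hypothetical, and the simulated data in the demo are made up for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; any |z| <= lam becomes exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - a - X b||^2 + lam * ||b||_1.

    Assumes each column of X has zero mean, so the intercept is mean(y).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    intercept = y.mean()
    beta = np.zeros(p)
    residual = y - intercept                 # residual at beta = 0
    col_scale = (X ** 2).sum(axis=0) / n     # equals 1 for standardized columns
    for _ in range(n_iter):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue                     # skip an all-zero column
            # Covariance of column j with the residual that leaves j out.
            rho = X[:, j] @ residual / n + col_scale[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_scale[j]
            residual += X[:, j] * (beta[j] - new_beta)
            beta[j] = new_beta
    return intercept, beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_obs, n_candidates = 30, 200            # fewer observations than predictors
    X = rng.standard_normal((n_obs, n_candidates))
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    true_beta = np.zeros(n_candidates)
    true_beta[[3, 50, 120]] = [0.8, -0.6, 0.5]   # hypothetical sparse signals
    y = X @ true_beta + 0.1 * rng.standard_normal(n_obs)
    _, beta_hat = lasso_coordinate_descent(X, y, lam=0.15)
    print("nonzero coefficients:", np.flatnonzero(beta_hat))
```

When the predictors are mutually orthogonal, each fitted coefficient is simply the OLS coefficient passed through `soft_threshold`, which is the sense in which coefficients weaker than λ are set to exactly zero. With correlated predictors the same update is applied repeatedly until the coefficients stop changing.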
Out-of-Sample Performance. After describing how and why we use the
LASSO to bet on sparsity, we next investigate whether this bet pays off. To
do this, we evaluate one-minute-ahead return forecasts for a randomly chosen
subset of 250 NYSE-listed stocks on each trading day from January 2005 to
December 2012. As a benchmark, we start with forecasts made via OLS re-
gressions that include only steady long-lived predictors, such as a stock’s own
lagged returns or the lagged returns on the market. In our main specifications,
we study one-minute-ahead return forecasts created using three lags of various
steady long-lived predictors, but the exact number of lags does not qualitatively
affect our results.
We then apply the LASSO to make these same one-minute-ahead return
forecasts using the lagged returns of all 2,000+ NYSE-listed stocks during the
previous three minutes as candidate predictors. We find that using the LASSO
in addition to a standard benchmark model increases out-of-sample fit by at
least $\bar{R}^2_n = 1.2$ percentage points relative to just using the benchmark model
by itself. A 1.2-percentage-point increase might seem small, but remember
that we are making one-minute-ahead return forecasts. And, at short horizons,
small increases in “$R^2$ statistics can generate large benefits for investors”
(Campbell and Thompson (2008, p. 1526)). To highlight this point, we convert
the LASSO’s one-minute-ahead return forecasts into a forecast-implied trad-
ing strategy and document that this forecast-implied strategy generates an
annualized Sharpe ratio of 1.8 net of trading costs.
It is important to emphasize two things about these results. The first is
that we are running out-of-sample tests. It should not be surprising that the
LASSO has better in-sample fit than an OLS regression. The LASSO can
choose from over $3 \cdot N \geq 6{,}000$ candidate predictors; in a 30-minute estimation
window, an OLS regression is restricted to fewer than 30. But, there is
no guarantee that the LASSO’s better in-sample fit will translate into better
out-of-sample fit. This will only happen if the cross-section of returns actually
contains a sparse collection of $S < 30$ signals. If there are no signals to
be found or if there are more than 30 signals, then using the LASSO will not
help.
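
The dimension count in this paragraph can be illustrated with a short sketch that stacks three lags of every stock's return into one estimation-window matrix. The function name `lagged_design`, the array layout (one row per minute, one column per stock), and the exact alignment of the window are assumptions made for illustration; the excerpt does not spell out the authors' data layout.

```python
import numpy as np

def lagged_design(returns, t, window=30, n_lags=3):
    """Build the estimation-window matrices ending at minute t (hypothetical helper).

    returns: array of shape (T, N), one row per minute, one column per stock.
    X has `window` rows and n_lags * N columns: row i holds every stock's
    return at lags 1..n_lags relative to the target minute in row i of Y.
    """
    returns = np.asarray(returns, dtype=float)
    first_target = t - window + 1
    if first_target - n_lags < 0:
        raise ValueError("not enough history for this window and lag count")
    targets = np.arange(first_target, t + 1)
    # One block of N columns per lag, placed side by side.
    X = np.hstack([returns[targets - lag] for lag in range(1, n_lags + 1)])
    Y = returns[targets]
    return X, Y
```

With a 30-minute window and N above 2,000, `X` has 30 rows and more than 6,000 columns, so an unpenalized OLS fit on all columns has no unique solution, whereas a sparse fit can still pick out a small set of nonzero coefficients.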
The second is that our results do not imply that researchers should use the
LASSO instead of the standard two-pronged approach—that is, instead of their
intuition. As a researcher, if you can use your economic intuition to identify
a steady long-lived predictor, such as a stock’s own lagged returns, then an
OLS regression is the right way to incorporate this predictor into your return
forecast (Abadie and Kasy (2017)). But, the Bible does not say that all sources
