Is there a missing factor? A canonical correlation approach to factor models

Author: M. Fabricio Perez, Seung C. Ahn, Stephan Dieckmann
Date: 01 October 2018
Published date: 01 October 2018
DOI: http://doi.org/10.1016/j.rfe.2017.11.002
ORIGINAL ARTICLE
Seung C. Ahn (1,2) | Stephan Dieckmann (3) | M. Fabricio Perez (4)
(1) Arizona State University, Tempe, Arizona
(2) Sogang University, Seoul, Korea
(3) Wharton Finance Department at the University of Pennsylvania, Philadelphia, Pennsylvania
(4) Wilfrid Laurier University, Waterloo, Canada
Correspondence
M. Fabricio Perez, Wilfrid Laurier
University, Lazaridis School of Business
and Economics, Waterloo, ON, Canada.
Email: mperez@wlu.ca
Abstract
A common question in asset pricing research is whether a finite set of observable variables
can completely capture the systematic or common variations in a large number of response
variables. This paper provides a new approach to answering this question. A novelty is that
common factors are extracted using canonical relations between response variables and
observable factors. We show how these factors, in combination with tests for the number of
factors, can be used to evaluate whether a given set of macroeconomic and financial variables
is sufficient to capture all the systematic variation in the response variables. We illustrate
the usefulness of our methods by analyzing the systematic determinants of credit spreads of
U.S. corporate bonds.
JEL CLASSIFICATION
C33, G12
KEYWORDS
canonical correlations, common factors, credit risk, factor analysis
1 | INTRODUCTION
A common question in asset pricing research is whether a finite set of observable variables can completely capture the systematic
or common variation of a large number of response variables. The observable variables used by empirical asset pricing
studies are often chosen as proxy variables for the true latent factors that are suggested by economic or finance theory. We
refer to such variables as instrumental variables. For example, in an influential article, Collin-Dufresne, Goldstein, and
Martin (2001) investigate the determinants of credit spread changes as motivated by the formulation of structural models of
default risk. They consider a large set of instrumental variables, consisting of macroeconomic and financial variables that
should, in theory, completely explain co-movements in credit spreads. They find that a common systematic component is
not captured by these observable variables.
The conventional approach (as used by Collin-Dufresne et al., 2001) to test for the presence of unexplained systematic
components is to model the regression residuals as a linear factor model. Intuitively, any systematic variation not captured
by the instrumental variables should remain in the regression residuals. Thus, the existence of common factors in these
residuals will be an indication of a missing factor.
In this paper, we first show that this conventional approach may lead to incorrect conclusions. Regression residuals may
have two possible sources of systematic (common) variation. The first, and commonly assumed one, is that the systematic
variation is caused by a missing factor in the model. The second source of common variation, ignored by the previous
literature, is the result of the instrumental variables not being perfect proxies for the true factors. If this is the case, then the
regression residuals capture the systematic measurement errors in the observable variables. Thus, residuals may have a
factor structure as the result of the systematic measurement errors and not necessarily as a consequence of missing factors.
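
The second source of common variation can be illustrated with a small simulation. The sketch below is not taken from the paper: the data-generating process (one true factor `f`, a noisy proxy `z`, loadings `B`) and the helper `top_eig_share` are assumptions chosen for illustration. No factor is missing by construction, yet the residuals from regressions on the proxy still share a strong common component.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 2000, 20                      # time periods, response variables

# Assumed data-generating process (illustrative, not from the paper):
f = rng.standard_normal(T)           # the single true latent factor
z = f + rng.standard_normal(T)       # observable proxy = factor + measurement error
B = rng.uniform(0.5, 1.5, N)         # factor loadings
Y = np.outer(f, B) + rng.standard_normal((T, N))   # responses with idiosyncratic noise

def top_eig_share(Y, x):
    """Regress each column of Y on a constant and x, then return the share of
    residual variance carried by the largest eigenvalue of the residual covariance."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    eig = np.linalg.eigvalsh(np.cov(resid, rowvar=False))  # ascending order
    return eig[-1] / eig.sum()

share_true = top_eig_share(Y, f)     # regressor is the true factor
share_proxy = top_eig_share(Y, z)    # regressor is the noisy proxy

print(f"top eigenvalue share, true factor as regressor: {share_true:.2f}")
print(f"top eigenvalue share, noisy proxy as regressor: {share_proxy:.2f}")
```

With the true factor as regressor the residual covariance is close to diagonal, so no eigenvalue dominates. With the proxy, the part of `f` that `z` fails to span stays in every residual, which a residual factor analysis would read as a "missing factor".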
Received: 15 February 2017 | Revised: 6 September 2017 | Accepted: 30 November 2017
DOI: 10.1016/j.rfe.2017.11.002
Rev Financ Econ. 2018;36:321-347. wileyonlinelibrary.com/journal/rfe © 2018 The University of New Orleans
Our second contribution is to propose an estimation method that allows for testing of whether the factor structure in the
residuals is in fact caused by a missing factor and not by systematic measurement errors. Our test is based on the
construction of factors using the information from the response variables and the instrumental variables, by canonical correlation
analysis (CCA). We refer to these factors as CCA factors. We show that the residuals obtained from regressions of the
response variables on the CCA factors do not contain systematic measurement errors.
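
As a rough illustration of building factors from linear combinations of the response variables by CCA, the sketch below computes sample canonical correlations and the response-side canonical variates with plain NumPy. It is a textbook CCA under an assumed one-factor simulation, not necessarily the estimator developed in this paper; the function name `cca_response_factors` and all simulated quantities are hypothetical.

```python
import numpy as np

def cca_response_factors(Y, Z):
    """Sample CCA between responses Y (T x N) and instruments Z (T x K).
    Returns the canonical correlations and the canonical variates that are
    linear combinations of the columns of Y."""
    Yc = Y - Y.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    T = Y.shape[0]
    Syy, Szz, Syz = Yc.T @ Yc / T, Zc.T @ Zc / T, Yc.T @ Zc / T
    Ly = np.linalg.cholesky(Syy)                 # Syy = Ly Ly'
    Lz = np.linalg.cholesky(Szz)                 # Szz = Lz Lz'
    # Whitened cross-covariance: Ly^{-1} Syz Lz^{-T}
    M = np.linalg.solve(Ly, Syz) @ np.linalg.inv(Lz).T
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    Wy = np.linalg.solve(Ly.T, U)                # canonical weights on Y
    return s, Yc @ Wy

# Assumed simulation (illustrative only): one latent factor, one noisy proxy.
rng = np.random.default_rng(1)
T, N = 2000, 20
f = rng.standard_normal(T)
Z = (f + rng.standard_normal(T)).reshape(-1, 1)
B = rng.uniform(0.5, 1.5, N)
Y = np.outer(f, B) + rng.standard_normal((T, N))

corrs, factors = cca_response_factors(Y, Z)
corr_proxy = abs(np.corrcoef(Z[:, 0], f)[0, 1])
corr_cca = abs(np.corrcoef(factors[:, 0], f)[0, 1])
print("first canonical correlation:", round(float(corrs[0]), 2))
print("|corr(proxy, f)|:", round(float(corr_proxy), 2))
print("|corr(response-side variate, f)|:", round(float(corr_cca), 2))
```

In this toy setting the response-side variate tracks the latent factor more closely than the proxy does, because it averages over many responses instead of inheriting the proxy's measurement error. It still carries some idiosyncratic noise from the responses, so it should not be read as a consistent estimate of the factor.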
Third, we show that estimation methods for the number of latent factors in multiple response variables can be used to
evaluate whether there are missing factors not captured by the instrumental variables. We show that the estimation methods of
Ahn and Perez (2010) can be used to estimate two numbers: the number of all relevant factors explaining common
variation in response variables, and the number of the factors that are correlated with the observed explanatory variables. If the
two numbers are equal, then we can conclude that the set of explanatory variables is sufficient to capture all latent factors.
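
The logic of comparing the two numbers can be sketched with simple stand-in estimators. The code below does not implement the Ahn and Perez (2010) methods: it counts all factors with an eigenvalue-ratio heuristic and counts factors correlated with the instruments by thresholding sample canonical correlations. The simulated design, the function names, `kmax`, and the 0.3 threshold are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 2000, 10

# Assumed design: two latent factors, but the instruments proxy only the first.
f1, f2 = rng.standard_normal(T), rng.standard_normal(T)
b1 = np.ones(N)
b2 = np.where(np.arange(N) % 2 == 0, 1.0, -1.0)      # orthogonal to b1
Y = np.outer(f1, b1) + np.outer(f2, b2) + rng.standard_normal((T, N))
Z = f1[:, None] + rng.standard_normal((T, 3))        # three noisy proxies of f1

def count_all_factors(Y, kmax=5):
    """Stand-in estimator: the k that maximizes the ratio of adjacent
    eigenvalues of the sample covariance matrix of Y."""
    eig = np.sort(np.linalg.eigvalsh(np.cov(Y, rowvar=False)))[::-1]
    ratios = eig[:kmax] / eig[1:kmax + 1]
    return int(np.argmax(ratios)) + 1

def count_correlated_factors(Y, Z, threshold=0.3):
    """Stand-in estimator: number of sample canonical correlations between
    Y and Z above an ad hoc threshold (QR-based CCA)."""
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    Qz, _ = np.linalg.qr(Z - Z.mean(axis=0))
    canon = np.linalg.svd(Qy.T @ Qz, compute_uv=False)
    return int(np.sum(canon > threshold))

n_all = count_all_factors(Y)
n_corr = count_correlated_factors(Y, Z)
print("factors in the responses:", n_all)
print("factors correlated with the instruments:", n_corr)
print("missing factor suggested:", n_all > n_corr)
```

Here the two counts differ, which is the situation the text describes as a missing factor. If `Z` also contained a proxy for `f2`, the counts would be expected to agree.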
Our CCA approach can be viewed as the opposite of the previous studies that have tried to find a refined set of
instrumental variables that can better explain the co-movement in response variables. Instead of testing additional structural
variables, we hypothesize that a standard set of instrumental variables is sufficient to construct all common factors. Those
variables might not be perfect proxies for the factors, but they should contain sufficient information to be able to construct
them. If this is the case, regression analysis using the CCA factors instead of the observed explanatory variables can be a
better alternative. We show that CCA factors are free from systematic measurement errors and have greater explanatory
power for the response variables. In the case that the set of observed explanatory variables is not able to capture all latent
factors, new theoretical and empirical models should be developed to understand the nature of the missing factor.
We are not the first to investigate the relationship between latent factors and observed instrumental variables. Gouriéroux
et al. (1995) propose to estimate factors by CCA. Our approach is different from theirs in that we estimate factors by linear
functions of response variables while they propose functions of the instrumental variables. The use of their factors is not
immune to the above-mentioned systematic measurement error problem. Our CCA factors are not consistent estimators of
true factors. Nonetheless, they preserve the factor loadings of true factors in that the factor loadings are the same as the
loadings of true factors up to a linear non-singular transformation. In addition, the residuals from the regressions of
response variables on the CCA factors contain the idiosyncratic components of response variables only, unless some factors
exist that are not correlated with the instrumental variables.
Bai and Ng (2006) have developed a general approach to evaluate how observable instrumental variables and
unobserved true factors are correlated. Their methods enable researchers to estimate canonical correlations between observed
variables and true factors, and even test if some observable variables are true factors. Their methods, however, require
utilizing data with a large number of response variables as well as a large time series dimension. For such data, principal
components of the response variables are consistent estimators of the true factors. However, principal components are not
consistent for data displaying a small number of response variables. Our approach is designed for data with a small number
of response variables, and is therefore suitable for several finance applications.
We illustrate the usefulness of our methods by revisiting Collin-Dufresne et al.'s (2001) study on the determinants of
credit spreads. We show that the common variation in credit spreads is mainly explained by three factors. Two of them are
strong factors, in terms of their explanatory power for individual credit spreads. The third is a weak factor, since it only
accounts for a small fraction of the total common variation relative to the amount of idiosyncratic noise. The average
explanatory power of the three factors is 49% of total variation, of which the two strong factors contribute 26% and 14%,
respectively. Furthermore, we find that irrelevant factors explain only the idiosyncratic noise of single test assets, while
relevant ones explain all common variation. The three factors can be identified using the same set of variables proposed by
Collin-Dufresne et al. (2001) and some additional instrumental variables. This is interesting, as it suggests that no strong
factor exists in the corporate bond market that is not also present in the equity market, the swap market, or the market for
U.S. Treasury debt.
Our paper proceeds as follows: We first outline our methodology in Section 2. Section 3 presents the analysis of credit
spreads of U.S. corporate bonds. We conclude in Section 4, and the Appendix contains Tables and Figures.
2 | EMPIRICAL METHODS
2.1 | Motivation
Consider the linear regression model:
$y_t = a + N z_t + e_t$ (1)