Using sparse categorical principal components to estimate asset indices: new methods with an application to rural Southeast Asia

DOI: http://doi.org/10.1111/rode.12568
Published date: 01 May 2019
Authors: Giovanni Maria Merola, Bob Baulch
REGULAR ARTICLE
Giovanni Maria Merola¹ | Bob Baulch²
¹ Xi'an Jiaotong-Liverpool University, Suzhou Dushu Lake Science and Education Innovation District, Suzhou Industrial Park, 111 Renai Road, Suzhou, Jiangsu, P. R. China
² International Food Policy Research Institute, 2033 K Street NW, Washington DC, USA
Correspondence
Giovanni Merola, Xi'an Jiaotong-Liverpool University, 111 Renai Road, Suzhou Dushu Lake Science and Education Innovation District, Suzhou Industrial Park, Suzhou, P. R. China.
Email: giovanni.merola@xjtlu.edu.cn
Abstract
Asset indices have been used since the late 1990s to measure wealth in developing countries. We
extend the standard methodology for estimating asset indices using principal component analysis in
two ways: by introducing constraints that force the indices to have increasing value as the number
of assets owned increases, and by estimating sparse indices with a few key assets. This is achieved
by combining categorical and sparse principal component analysis. We also apply this methodology
to the estimation of per capita level asset indices. Using household survey data from northwest
Vietnam and northeast Laos, we show that the resulting asset indices improve the prediction and
ranking of income both at household and per capita level.
1 | INTRODUCTION
Since the late 1990s, researchers have used asset indices (AIs) as a relatively simple way to
measure households' long-term wealth or socioeconomic status in developing countries. The
reason for using AIs as a proxy for household income or expenditure stems from the well-known
difficulties associated with collecting comprehensive and reliable data on household
income or expenditures (Deaton, 1997; Grosh and Glewwe, 2000), a desire in surveys focused
on health or other issues to have a quick measure of household wealth (Gwatkin et al., 2007;
Filmer and Pritchett, 2001), and the reduction of poverty and income dynamics due to
measurement error (Carter and Barrett, 2006; Deaton, 1997; McKay and Perge, 2013). A recent
review by Filmer and Scott (2012) analyzed the results of a number of applications of AIs and
concluded that they are useful for the analysis of differences in health, education, fertility, and
child mortality.
© 2018 John Wiley & Sons Ltd | wileyonlinelibrary.com/journal/rode | Rev Dev Econ. 2019;23:640–662.
In most applications AIs are estimated by adapting methods designed for summarizing continuous
data to the categorical asset ownership and housing characteristics observed in household surveys.
The most popular approach is to apply principal component analysis (PCA) to dummy
variables representing asset ownership, as originally proposed by Filmer and Pritchett (2001).
Other methods used to compute AIs include factor analysis (Sahn and Stifel, 2000, 2003; Balen et
al., 2010; Smits and Steendijk, 2015), polychoric PCA (Kolenikov and Angeles, 2004; Moser and
Felton, 2007), and multiple correspondence analysis (MCA) (Booysen et al., 2005; Smits and
Steendijk, 2015).
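
To make the dummy-variable approach concrete, the following is a minimal sketch of a first-principal-component index computed from 0/1 ownership indicators. It is an illustration under stated assumptions, not the code used by Filmer and Pritchett (2001) or by this paper: the function name pca_asset_index, the choice to standardize each dummy, and the six-household data set are all hypothetical.

```python
import numpy as np

def pca_asset_index(dummies):
    """First-principal-component index from a 0/1 ownership matrix (rows = households)."""
    X = np.asarray(dummies, dtype=float)
    # Standardize each dummy variable; assumes no column is constant.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Correlation matrix of the dummies and its eigen-decomposition.
    corr = (Z.T @ Z) / Z.shape[0]
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
    w = eigvecs[:, -1]                        # loadings of the largest eigenvalue
    if w.sum() < 0:                           # the sign of an eigenvector is arbitrary
        w = -w
    share = eigvals[-1] / eigvals.sum()       # share of variance explained
    return Z @ w, w, share

# Hypothetical data: 6 households, 3 assets (radio, bicycle, motorbike).
owned = np.array([
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
    [1, 1, 1],
])
index, weights, share = pca_asset_index(owned)
print(np.round(index, 3), np.round(weights, 3), round(share, 3))
```

Each household's index is the weighted sum of its standardized indicators, so in this toy example a household owning all three assets scores above one owning none.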
Despite the widespread adoption of AIs, concerns remain about both the statistical validity of
the way AIs are constructed and the interpretability of the results generated. One of the major
drawbacks of AIs computed from dummy variables is that the intrinsic ordering of counts of assets
cannot be retained. Therefore, the coefficients corresponding to owning a large amount of an asset
may be smaller than the coefficients corresponding to owning a smaller amount of the same asset.
This is both counter-intuitive and troubling for the use of AIs as a measure of wealth. A similar
argument can be made for housing characteristics which, when used for estimating wealth, can be
made more informative by ordering their categories by their quality or cost.
Another drawback of AIs is that they lack parsimony: they are often defined by hundreds of
coefficients, one for each number of units owned of each asset and one for each category of each
housing characteristic. Therefore, the contribution of an individual asset to the index cannot be
determined. Understanding which assets and housing characteristics are the major drivers of the
variation of wealth across households could be of great importance for studying a population's
socioeconomic fabric, designing future surveys, and making cross-country comparisons.
This paper proposes an improvement on Filmer and Pritchett's approach to computing AIs by
including monotonicity constraints which force the coefficients of dummy variables to respect the
ordering of their corresponding categories. This can be readily done by applying categorical PCA
(CATPCA: Gifi, 1990; Michailidis and de Leeuw, 1998) to household survey data. CATPCA is
analogous to MCA with the addition of monotonicity constraints. In this paper we compute the
CATPCA components by applying PCA to categorical variables scaled using aspect analysis (Mair
and de Leeuw, 2010).
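
The idea of a monotonicity constraint can be illustrated with monotone (isotonic) regression, a standard device for imposing an ordering on category weights. The sketch below is not the aspect-analysis procedure used in this paper; it only shows how unconstrained weights that violate the ordering of asset counts can be replaced by the closest nondecreasing weights in the weighted least-squares sense. The function name monotone_fit, the raw weights, and the category frequencies are hypothetical.

```python
import numpy as np

def monotone_fit(values, counts):
    """Weighted least-squares nondecreasing fit by pooling adjacent violators."""
    blocks = []  # each block holds [weighted mean, total weight, number of categories]
    for v, c in zip(values, counts):
        blocks.append([float(v), float(c), 1])
        # Merge the last two blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            w = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w, w, n1 + n2])
    # Expand each pooled block back to one weight per category.
    return np.concatenate([np.full(n, m) for m, _, n in blocks])

# Hypothetical unconstrained weights for owning 0, 1, 2 and 3+ units of an asset,
# with the number of households observed in each category.
raw = [-0.8, 0.5, 0.3, 0.9]
freq = [40, 30, 20, 10]
fitted = monotone_fit(raw, freq)
print(fitted)
```

In this toy case the weight for owning two units (0.3) is below the weight for owning one (0.5), so the two categories are pooled to their frequency-weighted mean and the fitted weights no longer decrease as the count rises.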
We also apply least squares sparse PCA (SPCA: Merola, 2015) to the aspect-scaled categorical
variables to derive sparse principal components, which show the key drivers of variation across
households using only a limited number of variables. This involves only a small loss of optimality
while retaining the monotonicity constraints. Interpreting sparse AIs is much simpler than
interpreting AIs defined as combinations of all the variables, because a few key variables that
explain the most variance of the dataset can be quickly identified. As far as we are aware, this is
the first time that CATPCA and SPCA have been used together to compute sparse components for
categorical variables.
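
To show what a sparse component looks like, the sketch below finds a direction with at most k nonzero loadings by truncated power iteration. This is a deliberately simple stand-in, not the least squares SPCA of Merola (2015), whose algorithm is not reproduced here. The function name truncated_power_component, the 4 x 4 covariance matrix, and the choice k = 2 are all hypothetical.

```python
import numpy as np

def truncated_power_component(S, k, iters=200):
    """Unit-norm direction with at most k nonzero loadings (truncated power iteration)."""
    S = np.asarray(S, dtype=float)
    # Start from the variable with the largest variance.
    w = np.zeros(S.shape[0])
    w[np.argmax(np.diag(S))] = 1.0
    for _ in range(iters):
        v = S @ w
        keep = np.argsort(np.abs(v))[-k:]   # indices of the k largest entries
        w_new = np.zeros_like(v)
        w_new[keep] = v[keep]               # zero out all other loadings
        w_new /= np.linalg.norm(w_new)
        if np.allclose(w_new, w):
            break
        w = w_new
    return w

# Hypothetical covariance matrix of four scaled variables.
S = np.array([
    [4.0, 3.0, 0.2, 0.1],
    [3.0, 4.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
])
w = truncated_power_component(S, k=2)
explained = w @ S @ w            # variance of the sparse component
full = np.linalg.eigvalsh(S)[-1] # variance of the ordinary first principal component
print(np.round(w, 4), round(explained, 4), round(full, 4))
```

Comparing explained with full gives a rough sense of the trade-off described above: the sparse component uses only two variables and gives up some variance relative to the unrestricted first principal component.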
Finally, we use the scaled categorical variables to compute individual (per capita) level AIs
from the asset counts for each household and aspect-scaled housing categories divided by
household sizes. We show that these AIs are superior to the standard ones both in predicting income
and in classifying income quintiles.
The paper is organized as follows. In Section 2 we give a brief methodological overview of the
statistical techniques used for estimating AIs, including PCA, CATPCA and SPCA. In Section 3
we illustrate the estimation of AIs with CATPCA and SPCA using household survey data from
northwest Vietnam and northeast Laos. Finally, in Section 4, we provide some concluding remarks
and suggestions for future research.