Predicting US bank failures: A comparison of logit and data mining models

DOI: http://doi.org/10.1002/for.2487
Published date: 01 March 2018
Authors: Yi Fang, Zhongbo Jing
Received: 4 December 2015 Revised: 18 August 2016 Accepted: 22 June 2017
RESEARCH ARTICLE
Zhongbo Jing¹, Yi Fang²
¹School of Management Science and Engineering, Central University of Finance and Economics, Beijing, China
²School of Finance, Central University of Finance and Economics, Beijing, 100081, China
Correspondence
Zhongbo Jing, School of Management Science and Engineering, Central University of Finance and Economics, Beijing 100081, China.
Email: jing_zhongbo@126.com
Funding information
National Natural Science Foundation of China, Grant/Award Number: 71273257, 71532013, 71401188, 71503290, 71673315; and CUFE Young Researcher Development Fund (QJJ1619)
Abstract
Predicting bank failures is important as it enables bank regulators to take timely actions to prevent bank failures or reduce the cost of rescuing banks. This paper compares the logit model and data mining models in the prediction of bank failures in the USA between 2002 and 2010 using levels and rates of change of 16 financial ratios based on a cross-section sample. The models are estimated for the in-sample period 2002–2009, while data for the year 2010 are used for out-of-sample tests. The results suggest that the logit model predicts bank failures in-sample less precisely than data mining models, but produces fewer missed failures and false alarms out-of-sample.
KEYWORDS
bank failure prediction, logit model, neural network, support vector machine
1 INTRODUCTION
Predicting bank failures is crucial as it can prevent banks from failing or minimize the costs of bank failures to taxpayers (Thomson, 1991). Compared to supervisory ratings, financial ratio analysis and peer group analysis, which have also been applied to evaluate bank risk (Sahajwala & Van den Bergh, 2000), statistical prediction models have two advantages. First, statistical models try to identify high-risk banks reasonably well in advance, whereas the above approaches focus on the current condition of a bank. Second, statistical models can adopt various advanced techniques to determine the leading relationship between financial indicators and bank failures. This paper compares three statistical models in the prediction of bank failures in the USA over the period 2002–2010.
Various statistical models have been applied to predict the bankruptcy of financial or nonfinancial companies. Altman (1968) was the first researcher to employ a discriminant analysis model using Z-scores to predict firm failures. Sinkey (1975) employs the same model to predict bank failures in the USA from 1969 to 1972. Quadratic discriminant analysis and the combination of several discriminant models are then employed to improve the prediction accuracy (Altman, 1977; Lam & Moy, 2002; Meyer & Pifer, 1970). Lev and Ohlson (1982) find that stock returns are highly associated with accounting information, especially earnings data, and that market data can also be employed to predict firm bankruptcies. Agarwal and Taffler (2008) employ market data and accounting data to estimate contingent claims models and Z-scores, respectively, and compare the bankruptcy prediction performance of these two models. They find that the two kinds of models have almost the same predictive ability, whereas Z-scores lead to higher bank profitability than contingent claims models once differing misclassification costs are taken into account.
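Because a Z-score is simply a fixed linear discriminant function of accounting ratios, the idea can be sketched in a few lines. The coefficients and the 1.81/2.99 cutoffs below are the values commonly quoted for Altman's (1968) model of publicly traded manufacturing firms; they are not estimated in this paper, are not calibrated for banks, and the function names are hypothetical.

```python
# Altman (1968) Z-score as commonly quoted for publicly traded manufacturing
# firms. Illustrative only: these coefficients are not calibrated for banks.


def altman_z(wc_ta: float, re_ta: float, ebit_ta: float,
             mve_tl: float, sales_ta: float) -> float:
    """Linear discriminant score built from five accounting ratios.

    wc_ta    -- working capital / total assets
    re_ta    -- retained earnings / total assets
    ebit_ta  -- earnings before interest and taxes / total assets
    mve_tl   -- market value of equity / book value of total liabilities
    sales_ta -- sales / total assets
    """
    return 1.2 * wc_ta + 1.4 * re_ta + 3.3 * ebit_ta + 0.6 * mve_tl + 1.0 * sales_ta


def z_zone(z: float) -> str:
    """Map a Z-score to Altman's distress / grey / safe zones."""
    if z < 1.81:
        return "distress"
    if z > 2.99:
        return "safe"
    return "grey"


if __name__ == "__main__":
    z = altman_z(0.1, 0.1, 0.1, 1.0, 1.0)
    print(round(z, 2), z_zone(z))  # 2.19 grey
```

Scores between the two cutoffs fall in the "grey" zone, where the discriminant function does not give a clear classification.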
Journal of Forecasting. 2018;37:235–256. wileyonlinelibrary.com/journal/for Copyright © 2017 John Wiley & Sons, Ltd.

Martin (1977) employs the logit model to predict bank failures in the USA in the 1930s. The logit model is a nonlinear model where the dependent variable is a dummy variable, which is one for bank failure and zero otherwise. After that, various articles adopted this model to predict nonfinancial firm or bank failures in different countries (cf. Boyacioglu, Kara, & Baykan, 2009; Konstandina, 2006). Recently, Altman, Cizel, and Rijken (2014) apply the logit model to predict bank distress in 15 Western European countries and the USA during 2007–2012. They find that prediction based on a given model displays cross-country variation in the classification of bank distress. Betz, Oprica, Peltonen, and Sarlin (2014) employ the logit model to predict bank distress in the European banking sector during 2000–2013. Their results suggest that this model yields useful out-of-sample predictions.
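The logit specification described above can be sketched as follows: the failure probability is the logistic function of a linear index of financial ratios, and the coefficients maximize the likelihood of the observed 0/1 outcomes. This is a minimal pure-Python sketch under stated assumptions: batch gradient ascent, the learning rate, the iteration count and the single hypothetical standardized ratio are illustrative choices, not the estimation procedure of this paper (statistical packages usually rely on Newton-type methods).

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def failure_probability(beta0: float, beta: Sequence[float], x: Sequence[float]) -> float:
    """P(y = 1 | x): logistic function of the linear index beta0 + beta'x."""
    return sigmoid(beta0 + sum(b * xi for b, xi in zip(beta, x)))


def fit_logit(X: List[List[float]], y: List[int],
              lr: float = 0.5, epochs: int = 2000) -> Tuple[float, List[float]]:
    """Maximum-likelihood estimates via batch gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * k
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            # Score of the log-likelihood: (y - p) times the regressors.
            err = yi - failure_probability(beta0, beta, xi)
            g0 += err
            for j in range(k):
                g[j] += err * xi[j]
        beta0 += lr * g0 / n
        beta = [b + lr * gj / n for b, gj in zip(beta, g)]
    return beta0, beta


if __name__ == "__main__":
    # Hypothetical standardized capital ratio; y = 1 marks a failed bank.
    # The classes overlap, so the maximum-likelihood estimate is finite.
    X = [[-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5]]
    y = [1, 1, 0, 1, 0, 0]
    b0, b = fit_logit(X, y)
    print(b[0] < 0)  # True: more capital, lower estimated failure probability
```

A bank is then flagged as a predicted failure when its estimated probability exceeds a chosen cutoff; the cutoff itself is a modelling decision not fixed by the logit model.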
Other types of statistical models are also applied to predict bank failures. Wheelock and Wilson (2000) employ a hazard model to investigate the determinants of bank failures in the USA during 1984–1993. Similarly, Molina (2002) employs a hazard model to find the determinants of bank failures in the Venezuelan banking crisis during 1994–1995.
Technically, prediction here is a classification problem: assigning an object to an appropriate group (e.g., failed or nonfailed). This is what data mining models focus on. To solve the classification problem, data mining models capture the relationships between dependent and independent variables by learning from the data, and they impose fewer constraints on the distribution of the data than traditional statistical models such as the logit model. In other words, data mining models are machine learning systems that learn to predict failures by adjusting their internal parameters. Another advantage of data mining models is that they only need cross-section data, which are easily obtained. Therefore, data mining models show promising potential for prediction, and neural networks and support vector machines have been widely employed for predicting firm failures (Min & Lee, 2005; Shin, Lee, & Kim, 2005). Recently, Boyacioglu et al. (2009) employ neural networks, support vector machines and multivariate models to predict bank failures in Turkey between 1997 and 2003, and find that neural networks have the best prediction performance. Holopainen and Sarlin (2015b) employ the logit model, k-nearest neighbors, and other models to predict bank failures in the Eurozone during 2000:Q1–2014:Q4, and find that the k-nearest neighbors model performs well. In addition, data mining models have also been employed to predict banking crises or currency crises at the country level (see, e.g., Holopainen & Sarlin, 2015a; Nag & Mitra, 1999; Peltonen, 2006).
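As a minimal illustration of a model that learns the classification rule from the data without distributional assumptions, the sketch below implements a k-nearest-neighbors classifier, the type of model Holopainen and Sarlin (2015b) found to perform well. It is a swapped-in example: the models compared in this paper are neural networks, support vector machines and the logit model. The features, labels and k = 3 are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[int],
                x: Sequence[float], k: int = 3) -> int:
    """Classify x by majority vote among its k nearest training banks.

    No functional form or distribution is imposed; the "model" is the
    stored training data itself. Features should be on comparable scales.
    """
    # Pair each training bank's Euclidean distance to x with its label, nearest first.
    ranked = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical standardized (capital ratio, return on assets) pairs.
    train_X = [[-1.2, -1.0], [-0.9, -1.3], [-1.1, -0.7],
               [0.8, 0.9], [1.1, 0.6], [0.9, 1.2]]
    train_y = [1, 1, 1, 0, 0, 0]  # 1 = failed, 0 = survived
    print(knn_predict(train_X, train_y, [-1.0, -0.9]))  # 1
    print(knn_predict(train_X, train_y, [1.0, 1.0]))    # 0
```

An odd k avoids tied votes between the two classes.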
This paper focuses on three problem areas. First, despite the advantages of data mining models, few papers have employed them to predict (recent) bank failures, even though bank failures arguably have more negative consequences than nonfinancial firm failures.¹ During the recent Global Financial Crisis (2008–2010), 334 banks failed in the USA, compared to a total of 21 bank failures between 2002 and 2007.² This banking crisis led to a cumulative reduction of 31% in US output, and its fiscal cost amounted to 4.5% of US GDP between 2007 and 2010 (Laeven & Valencia, 2013).³ Therefore, studying bank failure prediction in the USA during the Global Financial Crisis may yield valuable insights for bank supervisors. In addition, the large number of bank failures during the crisis period provides an adequate sample for studying prediction models in general.

¹López-Iturriaga, López-de-Foronda, and Sanz (2010) apply a neural network model to predict US bank failures that occurred in the first half of 2010, but they do not compare the performance of their neural network model with other models.
Second, few papers have employed variables that reflect the speed at which financial ratios deteriorate as leading indicators. In fact, a bank failure may be preceded by a sharp decline in asset size, earnings or other aspects of its financial condition. Therefore, faster deterioration of financial ratios could be an important leading indicator of bank failure.
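One way to operationalize the speed of deterioration is the relative change of each ratio between consecutive years, stored alongside its level. The formula, the use of |previous| in the denominator and the ratio names below are assumptions for illustration; this section does not state the exact definition used in the paper.

```python
from typing import Dict


def rate_of_change(previous: float, current: float) -> float:
    """Relative change of a ratio between two consecutive periods.

    Dividing by |previous| keeps the sign meaningful when the base value is
    negative (e.g., a loss-making bank's return on assets).
    """
    if previous == 0:
        raise ValueError("rate of change is undefined when the previous value is zero")
    return (current - previous) / abs(previous)


def level_and_change_features(prev: Dict[str, float],
                              curr: Dict[str, float]) -> Dict[str, float]:
    """Build one bank's features: each ratio's current level and its rate of change."""
    features: Dict[str, float] = {}
    for name, value in curr.items():
        features[f"{name}_level"] = value
        features[f"{name}_change"] = rate_of_change(prev[name], value)
    return features


if __name__ == "__main__":
    # Hypothetical ratios for one bank in two consecutive years.
    prev = {"equity_to_assets": 0.10, "roa": 0.010}
    curr = {"equity_to_assets": 0.06, "roa": -0.005}
    for key, val in level_and_change_features(prev, curr).items():
        print(key, round(val, 3))
```

With levels and changes side by side, each ratio contributes two inputs, so a bank whose capital ratio is still moderate but falling fast can be distinguished from one whose ratio is stable.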
Third, for prediction purposes, the in-sample and out-of-sample data should be separated by time rather than chosen randomly. Most existing research using data mining models randomly splits the dataset into a part over which the parameters are estimated (the in-sample) and a part used for prediction (the out-of-sample) (Divsalar, Roodsaz, Vahdatinia, Norouzzadeh, & Behrooz, 2012). For example, Shin et al. (2005) arbitrarily choose 80% of the data as the in-sample and the remaining 20% as the out-of-sample. Min and Lee (2005) and Boyacioglu et al. (2009) use a similar approach. However, prediction models typically use past information to predict future bank failures, so it is more reasonable to split the sample by time.
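The time-based split argued for above can be sketched as follows, with the in-sample period ending in 2009 and 2010 held out, as described in the abstract. The record layout, field names and the rule of assigning each observation to a single year are hypothetical; the two error counts correspond to the missed failures and false alarms used to compare out-of-sample performance.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class BankObservation:
    bank_id: str  # hypothetical identifier
    year: int     # year the observation is assigned to (assumed)
    failed: int   # 1 = bank failed, 0 = survived


def split_by_time(obs: List[BankObservation], last_in_sample_year: int
                  ) -> Tuple[List[BankObservation], List[BankObservation]]:
    """Estimate on earlier years, test on later years (no random shuffling)."""
    in_sample = [o for o in obs if o.year <= last_in_sample_year]
    out_of_sample = [o for o in obs if o.year > last_in_sample_year]
    return in_sample, out_of_sample


def missed_and_false_alarms(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[int, int]:
    """Count missed failures (failed, predicted safe) and false alarms (safe, predicted failed)."""
    missed = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    false_alarms = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return missed, false_alarms


if __name__ == "__main__":
    data = [(2004, 0), (2008, 1), (2009, 1), (2009, 0), (2010, 1), (2010, 0)]
    obs = [BankObservation(f"bank{i}", year, failed) for i, (year, failed) in enumerate(data)]
    train, test = split_by_time(obs, last_in_sample_year=2009)
    print(len(train), len(test))  # 4 2
    print(missed_and_false_alarms([o.failed for o in test], [0, 1]))  # (1, 1)
```

Unlike an 80/20 random split, no observation from the evaluation year can leak into estimation, which mirrors how a supervisor would use the model in practice.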
This paper aims to address the problems mentioned above. First, it compares neural networks, support vector machines, and the logit model in predicting a cross-section of 293 bank failures in the USA during the period 2002–2010. These three models are well known and have shown stable performance in the literature. As in the literature applying data mining models, the sample in this paper is a cross-section dataset; thus hazard models and other models that require time information are not employed. Second, this paper chooses 16 financial ratios covering Capital adequacy, Asset quality, Management quality, Earnings, Liquidity and Sensitivity to market risk (known as CAMELS, a supervisory rating framework for evaluating a bank's overall financial condition). In addition, it employs rates of change of the financial ratios as leading indicators. The paper therefore uses a comprehensive set of leading indicators to predict bank failures. As is common in the literature on early warning models (see, e.g., Canbas, Cabuk, & Kilic, 2005),
²The list of bank failures is obtained from the website: https://www.fdic.gov/bank/individual/failed/banklist.html.
³Fiscal costs are gross fiscal outlays for the restructuring of the financial sector. They include fiscal costs for bank recapitalizations but exclude asset purchases and direct liquidity assistance from the Treasury.
