DESIGNING AN IF–THEN RULES‐BASED ENSEMBLE OF HETEROGENEOUS BANKRUPTCY CLASSIFIERS: A GENETIC ALGORITHM APPROACH

Author: Zhiyan Cao, Sergio Davalos, Fei Leng, Ehsan H. Feroz
Published date: 01 July 2014
DOI: http://doi.org/10.1002/isaf.1354
Date: 01 July 2014
SERGIO DAVALOS,* FEI LENG, EHSAN H. FEROZ AND ZHIYAN CAO
Milgard School of Business, University of Washington–Tacoma, Tacoma, WA, USA
SUMMARY
This paper proposes a framework for an ensemble bankruptcy classifier that uses if–then rules to combine the outputs
from a heterogeneous set of classifiers. A genetic algorithm (GA) induces the rules using an asymmetric, cost-sensitive
fitness function that includes accuracy and misclassification costs. The GA-based ensemble classifier outperforms
individual classifiers and ensemble classifiers generated by other methods. The results of the classifier are in the form
of if–then rules. We apply the approach to a balanced dataset and an imbalanced dataset. Both are composed of firms
subject to financial distress and cited in the US Securities and Exchange Commission's Accounting and Auditing
Enforcement Releases. Copyright © 2014 John Wiley & Sons, Ltd.
Keywords: ensemble; genetic algorithms; rule-based; bankruptcy classification; asymmetric cost function
1. INTRODUCTION
We demonstrate the effectiveness of using a genetic algorithm (GA) to induce if–then rules that
combine the output from a set of individual bankruptcy classifiers. Breiman, Friedman, Olshen, and
Stone (1984) identified two main goals of bankruptcy classification.¹ One goal is to generate an accurate
classification model that is generalizable based on a given dataset. The other goal is to understand
the underlying predictive or classification structure of the problem. Although both objectives are important,
most classification models have focused on classification accuracy (Flach & Lavrac, 2003). While
these models provide better predictive power than comprehensible models, they provide a poor representation
of the solution (Breiman et al., 1984). In this study, we strive to achieve both goals by
generating a model that is both accurate and comprehensible.
* Correspondence to: Sergio Davalos, Milgard School of Business, University of Washington–Tacoma, Tacoma, WA, USA.
E-mail: sergiod@u.washington.edu
¹ We refer the reader to Balcaen and Ooghe (2004, 2006) for a comprehensive review of statistical and alternative studies of bankruptcy classification.
INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE AND MANAGEMENT
Intell. Sys. Acc. Fin. Mgmt. 21, 129–153 (2014)
Published online 12 June 2014 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/isaf.1354
Model comprehensibility refers to the extent to which a human expert can understand the decision structure. A
comprehensible model needs to provide an understanding of the variables and conditions involved in
bankruptcy classification. On the other hand, classification accuracy is generally measured as the
percentage of cases correctly classified. Type I and type II error rates are included when misclassification
is a key issue. To achieve bankruptcy classification accuracy, a model needs to address several issues. First,
the bankruptcy prediction problem has a nonlinear search space (Mahfoud & Mani, 1995). Traditional
linear models have problems with this type of search space. Second, the cost of misclassifying a bankrupt
firm as nonbankrupt is greater than that of misclassifying a nonbankrupt firm as bankrupt (O'Leary, 1998; Chen
et al., 2010). Most bankruptcy classification models do not consider such asymmetric costs of
misclassification. Third, the bankruptcy dataset used is typically imbalanced since the percentage of bankrupt
firms is small relative to the nonbankrupt firms in the population. Most classifiers tend to be biased
towards the majority class (Gu, Cai, Zhu, & Huang, 2008). Classifiers that use accuracy as the performance
measurement have difficulty in discerning the bankrupt class with an imbalanced dataset (Polikar, 2006).
Last but not least, the characteristics of the dataset can further complicate bankruptcy prediction. In
this paper, we deal with a challenging dataset consisting of firms cited by the US Securities and Exchange
Commission's (SEC's) Accounting and Auditing Enforcement Releases (AAERs).² Adverse corporate
events such as an AAER can induce or exacerbate financial distress in firms. A population composed of
a disproportionate number of distressed firms tends to be more difficult for bankruptcy classification
models than a population consisting of both distressed and healthy firms in roughly equal proportions
(Ohlson, 1980; Hopwood et al., 1994). Additionally, the relatively low frequency of firms with such
adverse events results in a small dataset, also posing an empirical challenge.
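The imbalance issue described above can be illustrated with a small sketch. The counts are hypothetical and are not drawn from the AAER dataset; the function name `error_rates` and the two rate names are our own labels, chosen to avoid committing to a particular type I/type II convention. The sketch shows how a classifier that always predicts "nonbankrupt" can score high accuracy while missing every bankrupt firm.

```python
def error_rates(actual, predicted):
    """Return (accuracy, missed_bankrupt_rate, false_alarm_rate).

    Labels: 1 = bankrupt, 0 = nonbankrupt.
    """
    pairs = list(zip(actual, predicted))
    correct = sum(a == p for a, p in pairs)
    bankrupt = sum(a == 1 for a, _ in pairs)
    healthy = len(pairs) - bankrupt
    # Bankrupt firms labelled nonbankrupt (the costlier mistake in the text).
    missed = sum(a == 1 and p == 0 for a, p in pairs)
    # Nonbankrupt firms labelled bankrupt.
    false_alarms = sum(a == 0 and p == 1 for a, p in pairs)
    return (
        correct / len(pairs),
        missed / bankrupt if bankrupt else 0.0,
        false_alarms / healthy if healthy else 0.0,
    )


# Hypothetical imbalanced sample: 5 bankrupt firms out of 100.
actual = [1] * 5 + [0] * 95
always_healthy = [0] * 100  # trivial majority-class classifier

accuracy, missed_rate, false_alarm_rate = error_rates(actual, always_healthy)
print(accuracy, missed_rate, false_alarm_rate)
```

In this toy sample, accuracy alone would rank the trivial classifier highly even though it never identifies a bankrupt firm, which is why the per-class error rates matter.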
To achieve both comprehensibility and accuracy, we propose a GA approach for the induction of if–then
rules that combine the output of heterogeneous bankruptcy classifiers. We use a fitness function that
includes both accuracy values and misclassification values. The resulting rule-based ensemble classifier
is asymmetrically cost sensitive (biased toward addressing misclassifications). We show that our approach
can work well for a small dataset consisting of distressed firms such as the AAER firms. In terms of
classification accuracy, we demonstrate that the ensemble classifier provides a near-optimal combination
of the base classifiers and outperforms the individual classifiers (base learners) as well as three other
ensemble methods that are commonly used for combining classifiers (i.e. decision tree (DT), majority vote
and compensating aggregated). We also show that the GA approach provides more flexibility and variety in
the rules generated than alternative methods do (i.e. DT). In terms of model comprehensibility, the if–then
rules we derive clearly specify how the predictive decision is reached.
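As a rough illustration of what an if–then ensemble looks like next to a majority vote, consider the sketch below. The classifier names (`dt`, `nn`, `logit`) and the two rules are invented for the example; they are not the base classifiers or the rules induced in this study.

```python
# Base-classifier outputs for one firm: 1 = bankrupt, 0 = nonbankrupt.
# The keys are hypothetical names used only for this illustration.

def rule_ensemble(outputs):
    """Combine base-classifier outputs with ordered if-then rules."""
    # Rule 1: IF nn says bankrupt AND logit says bankrupt THEN bankrupt.
    if outputs["nn"] == 1 and outputs["logit"] == 1:
        return 1
    # Rule 2: IF dt says bankrupt AND nn says nonbankrupt THEN bankrupt.
    if outputs["dt"] == 1 and outputs["nn"] == 0:
        return 1
    # Default rule: otherwise nonbankrupt.
    return 0


def majority_vote(outputs):
    """Label a firm bankrupt only if more than half the classifiers say so."""
    votes = sum(outputs.values())
    return 1 if votes * 2 > len(outputs) else 0


firm = {"dt": 1, "nn": 0, "logit": 0}
rule_label = rule_ensemble(firm)
vote_label = majority_vote(firm)
print(rule_label, vote_label)
```

For this firm the two combiners disagree: the vote sides with the two classifiers predicting nonbankrupt, whereas Rule 2 lets a single classifier override the others under a stated condition. Each rule can be read directly, which is the sense in which such a model is comprehensible.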
The major contribution of this paper is that we introduce a GA method for generating a bankruptcy
ensemble classifier that combines a set of heterogeneous classifiers. Prior bankruptcy research using a
GA approach for rule induction has only focused on generating individual classifiers (e.g. Shin & Lee,
2002; Kim & Han, 2003; Davalos, Leng, Feroz, & Cao, 2009). In addition, ensemble approaches for
combining heterogeneous bankruptcy classifiers have used linear forms of mathematical formulas
(e.g. Olmeda & Fernandez, 1997). To the best of our knowledge, we are the first to link these two
approaches together by creating a set of GA-generated if–then rules that adaptively combines
heterogeneous bankruptcy classifiers. Our if–then rule-based ensemble approach has three distinctive
advantages. First, the if–then type of model is comprehensible in human terms. Second, since the
resulting bankruptcy ensemble model is based on if–then rules, it can derive a nonlinear solution.
Previous ensemble methods relying on a sum rule, a majority or weighted vote, or weighted sum
methods for combining classifiers (Ravikumar & Ravi, 2006) have all assumed a linear search space.
In the accounting and finance domains, the problem space is nonlinear with data that are highly
dimensional and not normally distributed (Mahfoud & Mani, 1995). Therefore, nonlinear, rule-based
classifiers are expected to perform better than linear classifiers under these circumstances. Third, our
GA approach integrates an asymmetric cost function into the model. Olmeda and Fernandez (1997)
used genetic programming to derive a bankruptcy ensemble model using a fitness function that
equally weights misclassification costs of bankrupt and nonbankrupt firms. In contrast, our method
accounts for the fact that misclassifying a bankrupt firm as nonbankrupt is more costly.
² After a period of investigation, the SEC can issue one or more AAERs against publicly held companies, auditors and corporate
officers for alleged violations of generally accepted accounting principles. These are considered severe regulatory sanctions.
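The idea of a GA searching for combining rules under an asymmetric cost can be sketched as follows. Everything here is an assumption made for illustration: the chromosome encoding, the fitness form (accuracy minus average misclassification cost), the cost values of 5 and 1, the truncation selection with one-point crossover, and the toy data. The excerpt does not give the actual fitness function or GA operators, so this is not the authors' implementation.

```python
import random

DONT_CARE = 2  # gene values: 0 = "says nonbankrupt", 1 = "says bankrupt", 2 = ignore


def predict(chromosome, outputs, n_classifiers):
    """Return 1 (bankrupt) if any encoded rule matches the base outputs, else 0."""
    for start in range(0, len(chromosome), n_classifiers):
        rule = chromosome[start:start + n_classifiers]
        if all(g == DONT_CARE or g == o for g, o in zip(rule, outputs)):
            return 1
    return 0


def fitness(chromosome, data, n_classifiers, cost_fn=5.0, cost_fp=1.0):
    """Assumed form: (correct count - asymmetric cost) averaged over the data."""
    correct, cost = 0, 0.0
    for outputs, label in data:
        guess = predict(chromosome, outputs, n_classifiers)
        if guess == label:
            correct += 1
        elif label == 1:
            cost += cost_fn  # bankrupt firm labelled nonbankrupt: costlier
        else:
            cost += cost_fp  # nonbankrupt firm labelled bankrupt
    return (correct - cost) / len(data)


def evolve(data, n_classifiers, n_rules=2, pop_size=20, generations=30, seed=0):
    """Tiny GA: truncation selection, one-point crossover, one-gene mutation."""
    rng = random.Random(seed)
    length = n_classifiers * n_rules

    def score(chromosome):
        return fitness(chromosome, data, n_classifiers)

    population = [tuple(rng.randrange(3) for _ in range(length))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[:pop_size // 2]  # best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = list(a[:cut] + b[cut:])
            child[rng.randrange(length)] = rng.randrange(3)
            children.append(tuple(child))
        population = parents + children
    return max(population, key=score)


# Toy data: (outputs of three hypothetical base classifiers, true label).
data = [((1, 1, 0), 1), ((1, 0, 1), 1), ((0, 1, 1), 0), ((0, 0, 0), 0),
        ((1, 0, 0), 1), ((0, 1, 0), 0), ((0, 0, 1), 0), ((1, 1, 1), 1)]

best = evolve(data, n_classifiers=3)
print(best, fitness(best, data, 3))
```

Under these assumed costs, a rule set that labels every firm bankrupt scores 0.0 on the toy data, while one that misses all four bankrupt firms scores far lower, so the search is pushed away from rules that overlook bankruptcies. A chromosome decodes directly into if–then rules, one per block of three genes.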
