DESIGNING AN IF–THEN RULES‐BASED ENSEMBLE OF HETEROGENEOUS BANKRUPTCY CLASSIFIERS: A GENETIC ALGORITHM APPROACH
Author | Zhiyan Cao, Sergio Davalos, Fei Leng, Ehsan H. Feroz |
Published date | 01 July 2014 |
DOI | http://doi.org/10.1002/isaf.1354 |
SERGIO DAVALOS,* FEI LENG, EHSAN H. FEROZ AND ZHIYAN CAO
Milgard School of Business, University of Washington–Tacoma, Tacoma, WA, USA
SUMMARY
This paper proposes a framework for an ensemble bankruptcy classifier that uses if–then rules to combine the outputs
from a heterogeneous set of classifiers. A genetic algorithm (GA) induces the rules using an asymmetric, cost-sensitive
fitness function that includes accuracy and misclassification costs. The GA-based ensemble classifier outperforms
individual classifiers and ensemble classifiers generated by other methods. The results of the classifier are in the form
of if–then rules. We apply the approach to a balanced dataset and an imbalanced dataset. Both are composed of firms
subject to financial distress and cited in the US Securities and Exchange Commission’s Accounting and Auditing
Enforcement Releases. Copyright © 2014 John Wiley & Sons, Ltd.
Keywords: ensemble; genetic algorithms; rule-based; bankruptcy classification; asymmetric cost function
1. INTRODUCTION
We demonstrate the effectiveness of using a genetic algorithm (GA) to induce if–then rules that
combine the output from a set of individual bankruptcy classifiers. Breiman, Friedman, Olshen, and
Stone (1984) identified two main goals of bankruptcy classification.¹ One goal is to generate an
accurate classification model that is generalizable based on a given dataset. The other goal is to
understand the underlying predictive or classification structure of the problem. Although both
objectives are important, most classification models have focused on classification accuracy (Flach &
Lavrac, 2003). While these models provide better predictive power than comprehensible models, they
provide a poor representation of the solution (Breiman et al., 1984). In this study, we strive to
achieve both goals by generating a model that is both accurate and comprehensible.
* Correspondence to: Sergio Davalos, Milgard School of Business, University of Washington–Tacoma, Tacoma, WA, USA.
E-mail: sergiod@u.washington.edu
¹ We refer the reader to Balcaen and Ooghe (2004, 2006) for a comprehensive review of statistical and alternative
studies of bankruptcy classification.
INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE AND MANAGEMENT
Intell. Sys. Acc. Fin. Mgmt. 21, 129–153 (2014)
Published online 12 June 2014 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/isaf.1354
Model comprehensibility refers to the extent to which a human expert can understand the decision structure. A
comprehensible model needs to provide an understanding of the variables and conditions involved in
bankruptcy classification. On the other hand, classification accuracy is generally measured as the
percentage of cases correctly classified. Type I and type II error rates are included when misclassification
is a key issue. To achieve bankruptcy classification accuracy, a model needs to address several issues. First,
the bankruptcy prediction problem has a nonlinear search space (Mahfoud & Mani, 1995). Traditional
linear models have problems with this type of search space. Second, the cost of misclassifying a bankrupt
firm as nonbankrupt is greater than that of misclassifying a nonbankrupt firm as bankrupt (O’Leary, 1998;
Chen et al., 2010). Most bankruptcy classification models do not consider such asymmetric costs of
misclassification. Third, the bankruptcy dataset used is typically imbalanced since the percentage of
bankrupt firms is small relative to the nonbankrupt firms in the population. Most classifiers tend to be biased
towards the majority class (Gu, Cai, Zhu, & Huang, 2008). Classifiers that use accuracy as the performance
measurement have difficulty in discerning the bankrupt class with an imbalanced dataset (Polikar, 2006).
Last but not least, the characteristics of the dataset can further complicate bankruptcy prediction. In
this paper, we deal with a challenging dataset consisting of firms cited by the US Securities and Exchange
Commission’s (SEC’s) Accounting and Auditing Enforcement Releases (AAERs).² Adverse corporate
events such as an AAER can induce or exacerbate financial distress in firms. A population composed of
a disproportionate number of distressed firms tends to be more difficult for bankruptcy classification
models than a population consisting of both distressed and healthy firms in roughly equal proportions
(Ohlson, 1980; Hopwood et al., 1994). Additionally, the relatively low frequency of firms with such
adverse events results in a small dataset, also posing an empirical challenge.
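The difficulty with raw accuracy on an imbalanced sample can be illustrated with a small arithmetic sketch. The figures below are hypothetical (5 bankrupt firms among 100) and are not taken from the AAER data; the function name `rates` and the label coding (1 = bankrupt, 0 = nonbankrupt) are assumptions made only for this illustration.

```python
def rates(y_true, y_pred):
    """Return (accuracy, missed-bankrupt rate, false-alarm rate); 1 = bankrupt."""
    pairs = list(zip(y_true, y_pred))
    bankrupt = sum(t == 1 for t, _ in pairs)
    healthy = len(pairs) - bankrupt
    correct = sum(t == p for t, p in pairs)
    # Bankrupt firms labelled nonbankrupt (the costlier error in the text).
    missed = sum(t == 1 and p == 0 for t, p in pairs)
    # Nonbankrupt firms labelled bankrupt.
    false_alarms = sum(t == 0 and p == 1 for t, p in pairs)
    return (
        correct / len(pairs),
        missed / bankrupt if bankrupt else 0.0,
        false_alarms / healthy if healthy else 0.0,
    )

# Hypothetical imbalanced sample: 5 bankrupt firms among 100.
y_true = [1] * 5 + [0] * 95
# A degenerate classifier that always predicts the majority class.
always_nonbankrupt = [0] * 100

accuracy, missed_rate, false_alarm_rate = rates(y_true, always_nonbankrupt)
print(accuracy, missed_rate, false_alarm_rate)
```

Under these assumed figures, a classifier that labels every firm nonbankrupt scores 95% accuracy while missing every bankrupt firm, which is why error rates are reported alongside accuracy when misclassification matters.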
To achieve both comprehensibility and accuracy, we propose a GA approach for the induction of if–then
rules that combine the output of heterogeneous bankruptcy classifiers. We use a fitness function that
includes both accuracy values and misclassification values. The resulting rule-based ensemble classifier
is asymmetrically cost sensitive (biased toward addressing misclassifications). We show that our approach
can work well for a small dataset consisting of distressed firms such as the AAER firms. In terms of
classification accuracy, we demonstrate that the ensemble classifier provides a near-optimal combination
of the base classifiers and outperforms the individual classifiers (base learners) as well as three other
ensemble methods that are commonly used for combining classifiers (i.e. decision tree (DT), majority vote
and compensating aggregated). We also show that the GA approach provides more flexibility and variety in
the rules generated than alternative methods do (i.e. DT). In terms of model comprehensibility, the if–then
rules we derive clearly specify how the predictive decision is reached.
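The general idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the rule encoding (one gene per base classifier, with `None` meaning "don't care"), the form of the fitness function (accuracy minus a weighted error penalty), the cost weights 5 and 1, the GA operators and the toy data are all hypothetical choices made for this example.

```python
import random
from typing import Optional, Sequence, Tuple

# Assumed encoding: one gene per base classifier; 1 = "must say bankrupt",
# 0 = "must say nonbankrupt", None = "don't care".
Rule = Tuple[Optional[int], ...]
GENES = (0, 1, None)

def apply_rule(rule: Rule, outputs: Sequence[int]) -> int:
    """IF every specified gene matches THEN bankrupt (1) ELSE nonbankrupt (0)."""
    return int(all(g is None or g == o for g, o in zip(rule, outputs)))

def fitness(rule, X, y, cost_missed=5.0, cost_false_alarm=1.0) -> float:
    """Accuracy minus an asymmetric penalty (hypothetical weights)."""
    preds = [apply_rule(rule, row) for row in X]
    n = len(y)
    accuracy = sum(p == t for p, t in zip(preds, y)) / n
    missed = sum(t == 1 and p == 0 for p, t in zip(preds, y))
    false_alarms = sum(t == 0 and p == 1 for p, t in zip(preds, y))
    return accuracy - (cost_missed * missed + cost_false_alarm * false_alarms) / n

def evolve(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0) -> Rule:
    """Tiny GA: tournament selection, one-point crossover, mutation, elitism.
    Assumes at least two base classifiers (two genes)."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    score = lambda r: fitness(r, X, y)
    pop = [tuple(rng.choice(GENES) for _ in range(n_genes)) for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best]  # elitism: carry the best rule forward unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)
            p2 = max(rng.sample(pop, 2), key=score)
            cut = rng.randrange(1, n_genes)
            child = p1[:cut] + p2[cut:]
            child = tuple(rng.choice(GENES) if rng.random() < p_mut else g
                          for g in child)
            nxt.append(child)
        pop = nxt
        best = max(pop, key=score)
    return best

# Toy data: each row holds the outputs of three hypothetical base classifiers.
X = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 0, 0), (0, 0, 0), (1, 1, 1)]
y = [1, 1, 0, 0, 0, 1]

strict = (1, 1, None)      # IF c0 = 1 AND c1 = 1 THEN bankrupt
lenient = (1, None, None)  # IF c0 = 1 THEN bankrupt
print(fitness(strict, X, y), fitness(lenient, X, y))
print(evolve(X, y))
```

Both example rules classify five of the six toy firms correctly, yet the assumed cost weights rank the second rule higher because its single error is a false alarm rather than a missed bankruptcy. An accuracy-only fitness function could not distinguish the two.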
The major contribution of this paper is that we introduce a GA method for generating a bankruptcy
ensemble classifier that combines a set of heterogeneous classifiers. Prior bankruptcy research using a
GA approach for rule induction has only focused on generating individual classifiers (e.g. Shin & Lee,
2002; Kim & Han, 2003; Davalos, Leng, Feroz, & Cao, 2009). In addition, ensemble approaches for
combining heterogeneous bankruptcy classifiers have used linear forms of mathematical formulas
(e.g. Olmeda & Fernandez, 1997). To the best of our knowledge, we are the first to link these two
approaches together by creating a set of GA-generated if–then rules that adaptively combines
heterogeneous bankruptcy classifiers. Our if–then rule-based ensemble approach has three distinctive
advantages. First, the if–then type of model is comprehensible in human terms. Second, since the
resulting bankruptcy ensemble model is based on if–then rules, it can derive a nonlinear solution.
Previous ensemble methods relying on a sum rule, a majority or weighted vote, or weighted sum
methods for combining classifiers (Ravikumar & Ravi, 2006) have all assumed a linear search space.
In the accounting and finance domains, the problem space is nonlinear with data that are highly
dimensional and not normally distributed (Mahfoud & Mani, 1995). Therefore, nonlinear, rule-based
classifiers are expected to perform better than linear classifiers under these circumstances. Third, our
GA approach integrates an asymmetric cost function into the model. Olmeda and Fernandez (1997)
used genetic programming to derive a bankruptcy ensemble model using a fitness function that
equally weights misclassification costs of bankrupt and nonbankrupt firms. In contrast, our method
accounts for the fact that misclassifying a bankrupt firm as nonbankrupt is more costly.
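The second advantage, that if–then rules can express combinations a linear scheme cannot, can be illustrated with a deliberately simple sketch. The rule set below is hypothetical and is not one of the rules derived in the paper: it flags a firm as bankrupt whenever two assumed base classifiers disagree. The function names, the weight grid and the thresholding form of the weighted vote are likewise assumptions for illustration only.

```python
from itertools import product

def rule_ensemble(a: int, b: int) -> int:
    """Hypothetical rule set: flag bankruptcy when the two classifiers disagree."""
    if a == 1 and b == 0:
        return 1
    if a == 0 and b == 1:
        return 1
    return 0

def weighted_vote(a: int, b: int, w_a: float, w_b: float, threshold: float) -> int:
    """Linear combination of classifier outputs followed by a threshold."""
    return int(w_a * a + w_b * b >= threshold)

inputs = list(product([0, 1], repeat=2))
target = [rule_ensemble(a, b) for a, b in inputs]

# Search a grid of weights and thresholds for a vote that reproduces the rules.
grid = [i / 4 for i in range(-8, 9)]
matches = [
    (w_a, w_b, t)
    for w_a, w_b, t in product(grid, repeat=3)
    if [weighted_vote(a, b, w_a, w_b, t) for a, b in inputs] == target
]
print(target, len(matches))
```

No setting on the grid reproduces the rule set, and none exists off the grid either: the vote would need a positive threshold, each weight alone to reach it, and yet their sum to fall below it. A pair of if–then rules states the same decision directly.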
² After a period of investigation, the SEC can issue one or more AAERs against publicly held companies, auditors and corporate
officers for alleged violations of generally accepted accounting principles. These are considered severe regulatory sanctions.