Features selection, data mining and financial risk classification: a comparative study

Published date: 01 October 2016
DOI: http://doi.org/10.1002/isaf.1395
RESEARCH ARTICLE
Salim Lahmiri
ESCA School of Management, Casablanca,
Morocco
Correspondence
Salim Lahmiri, ESCA School of Management,
Casablanca, Morocco.
Email: slahmiri@esca.ma
Summary
The aim of this paper is to compare, in terms of accuracy, sensitivity and specificity statistics,
several predictive models that combine features selection techniques with data mining classifiers
in the context of credit risk assessment. The t-statistic, Bhattacharyya statistic, the area
between the receiver operating characteristic, Wilcoxon statistic, relative entropy, and genetic
algorithms were used for the features selection task. The selected features were used to train
the support vector machine (SVM) classifier, backpropagation neural network, radial basis
function neural network, linear discriminant analysis and naive Bayes classifier. Results from three
datasets using a 10-fold cross-validation technique showed that the SVM provides the best
accuracy under all features selection techniques adopted in the study for all three datasets.
Therefore, the SVM is an attractive classifier for real applications such as bankruptcy
prediction in corporate finance and financial risk management in financial institutions. In addition,
we found that our best results are superior to those of earlier studies on the same datasets.
KEYWORDS
classification, credit risk, data mining, features selection
1 | INTRODUCTION
In corporate finance, financial risk prediction is important for the
decision-making and profitability of financial institutions. Therefore,
several predictive models based on data mining techniques have been
developed to accurately classify bankrupt and non-bankrupt
companies. Accurate financial risk prediction also gives financial
institutions and investors better risk control (Cochran et al.,
2006; Sanz and Ayca, 2006; Pindado et al., 2008; Abdou and Poiton,
2011; Platt and Platt, 2012; Savona and Vezzoli, 2012; Çelik, 2013;
Bastos and Pindado, 2013; Evans and Borders, 2014; Mendes et al.,
2014; Ciampi, 2015). Consequently, the topic has received much attention
in the corporate finance and risk management literature. Indeed, several
data mining techniques have been used for the prediction of financial
risk, including the AdaBoost algorithm (Alfaro et al., 2008), fuzzy
support vector machine (SVM) (Chaudhuria and De, 2011), SVM (Song
et al., 2010; Horta and Camanho, 2013), backpropagation neural
network (BPNN) (Trinkle and Baldwin, 2007; Peat and Jones, 2012; Lee
and Choi, 2013), radial basis function neural networks (RBFNNs)
(Divsalar et al., 2011), decision tree algorithms (Delen et al., 2013),
ensemble of SVMs (Sun and Li, 2012), linear programming (Divsalar
et al., 2011), if-then rules (Davalos et al., 2014), genetic algorithms
(GAs) (Gordini, 2014), ensemble systems (Sun, 2012; Figini et al.,
2016), case-based reasoning system (Li et al., 2013), probabilistic
neural network (Pendharkar, 2011) and fuzzy neural approach (Quek et al.,
2009).
Recently, several studies have adopted a particular feature
selection technique to improve the classifier accuracy in predicting financial
risk. Indeed, the goal of feature selection is to identify the most
informative features used as patterns in the classification task and remove
redundant ones. In this regard, selected features are expected to
improve classification/prediction results and help in reducing the
processing time of the classifier. For instance, these studies used the t-statistic
(Ravisankar et al., 2011), principal component analysis (Chen, 2012;
Lin, 2012), partial least squares (Yang et al., 2011; Serrano-Cinca and
Gutiérrez-Nieto, 2013), stepwise discriminant analysis (Li and Sun,
2011), information gain (Wang et al., 2014), GAs (Oreski and Oreski,
2014), decision trees (Cho et al., 2010), classification tree method
(Brezigar-Masten and Masten, 2012) and logistic regression (Lahmiri
and Gagnon, 2015) to perform the features selection task.
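As a rough illustration of filter-type feature selection, the sketch below ranks features by the absolute value of a two-sample t-statistic and keeps the top k. It is a minimal sketch under stated assumptions, not the procedure of any study cited above: the Welch (unequal variance) form of the statistic, the function names `t_statistic` and `rank_features`, the 0/1 label coding and the toy data are all hypothetical choices made for the example.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Welch two-sample t-statistic for two lists of feature values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    # A zero standard error means the feature is constant in both groups.
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_features(X, y, k):
    """Return the indices of the k features with the largest |t|.

    X: list of rows, one list of feature values per company.
    y: list of 0/1 labels (assumed coding: 1 = bankrupt, 0 = non-bankrupt).
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(t_statistic(pos, neg)))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Toy data, made up for illustration: feature 0 separates the two classes,
# feature 1 does not.
X = [[0.90, 0.5], [0.80, 0.1], [0.85, 0.9],
     [0.20, 0.4], [0.10, 0.8], [0.15, 0.2]]
y = [1, 1, 1, 0, 0, 0]

selected = rank_features(X, y, k=1)
print(selected)
```

A univariate ranking of this kind scores each feature on its own, so it identifies informative features but does not by itself remove redundant ones. The retained columns would then be passed to whichever classifier is under study.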
However, it is difficult to select an adequate combination of
features selection technique and classifier for a given dataset, which is
an important task in bankruptcy prediction. Indeed, the performance
of the classifier depends on the features selection techniques and
Intell Sys Acc Fin Mgmt 2016; 23: 265-275. Copyright © 2016 John Wiley & Sons, Ltd. wileyonlinelibrary.com/journal/isaf
