An innovative feature selection method for support vector machines and its test on the estimation of the credit risk of default

ORIGINAL ARTICLE

Eduard Sariev | Guido Germano

Department of Computer Science, University College London, London, UK

Correspondence
Eduard Sariev, Department of Computer Science, University College London, London, UK.
Email: eduard.sariev.11@ucl.ac.uk

Funding information
Economic and Social Research Council (ESRC), Grant/Award Number: ES/K002309/1

Received: 11 September 2018 | Accepted: 12 September 2018 | Published: 1 July 2019
DOI: 10.1002/rfe.1049
wileyonlinelibrary.com/journal/rfe | Rev Financ Econ. 2019;37:404–427.

Abstract
Support vector machines (SVM) have been extensively used for classification problems in many areas such as gene, text and image recognition. However, SVM have been rarely used to estimate the probability of default (PD) in credit risk. In this paper, we advocate the application of SVM, rather than the popular logistic regression (LR) method, for the estimation of both corporate and retail PD. Our results indicate that most of the time SVM outperforms LR in terms of classification accuracy for the corporate and retail segments. We propose a new wrapper feature selection based on maximizing the distance of the support vectors from the separating hyperplane and apply it to identify the main PD drivers. We used three datasets to test the PD estimation, containing (1) retail obligors from Germany, (2) corporate obligors from Eastern Europe, and (3) corporate obligors from Poland. Total assets, total liabilities, and sales are identified as frequent default drivers for the corporate datasets, whereas current account status and duration of the current account are frequent default drivers for the retail dataset.

Highlights
1. We estimate the probability of default on credit risk data for corporate and retail clients.
2. We compare support vector machines (SVM) and logistic regression (LR).
3. The SVM model often outperforms LR in terms of classification accuracy.
4. We propose and test a new variable selection method designed specifically for SVM.
5. We identify important default drivers and analyse them.

JEL CLASSIFICATION
C10, C13

KEYWORDS
default risk, logistic regression, support vector machines

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2018 The Authors. Review of Financial Economics published by Wiley Periodicals, Inc. on behalf of University of New Orleans.
1 | INTRODUCTION
The introduction of the Basel III guidelines (BCBS, 2001) and the new capital requirements that banks must meet have established the necessity of an accurate risk assessment. The probability of default (PD) measure is a key estimate not only for risk assessment, but also for impairment purposes under the changes introduced by International Financial Reporting Standard 9 (IFRS 9) (Onali & Ginesti, 2014). Accurate PD assessment is vital for decreasing the cost of capital (Gavalas, 2015). The estimation of the PD has been a topic of extensive research for many years. Many different algorithms have been used to estimate the PD: artificial neural networks (ANN), decision trees (DT), linear discriminant analysis (LDA), support vector machines (SVM), and logistic regression (LR). Harris (2015) provides a good general explanation of these methods. However, LR remains the most widely used PD estimation method for both corporate and retail borrowers.
Extensive research has been conducted comparing several PD estimation methods. Meyer, Leisch, and Hornik (2003) compared SVM to 25 other methods used for PD estimation. They found that although the performance of the SVM model is good, other methods such as ANN and DT sometimes outperform SVM. In a more general study, Mukherjee (2003) used SVM and LR to classify traded companies on the Greek stock exchange, showing that SVM classification was better, though without focusing
on the feature selection process. Another comparison between SVM and ANN was made by Li, Shiue, and Huang (2006). They showed that the SVM model slightly outperforms ANN and needs fewer features than ANN to achieve maximum classification performance. Huang, Chen, and Wang (2007) compared SVM with ANN, genetic programming, and DT; this comparison covered the feature selection process, but did not include LR. Bellotti and Crook (2009) compared LR and SVM, but without showing the feature selection method for the LR. Bellotti, Matousek, and Stewarti (2011) compared LR with SVM for regression purposes rather than classification, and found that the SVM model outperforms LR. Furthermore, Chen, Härdle, and Moro (2011) compared LR and SVM with regard to the feature selection process. However, the features selected for the SVM were automatically used for LR, so the comparison was biased toward the SVM model; as expected, the SVM model outperformed the LR in this case. Hens and Tiwari (2012) again focused on the comparison of SVM with genetic programming, without including LR. Lessmann, Baesens, Seow, and Thomas (2015) found that SVM and ANN perform better than LR, though the performance of the LR is still relatively good. Finally, Harris (2015) compared SVM to LR. Although this study used LR as the only alternative to SVM, many details of the comparison were not shared; for instance, the feature selection for both models was not covered at all.
The feature selection process for SVM is a key step in comparing SVM to other algorithms. The existing literature indicates that research on SVM feature selection has been developing recently. Weston et al. (2000) proposed a method based on finding those features that minimize bounds on the leave-one-out error; they showed that their method is superior to some standard feature selection algorithms. Guyon and Elisseeff (2003) provided a good high-level overview of the different feature selection algorithms available in the literature. Rakotomamonjy (2003) proposed relevance criteria derived from the SVM weight vector; he showed that the criterion based on the weight vector derivative achieves good results and performs consistently well. Chen and Lin (2006) combined SVM with various feature selection strategies. Some were filter-type approaches, i.e., general feature selection methods independent of the SVM, and some were wrapper-type methods, i.e., modifications of the SVM that can be used to select features. Recently, variable and feature selection has become the focus of much research. Becker, Werft, Toedt, Lichter, and Benner (2009) investigated a penalized version of SVM for feature selection; they argued that keeping a high number of features can avoid overfitting if the performance function uses an L1-norm regularization. Huang and Huang (2010) investigated a recursive feature selection scheme in SVM; their results indicated that one-vs.-one SVM with embedded recursive feature selection outperforms other multiclass SVM. In this context, Kuhn and Johnson (2013) presented a generalized backward feature elimination procedure for selecting a final combination of features.
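As an illustration of the wrapper-type backward elimination just described, the following minimal sketch uses scikit-learn's RFE, which iteratively drops the features with the smallest linear-SVM weights. The synthetic data and the choice of five retained features are placeholders for illustration, not taken from any of the cited studies.

```python
# Minimal sketch of wrapper-type backward feature elimination around a
# linear SVM: features with the smallest weight-vector coefficients are
# dropped iteratively until the requested number remains.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)
selector = RFE(SVC(kernel="linear"), n_features_to_select=5).fit(X, y)
print(selector.support_)   # boolean mask of the retained features
print(selector.ranking_)   # 1 = retained; larger rank = eliminated earlier
```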
With respect to the articles on feature selection for SVM discussed above, this article contributes to the literature firstly by proposing an innovative feature selection method for SVM and LR, and secondly by showing that most of the time the SVM model achieves higher classification accuracy than logistic regression.
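To make this idea concrete before the formal treatment, the sketch below shows what a wrapper built around the distance of the support vectors from the separating hyperplane might look like. The greedy forward search and the exact scoring function are our own placeholder assumptions for illustration, not the algorithm proposed in this paper.

```python
# Hypothetical sketch of a margin-based wrapper: score a feature subset by
# the mean geometric distance of the support vectors from the separating
# hyperplane of a linear SVM fitted on that subset, and grow the subset
# greedily. Placeholder illustration only, not the authors' exact method.
import numpy as np
from sklearn.svm import SVC

def margin_score(X, y, features):
    Xs = X[:, features]
    clf = SVC(kernel="linear", C=1.0).fit(Xs, y)
    # decision_function returns w.x + b; divide by ||w|| to get the
    # geometric distance of each support vector from the hyperplane
    w_norm = np.linalg.norm(clf.coef_)
    return float(np.abs(clf.decision_function(Xs[clf.support_])).mean() / w_norm)

def greedy_margin_wrapper(X, y, n_select):
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_select and remaining:
        best_j = max(remaining, key=lambda j: margin_score(X, y, selected + [j]))
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```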
The rest of the article is organized as follows. Section 2 presents the theoretical formulation of SVM. Section 3 contains an
empirical analysis, including the presentation of the data and the obtained results. Section 4 discusses the business rationale of
the selected default drivers. Finally, Section 5 concludes the paper, summarizes the main findings of this research, and proposes
some future research directions.
2 | THEORETICAL FOUNDATIONS
2.1 | Support vector machines
Consider a dataset of $n$ pairs $A = \{(\mathbf{x}_i, y_i) \mid \mathbf{x}_i \in \mathbb{R}^p,\; y_i \in \{-1, +1\}\}_{i=1}^{n}$, where $\mathbf{x}_i$ is a $p$-dimensional “feature” vector and $y_i$ is a label, i.e., a categorical variable whose value gives the class to which $\mathbf{x}_i$ belongs. Provided the data are linearly separable, SVM build a hyperplane that separates the points with $y_i = +1$ from those with $y_i = -1$ maximizing the margin $M$, i.e., the distance between the separating hyperplane and the closest data points on either side.
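For reference, in the standard textbook formulation (consistent with, though not quoted verbatim from, this paper), writing the hyperplane as $\mathbf{w} \cdot \mathbf{x} + b = 0$ gives the margin $M = 2/\lVert \mathbf{w} \rVert$, so maximizing $M$ is equivalent to the convex program

$$
\min_{\mathbf{w},\,b} \; \frac{1}{2}\lVert \mathbf{w} \rVert^2
\quad \text{subject to} \quad
y_i \, (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n,
$$

whose solution is determined only by the support vectors, i.e., the points for which the constraint holds with equality.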
