FORECASTING FIRM RISK IN THE EMERGING MARKET OF CHINA WITH SEQUENTIAL OPTIMIZATION OF INFLUENCE FACTORS ON PERFORMANCE OF CASE‐BASED REASONING: AN EMPIRICAL STUDY WITH IMBALANCED SAMPLES

Authors: Hui Li, Qing Zhou, Jian-Hu Cai, Jun-Ling Yu
Published date: 01 July 2013
DOI: http://doi.org/10.1002/isaf.1342
HUI LI (a,*), JUN-LING YU (a), QING ZHOU (b) AND JIAN-HU CAI (c)
(a) School of Economics and Management, Zhejiang Normal University, PO Box 62, 688 YingBinDaDao, Jinhua, Zhejiang 321004, People's Republic of China
(b) School of Management, Hangzhou Dianzi University, Hangzhou, Zhejiang, People's Republic of China
(c) College of Economics and Management, Zhejiang University of Technology, Hangzhou, Zhejiang 310023, People's Republic of China
SUMMARY
With the development of the Chinese economy, how to make the right decision regarding firms' risk is becoming
more and more important. Case-based reasoning (CBR) is a potential method that can help forecast business risk
status in advance; it is easy to apply and is able to provide good explanations of output. In order to obtain more
accurate prediction with CBR, it is essential to investigate factors that influence CBR's performance, and to
optimize these factors sequentially for the improvement of CBR's performance in firm risk prediction in emerging
markets under a more practicable assumption. We verified that sequential optimization of feature selection, feature
weighting, instance selection and the number of nearest neighbours is a possible alternative for improving
predictive performance of CBR forecasting under the assumption that the number of failed samples is smaller than
that of nonfailed samples. The detailed implementation includes: (1) selecting significant features through a
correlation matrix and reducing feature dimensions with factor analysis; (2) using variance contribution ratios of
features from factor analysis as feature weights; (3) eliminating noisy cases via a state matrix; and (4) obtaining
the optimal number of nearest neighbours from empirical results among different numbers of nearest neighbours.
To validate the usefulness of the sequential optimization approach, we applied it to a real-world case: firm risk
prediction with imbalanced data from the emerging market of China. Experimental results show that predictive
accuracy of CBR applied in the emerging market was improved with the sequential optimization approach.
Insightful thoughts from the results of the sequential optimization of the CBR forecasting system on modelling
social tasks are also provided. Copyright © 2013 John Wiley & Sons, Ltd.
Keywords: emerging market firm risk; case-based reasoning; firm risk prediction; imbalanced samples; modelling
social tasks
* Correspondence to: Hui Li, School of Economics and Management, Zhejiang Normal University, PO Box 62, 688
YingBinDaDao, Jinhua, Zhejiang 321004, People's Republic of China. E-mail: lihuihit@gmail.com
INTELLIGENT SYSTEMS IN ACCOUNTING, FINANCE AND MANAGEMENT
Intell. Sys. Acc. Fin. Mgmt. 20, 141–161 (2013)
Published online 11 July 2013 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/isaf.1342
1. INTRODUCTION
Emerging markets have attracted growing attention in economic and business research in recent years.
With the development of the Chinese emerging market, more and more companies and businesses are
investing and trading in China. It is therefore valuable to help business managers reduce firm-related
investment risk. Various techniques might be useful, and methods that are easily understandable are
more attractive. Case-based reasoning (CBR) is an important method in business forecasting. It is a
problem-solving method that reuses similar past cases to find solutions to new problems (Schank, 1982).
CBR provides a solution to a new problem by referring to the library of stored past cases, known as the
case base. This method mirrors the problem-solving approach of human beings; namely, solving current
problems with past experience (Ahn et al., 2003). CBR often shows significant promise in improving the
effectiveness of decision-making in complex and unstructured situations. It is able to explain why a
solution is provided by presenting similar experienced cases. Consequently, CBR has been applied to
various problem-solving areas, including engineering, finance, marketing, medical diagnosis, and so on
(Bryant, 1997; Hsu et al., 2004; Chun and Park, 2006; Du et al., 2012; Lee et al., 2012). This method is
very appropriate for firm risk prediction for the following reasons (Li and Sun, 2010):
1. CBR is able to explain the cause of failure and provide solutions for decision-makers if solutions
of experienced cases are provided in the case base.
2. Compared with some statistical methods that assume normally distributed data, CBR does
not make strict assumptions about the data distribution when it is used as an intelligent forecasting method.
3. The problem of firm risk prediction involves relatively small data sets, which suits the
application of CBR.
In addition, feature selection and instance selection will provide reduced data representations for
CBR, which decreases the processing time needed for CBR. For example, Kim (2004) used reduced
data representation in his application.
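The retrieve-and-reuse idea behind CBR described above can be illustrated with a minimal sketch. It is not the authors' implementation: the weighted Euclidean distance, the majority vote, the function names `retrieve` and `predict`, and the toy case base are all assumptions made for illustration.

```python
import math
from collections import Counter

def retrieve(case_base, query, weights, k):
    """Return the k stored cases closest to the query (weighted Euclidean distance)."""
    def distance(features):
        return math.sqrt(sum(w * (a - b) ** 2
                             for w, a, b in zip(weights, features, query)))
    return sorted(case_base, key=lambda case: distance(case[0]))[:k]

def predict(case_base, query, weights, k):
    """Reuse step: majority vote over the outcomes of the retrieved cases."""
    neighbours = retrieve(case_base, query, weights, k)
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0], neighbours

# Hypothetical case base of (feature vector, outcome) pairs; values are invented.
case_base = [
    ((0.9, 0.8), "nonfailed"),
    ((0.8, 0.9), "nonfailed"),
    ((0.7, 0.7), "nonfailed"),
    ((0.2, 0.1), "failed"),
    ((0.1, 0.3), "failed"),
]
weights = (0.6, 0.4)  # assumed feature weights

label, neighbours = predict(case_base, (0.15, 0.2), weights, k=3)
print(label)       # predicted status for the new firm
print(neighbours)  # the retrieved cases serve as the explanation of the prediction
```

Returning the retrieved neighbours together with the label reflects the explanatory property mentioned above: the prediction can be justified by showing the similar past cases it was based on.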
In order to make full use of CBR's advantages, many studies have been conducted to enhance the
performance of CBR. Among them, feature selection (Cardie, 1993; Skalak, 1994; Domingos,
1997), feature weighting (Park and Han, 2002; Hsu et al., 2004; Chun and Park, 2006; Yuan and Chiu,
2009), instance selection (Lipowezky, 1998; Babu and Murty, 2001; Huang et al., 2002) and selection
of the number of neighbours (Lee and Park, 1999; Ahn et al., 2003; Park et al., 2006) show potential in
improving CBR's performance. All of these studies aim to optimize CBR, but most prior research has
tried to optimize these factors independently. Optimizing all four factors sequentially is one of the
state-of-the-art ways to improve CBR's performance, and it needs to be investigated.
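As an illustration of one of the four factors, the number of nearest neighbours can be chosen empirically by comparing candidate values on the available cases. The sketch below is only an assumption about how such a comparison might be set up: the leave-one-out scheme, the plain accuracy criterion, the helper names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def knn_label(cases, query, k):
    """Majority label among the k cases nearest to the query (Euclidean distance)."""
    nearest = sorted(cases, key=lambda c: math.dist(c[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(cases, k):
    """Leave-one-out accuracy: each case is predicted from all the other cases."""
    hits = 0
    for i, (features, label) in enumerate(cases):
        rest = cases[:i] + cases[i + 1:]
        hits += knn_label(rest, features, k) == label
    return hits / len(cases)

def best_k(cases, candidates):
    """Return the candidate k with the highest leave-one-out accuracy, plus all scores."""
    scores = {k: loo_accuracy(cases, k) for k in candidates}
    return max(scores, key=scores.get), scores

# Hypothetical imbalanced toy data: 3 failed and 4 nonfailed cases.
cases = [
    ((0.0, 0.0), "failed"), ((0.1, 0.0), "failed"), ((0.0, 0.1), "failed"),
    ((1.0, 1.0), "nonfailed"), ((0.9, 1.0), "nonfailed"),
    ((1.0, 0.9), "nonfailed"), ((1.1, 1.0), "nonfailed"),
]
k, scores = best_k(cases, [1, 3, 5])
print(k, scores)
```

In this toy data the largest candidate performs worst, because once k exceeds the number of remaining minority cases the vote for a failed case is dominated by nonfailed neighbours.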
In the real world, the number of failed samples is much smaller than the number of nonfailed samples.
Traditionally, predictive models are constructed under the assumption that the number of failed samples
is nearly the same as the number of nonfailed samples, and these models are expected to perform well on
both types of samples. However, models constructed on that assumption are commonly not suitable for the
situation in which the number of failed samples is far smaller than the number of nonfailed samples. A
model constructed under the balanced assumption, and applied without adjustment, tends to overfit the
majority samples while neglecting the minority samples. Consider the following illustration. A model
will easily reach 99% accuracy on a dataset consisting of 99% nonfailed samples and 1% failed samples.
Under the assumption that the numbers of both types of samples are nearly the same, 99% accuracy
commonly indicates that the model identifies both nonfailed and failed samples very well. However, when
such a model is applied to the situation where the number of failed samples is far smaller than the
number of nonfailed samples, it can achieve that accuracy while misidentifying all the minority samples.
As a result, predictive models should be built under an assumption that better matches the real-world
situation, as this type of model will be more applicable to solving the problem.
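The 99% illustration above can be checked with a few lines of arithmetic. The sketch below assumes a hypothetical model that labels every firm as nonfailed; the function name `rates` and the per-class recall measures are introduced here only for illustration.

```python
def rates(actual, predicted, positive="failed"):
    """Return overall accuracy and the recall on each class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    recall_failed = tp / (tp + fn) if tp + fn else 0.0
    recall_nonfailed = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, recall_failed, recall_nonfailed

# 99 nonfailed firms and 1 failed firm, as in the illustration.
actual = ["nonfailed"] * 99 + ["failed"] * 1
# Hypothetical model that always predicts the majority class.
predicted = ["nonfailed"] * 100

accuracy, recall_failed, recall_nonfailed = rates(actual, predicted)
print(accuracy, recall_failed, recall_nonfailed)
```

Overall accuracy is 0.99 even though no failed firm is identified, which is why accuracy alone is a misleading yardstick when the samples are imbalanced.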