On the forecasting of high‐frequency financial time series based on ARIMA model improved by deep learning

DOI: http://doi.org/10.1002/for.2677
Authors: Jing Han, Zhenwei Li, Yuping Song
Published date: 01 November 2020
RESEARCH ARTICLE
Zhenwei Li¹ | Jing Han² | Yuping Song¹
¹ School of Finance and Business, Shanghai Normal University, Shanghai, PR China
² School of Finance and Management, Shanghai University of International Business and Economics, Shanghai, PR China
Correspondence
Yuping Song, School of Finance and
Business, Shanghai Normal University,
Shanghai, 200234, PR China.
Email: songyuping@shnu.edu.cn
Funding information
Academic Innovation Team of Shanghai Normal University, Grant/Award Number: 310-AC7031-19-004228; Key Subject of Quantitative Economics of Shanghai Normal University, Grant/Award Number: 310-AC7031-19-004221; Academic Innovation Team, Grant/Award Number: 310-AC7031-19-004228; Key Subject of Quantitative Economics, Grant/Award Number: 310-AC7031-19-004221; General Research Fund of Shanghai Normal University, Grant/Award Number: SK201720; Youth Academic Backbone Cultivation Project of Shanghai Normal University, Grant/Award Number: 310-AC7031-19-003021; National Statistical Science Research Project, Grant/Award Number: 2018LZ05; Ministry of Education, Humanities and Social Sciences project, Grant/Award Number: 18YJCZH153; National Natural Science Foundation of China, Grant/Award Number: 11901397
Abstract
Empirical research shows that the traditional autoregressive integrated moving average (ARIMA) model produces large deviations when forecasting high-frequency financial time series. With the improvement in storage capacity and computing power for high-frequency financial time series, this paper combines the traditional ARIMA model with a deep learning model to forecast high-frequency financial time series. The combined model not only preserves the theoretical basis of the traditional model and characterizes the linear relationship, but also characterizes the nonlinear relationship in the error term by means of the deep learning model. Monte Carlo numerical simulation and an empirical study of the CSI 300 index in China show that, compared with the ARIMA, support vector machine (SVM), long short-term memory (LSTM), and ARIMA-SVM models, the improved ARIMA model based on LSTM not only improves on the accuracy of the single ARIMA model in both fitting and forecasting, but also reduces computational complexity relative to a single deep learning model. The improved ARIMA model based on deep learning not only enriches the set of models for time series forecasting, but also provides an effective tool for high-frequency strategy design to reduce the investment risk of stock indexes.
KEYWORDS
ARIMA model, high-frequency financial time series, LSTM model, SVM model
Received: 17 June 2019 | Revised: 13 December 2019 | Accepted: 22 February 2020
DOI: 10.1002/for.2677
Journal of Forecasting. 2020;39:1081-1097. wileyonlinelibrary.com/journal/for © 2020 John Wiley & Sons, Ltd.

1 | INTRODUCTION
Given historical data, an appropriate forecasting model can be constructed to capture the fluctuating signals of the underlying time series and characterize their trend, which can provide a reliable basis for investors' decision making. For example, through accurate forecasting of a stock index, investors can grasp the overall trend of the market, capture trading opportunities effectively, and make reasonable asset allocations. As for forecasting models of time series, the autoregressive moving average (ARMA) model and the autoregressive integrated moving average (ARIMA) model were proposed (Box, Jenkins, Reinsel, & Ljung, 2015) on the basis of classical models such as the autoregressive (AR; Yule, 1927) and moving average (MA; Walker, 1931) models. Once the original nonstationary time series has been made stationary by d-order differencing, the ARIMA model can be estimated and tested on the stationary sequence. It has become one of the more widely used methods in the study of forecasting models for time series. For example, the ARIMA model has been used to produce one-step and multistep forecasts of retail footwear sales, and it was found to fit the time series well (Ramos, Santos, & Rebelo, 2015).
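
As a minimal illustration of the d-order differencing step described above, the following Python sketch differences a series and then inverts a first-order difference. It is not taken from the paper: the helper names `difference` and `undifference_once` and the simulated random walk with drift are assumptions made purely for illustration.

```python
import numpy as np

def difference(series, d=1):
    """Apply d-order differencing; the result has len(series) - d entries."""
    return np.diff(np.asarray(series, dtype=float), n=d)

def undifference_once(diffed, first_value):
    """Invert a single first-order difference, given the first original value."""
    return np.concatenate(([first_value], first_value + np.cumsum(diffed)))

# Hypothetical example data: a random walk with drift, which is
# nonstationary in levels but stationary after one difference.
rng = np.random.default_rng(0)
walk = np.cumsum(0.1 + rng.normal(size=500))

diffed = difference(walk, d=1)                  # the "I" step of ARIMA with d = 1
restored = undifference_once(diffed, walk[0])   # recovers the original levels
```

In an ARIMA(p, d, q) workflow, the ARMA part would be estimated on `diffed`, and forecasts on the differenced scale would be mapped back to levels by the same cumulative-sum inversion.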
For financial time series, many scholars have used the autocorrelation function to verify that financial time series are time varying (Ding, Granger, & Engle, 1993) and that financial data exhibit nonlinearity (Chevallier & Sévi, 2012; Giot, Laurent, & Petitjean, 2010; Slim & Dahmene, 2016). The time-varying and nonlinear properties and the large stochastic volatility of sample data in financial markets make quantitative forecasting based on a single model difficult. Many scholars have therefore improved the ARIMA model to enhance forecasting accuracy. Based on the linear error correction model, the ARIMA model was modified by using the support vector machine (SVM) model to forecast financial time series and improve forecasting accuracy (Van Gestel et al., 2006). Particle swarm optimization (PSO) was adopted to modify the ARIMA-SVM combination model and improve forecasting accuracy for time series (de Oliveira & Ludermir, 2014). The accuracy of power demand forecasting was improved by an error correction model based on the PSO optimal Fourier method and seasonal ARIMA (Wang, Wang, Zhao, & Dong, 2012). A hybrid correction method combining the ARIMA model, SVM, and the cuckoo search algorithm (CSA) was used to modify the ARIMA model for power load prediction (Kavousi-Fard & Kavousi-Fard, 2013).
The above research focused mainly on daily or weekly low-frequency data for modeling and forecasting. However, with the development of science and technology, the era of big data has arrived, and storage and computing power for high-frequency data have improved. In addition, by using intraday high-frequency data to estimate volatility, Andersen, Bollerslev, Diebold, and Labys (2003) found that high-frequency data contain more market information than low-frequency data, which can improve the accuracy of estimation. High-frequency data can provide more arbitrage space and more possibilities for strategy design (Hanson & Hall, 2012). High-frequency financial time series have also changed the philosophy of investment strategy design and investors' investment styles. By 2012, high-frequency transactions accounted for 50-80% of the total transaction volume of US equities (Barrales, 2012; Laughlin, Aguirre, & Grundfest, 2014), and the proportion reached 45% in Europe, 40% in Japan, and about 12% in other Asian countries (Menkveld, 2014). In 2014, high-frequency trading developed further. Through a study of a large number of financial intermediaries, Biais and Foucault (2014) confirmed that the way these intermediaries ran capital, although not labeled as high-frequency trading, was consistent with the characteristics of high-frequency trading strategies.
ARIMA models and their modified versions in the existing literature are mostly used in nonfinancial fields, and the sample frequencies used for forecasting have been relatively low. However, as the frequency of financial data increases, the data become highly nonlinear (Jobson & Korkie, 1981) and nonnormal (Jacquier, Polson, & Rossi, 2002). Because these characteristics of high-frequency data do not conform to the assumptions of traditional low-frequency models, the forecasting error for high-frequency data based on low-frequency financial time series models has gradually become larger. How to modify the ARIMA model and migrate it to high-frequency forecasting therefore has significant practical research value. Currently, few studies have used deep learning methods to correct the ARIMA model error or to improve the forecasting accuracy of high-frequency financial data. Therefore, this paper aims to modify the traditional ARIMA model by using the long short-term memory (LSTM) model, a deep learning method derived from machine learning. Compared with the machine learning SVM model and other modified models, the deep learning corrected model can not only reduce the error of the forecasting model and improve forecasting accuracy, but also reduce model complexity and improve predictive performance. Methodologically, the ARIMA-LSTM model not only preserves the stability and interpretability of the traditional ARIMA model, but also absorbs the long memory of the deep learning model for time series. Practically, the ARIMA-LSTM model can reduce the complexity of the deep learning correction process and guarantee timeliness for high-frequency financial time series.
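
The residual-correction idea behind such a hybrid model can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: an AR(2) model fitted by least squares stands in for ARIMA, and a least-squares fit on simple nonlinear features of lagged residuals stands in for the LSTM (or SVM) correction. The simulated series, the function names, the feature set, and the lag orders are all hypothetical.

```python
import numpy as np

def lag_matrix(x, p):
    """Return (X, y): each row of X holds x[t-1], ..., x[t-p]; y holds x[t]."""
    X = np.column_stack([x[p - k : len(x) - k] for k in range(1, p + 1)])
    return X, x[p:]

def fit_ols(X, y):
    """Least-squares coefficients, with the intercept stored at index 0."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def nonlinear_features(E):
    """Hypothetical nonlinear features of lagged residuals (stand-in for LSTM/SVM)."""
    return np.column_stack([E, E**2, np.abs(E)])

# Simulated data (assumption): a linear AR(1) signal whose error term
# depends nonlinearly on its own past.
rng = np.random.default_rng(42)
n = 2000
u = np.zeros(n)
x = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * abs(u[t - 1]) + rng.normal(scale=0.5)
    x[t] = 0.5 * x[t - 1] + u[t]

# Stage 1: the linear model captures the linear relationship.
X, y = lag_matrix(x, p=2)
linear_fit = predict_ols(fit_ols(X, y), X)
resid = y - linear_fit

# Stage 2: a nonlinear model is fitted to the residuals of stage 1.
E, r = lag_matrix(resid, p=2)
F = nonlinear_features(E)
resid_fit = predict_ols(fit_ols(F, r), F)

# Stage 3: hybrid fit = linear fit + fitted residual correction.
hybrid_fit = linear_fit[2:] + resid_fit
mse_linear = np.mean((y[2:] - linear_fit[2:]) ** 2)
mse_hybrid = np.mean((y[2:] - hybrid_fit) ** 2)
```

In-sample, the second stage cannot increase the squared error here, because a zero correction is always available to the least-squares fit. Out-of-sample gains are not guaranteed and would need to be checked on held-out data, as the paper does in its simulation and empirical sections.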
Section 2 introduces the relevant models. In Section 3, a Monte Carlo numerical simulation is constructed to identify the room for improvement in the traditional ARIMA model, and machine learning (SVM) and deep learning (LSTM) models are then combined with it to correct the residuals. Finally, empirical high-frequency data of