Benchmark dataset for mid‐price forecasting of limit order book data with machine learning methods

Author: Alexandros Iosifidis, Moncef Gabbouj, Adamantios Ntakaris, Juho Kanniainen, Martin Magris
Date: 01 December 2018
DOI: http://doi.org/10.1002/for.2543
Published date: 01 December 2018
Received: 15 May 2018 Accepted: 7 July 2018
DOI: 10.1002/for.2543
RESEARCH ARTICLE
Benchmark dataset for mid-price forecasting of limit order
book data with machine learning methods
Adamantios Ntakaris¹, Martin Magris², Juho Kanniainen², Moncef Gabbouj¹, Alexandros Iosifidis³
¹ Laboratory of Signal Processing, Tampere University of Technology, Tampere, Finland
² Laboratory of Industrial and Information Management, Tampere University of Technology, Tampere, Finland
³ Department of Engineering, Electrical and Computer Engineering, Aarhus University, Aarhus, Denmark
Correspondence
Adamantios Ntakaris, Laboratory of Signal Processing, Tampere University of Technology, Korkeakoulunkatu 1, Tampere, Finland.
Email: adamantios.ntakaris@tut.fi
Funding information
H2020 Marie Sklodowska-Curie Actions, Grant/Award Number: MSCA-ITN-ETN 675044
Abstract
Forecasting metrics in high-frequency financial markets is a challenging task. One efficient approach is to monitor the dynamics of a limit order book to identify the information edge. This paper describes the first publicly available benchmark dataset of high-frequency limit order markets for mid-price prediction. We extracted normalized data representations of time series data for five stocks from the Nasdaq Nordic stock market for a time period of 10 consecutive days, leading to a dataset of approximately 4,000,000 time series samples in total. A day-based anchored cross-validation experimental protocol is also provided that can be used as a benchmark for comparing the performance of state-of-the-art methodologies. The performance of baseline approaches is also provided to facilitate experimental comparisons. We expect that such a large-scale dataset can serve as a testbed for devising novel solutions of expert systems for high-frequency limit order book data analysis.
KEYWORDS
high-frequency trading, limit order book, mid-price, machine learning, ridge regression, single
hidden feedforward neural network
1 INTRODUCTION
Automated trading became a reality when the majority of exchanges adopted it globally. This environment is ideal for high-frequency traders. High-frequency trading (HFT) and a centralized matching engine, referred to as a limit order book (LOB), are the main drivers for generating big data (Seddon & Currie, 2017). In this paper, we describe a new order book dataset consisting of approximately 4 million events for 10 consecutive trading days for five stocks. The data are derived from the ITCH feed provided by Nasdaq OMX Nordic and consist of the time-ordered sequences of messages that track and record all the events occurring in the specific market. The dataset provides a complete market-wide history of 10 trading days. Additionally, we define an experimental protocol to evaluate the performance of research methods in mid-price prediction.¹
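As an illustration of what a day-based anchored evaluation protocol over 10 trading days can look like, the sketch below enumerates train/test day splits in Python. It assumes that "anchored" means the training window always starts at the first day and grows by one day per fold, and that each fold is tested on the single following day; the function name `anchored_day_splits`, the fold count, and the one-day test window are illustrative assumptions, not details stated in this section.

```python
from typing import Iterator, List, Tuple


def anchored_day_splits(n_days: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_days, test_days) pairs for an anchored, day-based split.

    Assumption: the training window is anchored at day 1 and extended by one
    day per fold; the test set is the single day that follows it.
    """
    for last_train_day in range(1, n_days):
        train_days = list(range(1, last_train_day + 1))
        test_days = [last_train_day + 1]
        yield train_days, test_days


# 10 consecutive trading days, as in the dataset described above.
splits = list(anchored_day_splits(10))
for train_days, test_days in splits:
    print(f"train on days {train_days[0]}-{train_days[-1]}, test on day {test_days[0]}")
```

Splitting by whole days rather than by shuffled samples keeps every test observation strictly later in time than the training data, which avoids look-ahead leakage in a forecasting setting.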
Datasets like the one presented here come with challenges, including the selection of appropriate data transformation, normalization, description, and classification. This type of massive dataset requires a very good understanding of the available information that can be extracted
¹ Mid-price is the average of the best bid and best ask prices.
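The mid-price definition can be written as a one-line computation. The following minimal Python sketch uses a hypothetical helper `mid_price` and made-up quote values; the check for a crossed book is an added assumption for illustration and is not part of the definition above.

```python
def mid_price(best_bid: float, best_ask: float) -> float:
    """Return the average of the best bid and best ask prices."""
    # Assumption for this sketch: a best ask below the best bid (a crossed
    # book) is treated as invalid input.
    if best_ask < best_bid:
        raise ValueError("best ask is below best bid")
    return (best_bid + best_ask) / 2.0


# Hypothetical quotes: best bid 10.00, best ask 10.50.
example_mid = mid_price(10.0, 10.5)
print(example_mid)  # 10.25
```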
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2018 The Authors. Journal of Forecasting published by John Wiley & Sons Ltd.
wileyonlinelibrary.com/journal/for Journal of Forecasting. 2018;37:852–866.
