Direct multiperiod forecasting for algorithmic trading

DOI: http://doi.org/10.1002/for.2488
Date: 01 January 2018
Author: Hiroyuki Kawakatsu
Published date: 01 January 2018
Received: 3 July 2016 Revised: 8 July 2017 Accepted: 9 August 2017
RESEARCH ARTICLE
Hiroyuki Kawakatsu
Business School, Dublin City University,
Dublin 9, Ireland
Correspondence
Hiroyuki Kawakatsu, Business School,
Dublin City University, Dublin 9, Ireland.
Email: hiroyuki.kawakatsu@dcu.ie
Abstract
This paper examines the performance of iterated and direct forecasts for the
number of shares traded in high-frequency intraday data. Constructing direct
forecasts in the context of formulating volume weighted average price trading
strategies requires the generation of a sequence of multistep-ahead forecasts.
I discuss nonlinear transformations to ensure nonnegative forecasts and lag
length selection for generating a sequence of direct forecasts. In contrast to
the literature based on low-frequency macroeconomic data, I find that direct
multiperiod forecasts can outperform iterated forecasts when the conditioning
information set is dynamically updated in real time.
KEYWORDS
direct multistep forecasting, intraday forecasting, lag length selection, volume weighted average price
1 INTRODUCTION
There is a large and active literature on time series anal-
ysis of high-frequency data from financial markets. While
several types of financial time series have been the sub-
ject of analysis, this paper contributes to the literature
on modeling and forecasting of intraday transaction vol-
ume (Białkowski, Darolles, & Le Fol, 2008; Brownlees,
Cipollini, & Gallo, 2011; Humphery-Jenner, 2011). The
interest in analyzing transaction volume data arises from
its application to computing a benchmark price, known
as the volume weighted average price (VWAP), commonly
used in the industry to measure the market price impact of
financial transactions. VWAP appears to have been formally
introduced in the academic literature by Berkowitz, Logue, and
Noser (1988).
To define VWAP, denote the number of shares traded at
transaction $i$ on trading day $t$ by $x_{it}$ and by $p_{it}$ the
transaction price per share, where $i = 1, \ldots, n_t$ and $n_t$ is the
number of transactions on trading day $t = 1, \ldots, T$. The
VWAP $v_t$ for trading day $t$ is defined as
$$v_t = \sum_{i=1}^{n_t} w_{it} p_{it}, \qquad w_{it} \equiv \frac{x_{it}}{\sum_{j=1}^{n_t} x_{jt}}.$$
Rather than work with raw transactions data, the existing
literature aggregates the raw data into bins of fixed time
intervals, for example 15 minutes, perhaps to smooth out
various forms of microstructure noise. Divide each trading day into
$b = 1, \ldots, B$ bins, each of equal (calendar) time length, and
define the bin aggregate quantities
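$$a_{b,t} \equiv \sum_{i \in b} x_{it}, \qquad v_{b,t} \equiv \frac{1}{a_{b,t}} \sum_{i \in b} x_{it} p_{it},$$
$$w_{b,t} \equiv \frac{a_{b,t}}{\sum_{b=1}^{B} a_{b,t}} = \frac{a_{b,t}}{\sum_{i=1}^{n_t} x_{it}} = \frac{w_{it}}{x_{it}} a_{b,t},$$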
where $a_{b,t}$ are the shares traded in the $b$th bin and $v_{b,t}$ is the
VWAP within the $b$th bin. The VWAP $v_t$ for trading day $t$
can then be computed from binned data as
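$$v_t = \sum_{b=1}^{B} \sum_{i \in b} w_{it} p_{it} = \sum_{b=1}^{B} \sum_{i \in b} \frac{w_{b,t}}{a_{b,t}} x_{it} p_{it} = \sum_{b=1}^{B} \frac{w_{b,t}}{a_{b,t}} \sum_{i \in b} x_{it} p_{it} = \sum_{b=1}^{B} w_{b,t} v_{b,t}.$$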
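The equivalence between the transaction-level and the binned definitions of VWAP can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names (`vwap`, `bin_aggregates`), the bin labels, and the four-trade data set are hypothetical and chosen only for illustration.

```python
import numpy as np

def vwap(shares, prices):
    """Transaction-level VWAP: sum_i w_i * p_i with w_i = x_i / sum_j x_j."""
    shares = np.asarray(shares, dtype=float)
    prices = np.asarray(prices, dtype=float)
    return float(np.sum(shares * prices) / np.sum(shares))

def bin_aggregates(shares, prices, bin_ids, n_bins):
    """Return (a, v, w): bin volume, within-bin VWAP, and bin volume weight."""
    shares = np.asarray(shares, dtype=float)
    prices = np.asarray(prices, dtype=float)
    bin_ids = np.asarray(bin_ids, dtype=int)
    a = np.bincount(bin_ids, weights=shares, minlength=n_bins)
    traded_value = np.bincount(bin_ids, weights=shares * prices, minlength=n_bins)
    # Bins with no trades get v = 0; their weight is also 0, so they drop out.
    v = np.divide(traded_value, a, out=np.zeros(n_bins), where=a > 0)
    w = a / a.sum()
    return a, v, w

# Hypothetical data: four trades on one day, falling into two bins.
shares = [100, 200, 300, 400]
prices = [10.0, 10.5, 11.0, 10.0]
bin_ids = [0, 0, 1, 1]

a, v, w = bin_aggregates(shares, prices, bin_ids, n_bins=2)
daily_from_trades = vwap(shares, prices)   # sum_i w_it * p_it
daily_from_bins = float(w @ v)             # sum_b w_bt * v_bt
```

Both routes give the same daily VWAP, which is why a forecast of the bin weights is sufficient for the strategy once the within-bin VWAPs are taken as given.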
A key component for formulating a VWAP trading strategy
is to obtain a good forecast for the volume weights $w_{b,t}$. In
particular, the trader does not need to predict the sequence
of prices $p_{it}$ (Białkowski et al., 2008). The volume weights
$w_{b,t}$ in turn depend on the trading volume $a_{b,t}$ in each bin
$b = 1, \ldots, B$ over the trading day $t$. As the denominator
in $w_{b,t}$ is the sum of trading volume over the trading day,
computation of $w_{b,t}$ for bin $b$ requires knowledge of trading
volume $a_{b,t}$ for all bins $b = 1, \ldots, B$ on trading day $t$. As a
forecasting problem, a VWAP strategy therefore requires a
sequence of forecasts for different step sizes over the trading
day. Białkowski et al. (2008) and Brownlees et al. (2011)
generate these forecasts from a one-step-ahead model by
recursively iterating on the forecasts from previous steps. This
method of obtaining multistep forecasts by recursively iterating
on one-step forecasts is also known as the plug-in
method.
Journal of Forecasting. 2018;37:83–101. wileyonlinelibrary.com/journal/for Copyright © 2017 John Wiley & Sons, Ltd.
This paper considers an alternative direct multiperiod
forecasting approach where forecasts for different step
sizes are generated from different models each specific
to that step size. The iterated and direct multiperiod
approaches to forecasting have been compared both the-
oretically and empirically. The main theoretical result
for autoregressive processes, summarized in Marcellino,
Stock, and Watson (2006), depends on the “true” lag
order $p_0$ and the lag order $p_m$ assumed in the forecasting
model. If $p_0 > p_m$, the asymptotic mean squared
error criterion (ignoring estimation uncertainty) favors
the direct method over the iterated method. For $p_0 \le p_m$,
the asymptotic mean squared error is the same but
the iterated method has less estimation uncertainty than
the direct method. These theoretical results,
while informative, require knowledge of $p_0$ and are asymptotic
in nature. The empirical finite-sample comparison
of iterated and direct forecasts has been conducted pri-
marily using low-frequency macroeconomic time series.
Marcellino et al. (2006) conduct a large-scale empiri-
cal comparison of iterated and direct forecasts of 170
low-frequency (monthly) US macroeconomic time series
for the period 1959–2002. Their main finding based on
pseudo out-of-sample forecasts from recursive estimation
is that iterated forecasts tend to have smaller mean squared
forecast errors than direct forecasts but the improvement
is modest. They also find that the performance of direct
forecasts deteriorates with the forecast step size (up to
24-month-ahead forecasts are considered).
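The difference between the two approaches can be made concrete with a small least squares example. The sketch below is an illustration under stated assumptions, not the specification used in the paper: it fits a plain AR(p) with intercept by OLS, and the helper names (`lag_matrix`, `direct_forecast`, `iterated_forecast`) and the noise-free AR(1) test series are hypothetical.

```python
import numpy as np

def lag_matrix(y, p, h):
    """Pair regressors [1, y_t, ..., y_{t-p+1}] with the target y_{t+h}."""
    y = np.asarray(y, dtype=float)
    t = np.arange(p - 1, len(y) - h)               # usable forecast origins
    X = np.column_stack([np.ones(len(t))] + [y[t - k] for k in range(p)])
    return X, y[t + h]

def ols(X, target):
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta

def direct_forecast(y, p, h):
    """Regress y_{t+h} directly on p lags; one model per step size h."""
    y = np.asarray(y, dtype=float)
    beta = ols(*lag_matrix(y, p, h))
    x_last = np.concatenate([[1.0], y[::-1][:p]])  # newest observation first
    return float(x_last @ beta)

def iterated_forecast(y, p, h):
    """Fit the one-step model once, then plug forecasts back in h times."""
    y = np.asarray(y, dtype=float)
    beta = ols(*lag_matrix(y, p, 1))
    hist = list(y[-p:])                            # oldest to newest
    for _ in range(h):
        x = np.concatenate([[1.0], hist[::-1][:p]])
        hist.append(float(x @ beta))
    return hist[-1]

# Noise-free AR(1), y_t = 1 + 0.8 * y_{t-1}: both methods should recover
# the exact three-step-ahead value when the lag order is correctly specified.
y = [10.0]
for _ in range(40):
    y.append(1.0 + 0.8 * y[-1])
truth = y[-1]
for _ in range(3):
    truth = 1.0 + 0.8 * truth

it3 = iterated_forecast(y, p=1, h=3)
di3 = direct_forecast(y, p=1, h=3)
```

For h = 1 the two functions fit the same regression and return the same forecast; they can differ only for h > 1, where the direct method estimates a separate coefficient vector for each step size.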
The main contribution of this paper is to examine
the performance of iterated and direct forecasts for
high-frequency intraday time series data. In contrast to the
results in Marcellino et al. (2006) based on low-frequency
data, I find that direct multiperiod forecasts can outper-
form iterated forecasts when the conditioning information
set is dynamically updated in real time. The comparison
of iterated and direct forecasts in the context of predicting
intraday trading volume $a_{b,t}$ for a VWAP trading strategy
raises some new issues not considered in the existing literature.
To formulate a trading strategy for the next trading
day, one requires a prediction of trading volume $a_{b,t}$ for
all bins on that day. Therefore, in contrast to the existing
literature, which compares performance for a fixed
step size or forecast horizon, we need to examine performance
for a sequence of step sizes. Jordà (2005) also
considers a sequence of direct forecasts in the context of
tracing impulse responses over time. Following Brownlees
et al. (2011), I consider $B = 26$ bins of 15-minute intervals
for each trading day. For direct multiperiod forecasts,
this requires estimation of up to $B = 26$ equations for
step sizes $h = 1, \ldots, B$. The cost of estimating a nonlinear
model such as the self-exciting threshold autoregression
of Białkowski et al. (2008) and the component
multiplicative error model of Brownlees et al. (2011) $B$
times becomes high, especially when we need to repeat this
over a recursive or rolling estimation sample. For this rea-
son the forecasts in this paper are generated from a linear
autoregressive model using a rolling estimation sample.
The model specification and generation of forecasts are
described in detail in Section 2.
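To illustrate what a sequence of direct forecasts over a rolling sample involves, the following sketch estimates one OLS regression per step size h = 1, ..., B on the most recent observations and converts the resulting path into volume weights. Everything specific here is an assumption made for illustration: the stacked AR(2) form without intraday seasonal terms, the 40-day window, the simulated U-shaped log-volume data, and the function names `direct_path` and `weights_from_log_forecasts`. The specification actually used in the paper is the one given in Section 2.

```python
import numpy as np

def direct_path(y, p, B):
    """Direct forecasts of y_{T+h} for h = 1..B, one OLS regression per h."""
    y = np.asarray(y, dtype=float)
    x_last = np.concatenate([[1.0], y[::-1][:p]])   # newest observation first
    path = np.empty(B)
    for h in range(1, B + 1):
        t = np.arange(p - 1, len(y) - h)
        X = np.column_stack([np.ones(len(t))] + [y[t - k] for k in range(p)])
        beta, *_ = np.linalg.lstsq(X, y[t + h], rcond=None)
        path[h - 1] = x_last @ beta
    return path

def weights_from_log_forecasts(log_path):
    """Naive back-transform exp(y_hat), then normalize to weights.

    The naive back-transform is biased for the level of volume; it is used
    here only to keep the sketch short.
    """
    a_hat = np.exp(log_path)
    return a_hat / a_hat.sum()

# Simulated log volume: U-shaped intraday pattern plus noise (hypothetical).
rng = np.random.default_rng(0)
B, n_days = 26, 60
u_shape = 0.5 * np.cos(2 * np.pi * np.arange(B) / (B - 1))
log_vol = (u_shape[None, :] + 0.1 * rng.standard_normal((n_days, B))).ravel()

window = 40 * B                                      # rolling estimation sample
path = direct_path(log_vol[-window:], p=2, B=B)      # forecasts for next day
w_hat = weights_from_log_forecasts(path)
```

Each call solves B small least squares problems, which is cheap for a linear model; with a nonlinear model each of the B fits would need iterative optimization, and the whole set would be repeated every time the rolling window moves.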
Although the focus is on the linear model for which
most of the theoretical results are available, there are some
important issues that need to be addressed when applied to
intraday trading volume data. The number of shares traded
is discrete and integer valued. Rather than model the number
of shares traded, the empirical analysis follows Brownlees
et al. (2011) and models volume scaled (divided) by
the number of daily shares outstanding. This scaling can
reduce the effect of abrupt changes in volume due to occa-
sional stock splits in the sample. The scaled volume series
can be treated as continuous and avoid the issue of mod-
eling discreteness in the raw volume series. Another issue
that does not appear to have received much attention in
the literature is that traded volume cannot be negative.
A potentially misspecified linear model may produce a
negative-valued forecast in the pseudo out-of-sample fore-
cast sample. To ensure this nonnegativity condition, I fit
the linear autoregressive model to the (natural) log of trading
volume $y_{b,t} \equiv \log(a_{b,t})$. To obtain a forecast for the
variable of interest $w_{b,t}$, we need to transform the forecast
for $y_{b,t}$ into a forecast for the untransformed variable
$a_{b,t}$. Because the transformation is nonlinear, the naive
(un)transformation $\exp(\hat{y}_{b,t})$ does not generally yield an
unbiased prediction of $a_{b,t}$ even if $\hat{y}_{b,t}$ were an unbiased
prediction of $y_{b,t}$. This issue of nonlinear transformation is
discussed in Section 2.2. As shown in Section 3, another
important benefit of the log transformation is that the distribution
of $y_{b,t}$ is closer to the Gaussian than that of the
untransformed series $a_{b,t}$. In particular, the kurtosis of $y_{b,t}$
is closer to the Gaussian than that of $a_{b,t}$ and the least
squares estimates are less likely to be affected by extreme
values in the data.
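The bias of the naive back-transformation can be seen in a short simulation. The sketch below assumes, purely for illustration, that log volume is exactly Gaussian with hypothetical parameters `mu` and `sigma`; the half-variance adjustment shown is the standard lognormal mean formula and is not necessarily the correction adopted in Section 2.2.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 0.8                 # hypothetical log-volume mean and std. dev.

y = rng.normal(mu, sigma, size=200_000)   # log volume
a = np.exp(y)                             # volume in levels

sample_mean = a.mean()                               # target: E[a]
naive = np.exp(y.mean())                             # exp of unbiased log forecast
corrected = np.exp(y.mean() + 0.5 * y.var())         # lognormal mean adjustment

# By Jensen's inequality exp(E[y]) <= E[exp(y)], so the naive value
# understates the mean of volume; the gap grows with the variance of y.
relative_bias_naive = naive / sample_mean - 1.0
relative_bias_corrected = corrected / sample_mean - 1.0
```

Under these assumptions the naive value falls well below the sample mean of volume while the adjusted value is close to it. The adjustment relies on normality of the log series, so it should be read as one possible correction rather than a general one.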
