A Time‐Simultaneous Prediction Box for a Multivariate Time Series

DOI: http://doi.org/10.1002/for.2366
Published date: 01 December 2015
Journal of Forecasting, J. Forecast. 34, 675–693 (2015)
Published online 30 September 2015 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/for.2366
DAG KOLSRUD
Statistics Norway, Oslo, Norway
ABSTRACT
A sample-based method in Kolsrud (Journal of Forecasting 2007; 26(3): 171–188) for the construction of a
time-simultaneous prediction band for a univariate time series is extended to produce a variable- and time-simultaneous
prediction box for a multivariate time series. A measure of distance based on the L∞-norm is applied to a learning
sample of multivariate time trajectories, which can be mean- and/or variance-nonstationary. Based on the ranking of
distances to the centre of the sample, a subsample of the most central multivariate trajectories is selected. A prediction
box is constructed by circumscribing the subsample with a hyperrectangle. The fraction of central trajectories selected
into the subsample can be calibrated by bootstrap such that the expected coverage of the box equals a prescribed
nominal level. The method is related to the concept of data depth, and thence modified to increase coverage.
Applications to simulated and empirical data illustrate the method, which is also compared to several other methods
in the literature adapted to the multivariate setting. Copyright © 2015 John Wiley & Sons, Ltd.
KEY WORDS bootstrap; coverage error; data depth; learning sample; multivariate prediction region;
simultaneous prediction bands
INTRODUCTION
Monetary policy in many central banks aims at controlling inflation by setting the interest rate while keeping an eye
on other variables that reflect resource utilization and pressure in the economy. A central bank economist might be
interested in the unemployment rate and/or the output (gap) in addition to the interest rate and inflation. Predictions
of future values for these and possibly other relevant variables are useful in policy analysis before making decisions
about interest rate setting. But predictions are uncertain. One way to quantify uncertainty in a point prediction is to
construct a prediction interval with a prescribed probability of including the unknown future value. The fan charts
in the Bank of England’s projections are good examples of prediction intervals (see Bank of England, 2015; Clements,
2004; Casillas-Olvera and Bessler, 2006).
A sequence of intervals, depicted by the fan charts, quantifies uncertainty in a single variable and at several single
points in time. Many important economic variables are dynamic and interdependent or correlated. The intervals do
not reflect uncertainty due to dependencies or correlations through time and between variables. A more comprehen-
sive measure of uncertainty in the predictions of such variables would be a prediction region that reflects uncertainty
in all variables of interest and through the whole time span simultaneously. How can such a region be constructed?
The present paper answers that question by proposing a method of constructing a prediction ‘box’ for a multivariate
time series (seen as a high-dimensional entity). The method is nonparametric, computationally intensive but feasi-
ble, conceptually straightforward, and relatively general and widely applicable. The method should be of interest to
institutional and individual practitioners.
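
The gap between pointwise and simultaneous coverage can be illustrated numerically. The following is a minimal sketch, not part of the paper: it assumes a hypothetical learning sample of univariate random-walk paths (the names `paths`, `horizon` and the simulation itself are illustrative choices) and compares the coverage of pointwise 5%–95% percentile intervals at each time point with the share of whole paths that stay inside all intervals at once.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learning sample: 2000 univariate random-walk paths over 12 periods.
n_paths, horizon = 2000, 12
paths = rng.standard_normal((n_paths, horizon)).cumsum(axis=1)

# Pointwise 5%-95% percentile interval at each time point.
lower = np.quantile(paths, 0.05, axis=0)
upper = np.quantile(paths, 0.95, axis=0)

inside = (paths >= lower) & (paths <= upper)        # shape (n_paths, horizon)
pointwise_coverage = inside.mean(axis=0)            # one value per period, each near 0.90
simultaneous_coverage = inside.all(axis=1).mean()   # share of paths inside at every period

print(pointwise_coverage.round(2))
print(round(float(simultaneous_coverage), 2))
```

Each interval covers about 90% of the sample points at its own time point, but the fraction of paths that never leave the sequence of intervals is lower, which is why a sequence of pointwise intervals understates the uncertainty about a whole trajectory.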
There is a large and wide literature on confidence and prediction intervals, a small and fragmented literature on
simultaneous or multivariate confidence or prediction regions, but hardly any literature on simultaneous and multivariate
prediction regions. The present paper makes a contribution to the last category. A dispersed literature on
interval and region estimation focuses on simple parametric models of low dimensions; see, for example, Vardeman
(1992) for a discussion of types of statistical intervals, and Chatfield (1993) for a review of interval forecasts. Chew
(1968) and Hymans (1968) are early examples of univariate simultaneous intervals in a normal distribution. Para-
metric approaches tend to construct regions that are spherical in shape, like the ellipse for two binormal parameters.
Their intervals are projections of (a downscaled version of) the hyper-parallelepiped that circumscribes the ellipsoidal
confidence region—with unknown increase in coverage relative to the ellipsoid. Most articles deal with intervals
and non-rectangular confidence regions for model parameters (see Shih, 1988; McCulloch et al., 1996; Abdelkhalek
and Dufour, 1998; Yang and Kolassa, 2004). Some exceptions are simultaneous bands for spectral densities; see, for
example, Neumann and Paparoditis (2008); for survival functions see, for example, McKeague and Zhao (2006); for
gait curves see Olshen et al. (1989) and Lenhoff et al. (1999); for unknown curves like probability functions see
Yeh (1996), Xu et al. (2009) and Hong et al. (2010). In dynamic models, even simple linear time series models,
Correspondence to: Dag Kolsrud, Statistics Norway, PO Box 8131 Dep., N-0033 Oslo, Norway. E-mail: dok@ssb.no
it is analytically very difficult or practically impossible to construct a time-simultaneous prediction region with a
prescribed coverage level. Certain analytical but restricted approaches are found in Ravishanker et al. (1987, 1991),
Ravishanker and Nolan (2009) and Alpuim (1997). Chan et al. (1998) compare exact and approximative methods.
Jordà (2009) and Jordà and Marcellino (2010) construct univariate Scheffé bands for Gaussian impulse responses and
path forecasts, respectively. Staszewska-Bystrova (2011) and Staszewska-Bystrova and Winker (2013) use bootstrap
and heuristic methods to improve simultaneous coverage of a univariate prediction band. Simultaneous confidence
or prediction bands with controllable coverage can easily be constructed nonparametrically or numerically, as
Kolsrud (2007) shows for a univariate time series. The present paper generalizes one of the methods in Kolsrud (2007)
to the multivariate time series. Many multivariate and probabilistic methods that might conceivably be adapted to the
task suffer from the curse of dimensionality and are currently not computationally feasible (see, for example, Barnett, 1976;
Liu et al., 1999, 2006).
I propose a practical method that uses only a dataset and no parametric description of the time series. My starting
point is a learning sample of replicated and identically distributed multivariate time trajectories, which may be both
trending and heteroscedastic. The learning sample might be simulated in a stochastic dynamic model, for instance
a dynamic macroeconometric simultaneous-equations model, bootstrapped in a vector autoregressive model (VAR),
or it might be a given empirical (panel) dataset. The details of how the learning sample was simulated/bootstrapped
or sampled/selected are irrelevant to the method and beyond the scope of the present paper. Based solely on a ‘geo-
metric’ interpretation of the information in the learning sample, I construct the prediction region without making
any parametric assumptions about the statistical distribution of the sample, the simulation model or the real-world
data-generating process. The method is conceptually simple: the trajectories can be ordered by their centrality in the
bundle of sample trajectories. The centrality of each trajectory, or its depth in the data sample, is measured by its ‘dis-
tance’ from the sample centre; see Barnett (1976) for an early introduction to ordering of high-dimensional data or
Liu et al. (1999) for a general introduction to the wider notion of data depth. A certain fraction of the most central
trajectories is selected and circumscribed by a high-dimensional ‘box’. The particular measure of centrality that I
propose, a normalized L∞-norm, makes any fraction of most central trajectories have a more or less hyperrectangular
shape (‘box’). Then the coverage probability of the circumscribing hyperrectangular box is approximately equal
to the fraction itself. That makes the coverage controllable. This is important, and in contrast to other methods. An
example of such a prediction box can be seen in Figure 1.
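
The construction described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the distance is taken to be the largest absolute deviation from the sample mean, standardized by the sample standard deviation at each time point and variable (a choice whose level sets are hyperrectangles; the exact normalization is not given in this section), and the function names `prediction_box` and `coverage` as well as the simulated learning sample are hypothetical.

```python
import numpy as np

def prediction_box(sample, fraction):
    """Circumscribe the most central trajectories with a hyperrectangle.

    sample:   array of shape (n_trajectories, n_periods, n_variables)
    fraction: share of most central trajectories to keep
    Returns the lower and upper bounds, each of shape (n_periods, n_variables).
    """
    centre = sample.mean(axis=0)
    scale = sample.std(axis=0)
    # Assumed distance: largest absolute standardized deviation over all
    # periods and variables of a trajectory.
    distance = (np.abs(sample - centre) / scale).max(axis=(1, 2))
    n_keep = int(np.ceil(fraction * len(sample)))
    central = sample[np.argsort(distance)[:n_keep]]
    return central.min(axis=0), central.max(axis=0)

def coverage(sample, lower, upper):
    """Share of trajectories lying entirely inside the box."""
    return ((sample >= lower) & (sample <= upper)).all(axis=(1, 2)).mean()

rng = np.random.default_rng(1)
# Hypothetical learning sample: 1000 bivariate random walks with drift, 20 periods.
sample = (rng.standard_normal((1000, 20, 2)) + 0.1).cumsum(axis=1)

lower, upper = prediction_box(sample, fraction=0.9)
print(coverage(sample, lower, upper))
```

With this distance, the in-sample coverage of the box matches the selected fraction, because any trajectory left out of the central subsample exceeds the box in at least one coordinate. Coverage for a new trajectory drawn from the same process can differ, which is the gap the bootstrap calibration mentioned in the abstract is meant to close.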
The box or hyperrectangular shape of the prediction region (rather than a tubular or hyperellipsoidal shape) is
motivated by ease of presentation and interpretation. The hyperrectangle allows coverage-preserving orthogonal
projections onto each time-variable plane, and renders possible independent quantification, interpretation and visu-
alization of uncertainty for each variable at all time points. That is highly desirable when the number of variables
exceeds two. For the bivariate trajectory in Figure 1, the projection of the prediction box onto the plane of each vari-
able is a variable- and time-simultaneous prediction band for each single variable. The right-hand panel in Figure 1
shows the two projections. Each band is a wall in the box.
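
The coverage-preserving property of the projections can be checked with a small sketch. It assumes a hypothetical simulated sample and, for brevity, an arbitrary box spanned by per-cell percentiles rather than the box proposed in the paper. Projecting onto one variable amounts to keeping that variable's column of the bounds, and a trajectory inside the box is inside every band, so each band covers at least as much as the box.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical sample: 1000 trajectories, 15 periods, 2 variables.
sample = rng.standard_normal((1000, 15, 2)).cumsum(axis=1)

# Any hyperrectangle will do for the illustration; here it spans the
# 2.5%-97.5% percentiles at each period and variable.
lower = np.quantile(sample, 0.025, axis=0)   # shape (15, 2)
upper = np.quantile(sample, 0.975, axis=0)

inside_cell = (sample >= lower) & (sample <= upper)    # shape (1000, 15, 2)
box_coverage = inside_cell.all(axis=(1, 2)).mean()     # whole bivariate trajectory inside
band_coverage = inside_cell.all(axis=1).mean(axis=0)   # one value per variable

print(box_coverage, band_coverage)
```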
In the literature on confidence regions and simultaneous confidence intervals for several model parameters, the
simultaneous view is motivated by the parameter estimates being correlated. Correlation affects the size and/or shape
of a prediction region. A similar motivation applies to a prediction region for a single autocorrelated variable (a time
Figure 1. Bivariate time-simultaneous predictions of the female (F) and male (M) population in Norway from 2000 to 2050,
measured in millions of individuals. The left-hand panel shows 10 bivariate time trajectories simulated in a dynamic population
model with stochastic parameters of fertility, mortality and migration (cf. Keilman et al., 2002). The central panel shows nine of
the 10 trajectories entirely inside a prediction box with coverage probability 0.9 for the whole of a bivariate trajectory randomly
selected from the simulated sample of size 1000. One of the 10 trajectories is partly outside the prediction box, which is defined
by four bounding trajectories (bold). The right-hand panel shows projections of the mean trajectory (bold) and the prediction
box (thin) onto the planes of the female and the male population. The ‘bars’ inside the projection bands are univariate and time-
pointwise percentile 0.05–0.95 prediction intervals, each covering 900 univariate sample points. A later section (‘A multivariate
prediction box’) and Figure 2 provide information on the construction of the box. Figure 3 and Example 1 in the ‘Examples’
section provide more details.