Out-of-sample forecasts and nonlinear model selection with an example of the term structure of interest rates.

Author: Liu, Yamei
  1. Introduction

    Much of the current interest in nonlinear time-series models stems from the large literature documenting the asymmetric behavior of many macroeconomic variables. Neftci (1984), Ashley and Patterson (1989), and Scheinkman and LeBaron (1989) find asymmetric adjustment in U.S. unemployment, production, and employment over the course of the business cycle. Similarly, Tinsley and Krieger (1997) find that negative deviations from trend production are larger than the positive ones and that price levels more readily increase than decrease. Beaudry and Koop (1993) find that negative innovations to gross domestic product (GDP) are much less persistent than positive ones, and Rhee and Rich (1995) find asymmetric effects of monetary shocks on output and expected inflation.

    A number of papers have tried to explicitly model the type of nonlinearity present in the data. For example, Terasvirta and Anderson (1992) use a smooth transition autoregressive model to show that industrial production in 13 OECD countries responds more sharply to negative shocks than to positive shocks. Sichel (1993) and Ramsey and Rothman (1996) examine whether business cycle troughs are more pronounced than peaks and whether contractions are steeper than expansions. Potter (1995) modeled changes in real U.S. gross national product (GNP) as a threshold adjustment process and found that the post-1945 U.S. economy is significantly more stable than the pre-1945 U.S. economy. Shen and Hakes (1995) applied a threshold autoregressive model to the reaction function of the central bank of Taiwan and found that central bank responses depend on the severity of inflation. Ball and Mankiw (1994) presented a menu-cost model with positive trend inflation to show that prices respond more strongly to positive shocks than to negative shocks, and Maravall (1983) used a bilinear model to estimate and forecast the Spanish exchange rate.

    A potential problem with such nonlinear estimates involves the possibility of overfitting. Although the Akaike information criterion (AIC) and Schwarz Bayesian criterion (SBC) were designed to combat the problem of overfitting by adding a penalty term for each estimated parameter, nonlinearity adds another dimension to the problem. (1) A search across functional forms and the various parameterizations of each is likely to yield some nonlinear model that "fits" a particular data set especially well. However, Rothman (1998) shows that the "best" nonlinear specification of the U.S. unemployment rate does not provide the best out-of-sample forecasts.

    The problem is complicated by the fact that tests for nonlinearity are not particularly good at determining the precise form of the nonlinearity. For example, the Lagrange multiplier (LM) tests for nonlinearity reviewed in Granger and Terasvirta (1993) have the null of linearity against an alternative hypothesis specifying a particular type of nonlinear adjustment. It is quite possible that an LM test for threshold adjustment and an LM test for bilinear adjustment are both supported by the same data set. As a result, papers such as McCracken (2000) argue that inference using the out-of-sample characteristics of a model can be superior to in-sample inference.

    The aim of this article is to examine the extent to which out-of-sample forecasts can help select the appropriate nonlinear specification. We focus on a small sample in order to consider a circumstance in which asymptotic methods may not provide useful guidelines concerning model selection. For similar reasons, we emphasize threshold autoregressive (TAR) specifications because they contain an unidentified nuisance parameter under the null hypothesis of linearity. It is shown that standard in-sample estimation procedures typically lead to overfitting in that they select a nonlinear model when the true data-generating process is linear. In contrast, out-of-sample forecasts tend to select the linear specification. When the data are generated from a threshold autoregressive model, the results are somewhat mixed. Some form of threshold process usually has the lowest AIC, SBC, and/or mean square prediction error (MSPE). However, for some parameterizations, a linear model will tend to produce the lowest MSPE.
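    The contrast between in-sample criteria and out-of-sample forecast accuracy can be sketched for a single AR(1) candidate. This is an illustration under stated assumptions, not the article's procedure: the formulas AIC = T ln(SSR/T) + 2k and SBC = T ln(SSR/T) + k ln(T) are one common textbook form, the coefficients are held fixed over the hold-out period rather than re-estimated, and the simulated data, the split point, and all function names are hypothetical.

```python
import math
import random

def ols_ar1(y):
    """OLS fit of y_t = a0 + a1*y_{t-1} + e_t; returns (a0, a1, ssr, n_obs)."""
    x, z = y[:-1], y[1:]
    n = len(z)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    a1 = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    a0 = mz - a1 * mx
    ssr = sum((zi - a0 - a1 * xi) ** 2 for xi, zi in zip(x, z))
    return a0, a1, ssr, n

def aic(ssr, n, k):
    # Assumed form: n*ln(ssr/n) + 2k, where k counts estimated parameters.
    return n * math.log(ssr / n) + 2 * k

def sbc(ssr, n, k):
    # Assumed form: n*ln(ssr/n) + k*ln(n).
    return n * math.log(ssr / n) + k * math.log(n)

def mspe_one_step(y, a0, a1, start):
    """Mean squared one-step-ahead prediction error over y[start:], coefficients fixed."""
    errors = [y[t] - (a0 + a1 * y[t - 1]) for t in range(start, len(y))]
    return sum(e * e for e in errors) / len(errors)

# Hypothetical data: a linear AR(1) process with 150 observations.
rng = random.Random(1)
y = [0.0]
for _ in range(149):
    y.append(0.7 * y[-1] + rng.gauss(0.0, 1.0))

split = 100                                   # first 100 points for estimation
a0, a1, ssr, n = ols_ar1(y[:split])
in_sample = (aic(ssr, n, k=2), sbc(ssr, n, k=2))
out_of_sample = mspe_one_step(y, a0, a1, start=split)
```

    Repeating the same steps for each competing specification and comparing the resulting criteria is the kind of model comparison the article describes; only the linear candidate is shown here.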

  2. The Time-Series Models

    Let $\{y_t\}$ be a time series of interest and suppose that the objective is to forecast the subsequent realization of the series conditional on the current and past observations. In particular, let

    $$y_t = f(y_{t-i}, \epsilon_{t-j};\ i = 1, \ldots, p,\ j = 1, \ldots, q) + \epsilon_t, \tag{1}$$

    where $\epsilon_t$ is a zero-mean white-noise disturbance.

    The conditional mean of $y_{t+1}$ is given by

    $$E(y_{t+1} \mid y_{t+1-i}, \epsilon_{t+1-j};\ i = 1, \ldots, p,\ j = 1, \ldots, q) = f(y_{t+1-i}, \epsilon_{t+1-j};\ i = 1, \ldots, p,\ j = 1, \ldots, q). \tag{2}$$

    The econometric problem is that the functional form of $f(\cdot)$ and the magnitudes of the various parameters must be estimated from the data. Autoregressive moving-average (ARMA) models, threshold autoregressive (TAR) models, exponential autoregressive (EAR) models, bilinear autoregressive (BL) models, and generalized autoregressive (GAR) models are popular functional forms for modeling economic data. In order to set the stage for our Monte Carlo experiment, we briefly review each of these functional forms.

    Autoregressive Moving Average (ARMA) Models

    The standard ARMA(p, q) model has the form (2)

    $$y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i y_{t-i} + \epsilon_t + \sum_{i=1}^{q} \beta_i \epsilon_{t-i}. \tag{3}$$

    ARMA(p, q) models were popularized by Box and Jenkins (1976), who analyzed model specification, estimation, and diagnostic checking. The main econometric problem is to determine the lag lengths p and q and then estimate the parameters $\alpha_i$ and $\beta_i$. If all $\beta_i = 0$, the ARMA model is a pure autoregressive (AR) model of order p. The key point to note is that the ARMA model is linear; all values of $y_{t-i}$ and $\epsilon_{t-i}$ are raised to the power 1 and there are no cross-products of the form $y_{t-i}\epsilon_{t-j}$ or $y_{t-i}y_{t-j}$. (3)
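    To make the linear recursion in Equation (3) concrete, the following is a minimal simulation sketch. The function name `simulate_arma`, the Gaussian shocks, the zero pre-sample values, and the burn-in length are all illustrative assumptions rather than details taken from the article.

```python
import random

def simulate_arma(alpha0, alphas, betas, n, sigma=1.0, burn_in=100, seed=0):
    """Simulate Equation (3) with AR coefficients `alphas` and MA coefficients `betas`."""
    rng = random.Random(seed)
    y = [0.0] * len(alphas)      # assumed pre-sample values of y
    eps = [0.0] * len(betas)     # assumed pre-sample shocks
    for _ in range(n + burn_in):
        e = rng.gauss(0.0, sigma)
        # y[-i] is y_{t-i} and eps[-i] is eps_{t-i} before the new values are appended.
        ar_part = sum(a * y[-i] for i, a in enumerate(alphas, start=1))
        ma_part = sum(b * eps[-i] for i, b in enumerate(betas, start=1))
        y.append(alpha0 + ar_part + e + ma_part)
        eps.append(e)
    return y[len(y) - n:]        # drop pre-sample values and the burn-in period

# Hypothetical ARMA(1, 1) example; an empty `betas` list gives a pure AR(p) model.
series = simulate_arma(alpha0=0.5, alphas=[0.6], betas=[0.3], n=200)
```

    Every term enters additively with a fixed coefficient, which is the sense in which the model is linear.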

    Threshold Autoregressive (TAR) Models

    The threshold autoregressive models developed by Tong (1983, 1990) allow for a number of different regimes with a separate autoregressive model in each regime. In our Monte Carlo experiments, we focus on the simple two-regime TAR model, (4)

    $$y_t = I_t \left[\alpha_{10} + \sum_{i=1}^{p} \alpha_{1i} y_{t-i}\right] + (1 - I_t) \left[\alpha_{20} + \sum_{i=1}^{p} \alpha_{2i} y_{t-i}\right] + \epsilon_t, \tag{4}$$

    where $I_t$ is the Heaviside indicator function such that

    $$I_t = \begin{cases} 1 & \text{if } y_{t-1} \geq \tau \\ 0 & \text{if } y_{t-1} < \tau. \end{cases} \tag{5}$$

    In regime 1, $y_{t-1} \geq \tau$ so that $I_t = 1$, $(1 - I_t) = 0$, and $y_t = \alpha_{10} + \alpha_{11} y_{t-1} + \ldots + \alpha_{1p} y_{t-p} + \epsilon_t$. In regime 2, $y_{t-1} < \tau$ so that $y_t = \alpha_{20} + \alpha_{21} y_{t-1} + \ldots + \alpha_{2p} y_{t-p} + \epsilon_t$. Although $\{y_t\}$ is linear in each regime, the possibility of regime switching means that the entire $\{y_t\}$ sequence is nonlinear.
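    The regime-switching mechanism of Equations (4) and (5) can be sketched as a short simulation. The function name `simulate_tar`, the Gaussian shocks, the zero starting values, the burn-in, and the example parameters are hypothetical choices made for illustration only.

```python
import random

def simulate_tar(regime1, regime2, tau, n, sigma=1.0, burn_in=100, seed=0):
    """Simulate the two-regime TAR model; each regime is (intercept, [alpha_1, ..., alpha_p])."""
    rng = random.Random(seed)
    p = max(len(regime1[1]), len(regime2[1]), 1)
    y = [0.0] * p                               # assumed pre-sample values
    indicator = []
    for _ in range(n + burn_in):
        i_t = 1 if y[-1] >= tau else 0          # Heaviside indicator, Equation (5)
        const, alphas = regime1 if i_t else regime2
        ar_part = sum(a * y[-i] for i, a in enumerate(alphas, start=1))
        y.append(const + ar_part + rng.gauss(0.0, sigma))
        indicator.append(i_t)
    return y[len(y) - n:], indicator[len(indicator) - n:]

# Hypothetical example: weak persistence above the threshold, strong persistence below it.
y, ind = simulate_tar(regime1=(0.0, [0.3]), regime2=(0.0, [0.9]), tau=0.0, n=200)
```

    Passing the same coefficients for both regimes makes the threshold irrelevant, so the simulated path is that of an ordinary AR(p) process.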

    The momentum threshold autoregressive (M-TAR) model used by Enders and Granger (1998) allows the regime to change according to the first difference of $\{y_{t-1}\}$. Hence, Equation (5) is replaced with

    $$I_t = \begin{cases} 1 & \text{if } \Delta y_{t-1} \geq \tau \\ 0 & \text{if } \Delta y_{t-1} < \tau. \end{cases} \tag{6}$$

    It is argued that the M-TAR model is useful for capturing situations in which the degree of autoregressive decay depends on the direction of change in $\{y_t\}$. Enders and Granger (1998) and Enders and Siklos (2001) show that interest rate adjustments to the term-structure relationship display M-TAR behavior. Note that if $\alpha_{1i} = \alpha_{2i}$ for all $i$, the TAR and M-TAR models are equivalent to an AR(p) model.
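    The only difference between the TAR and M-TAR specifications is the variable that sets the indicator. A small sketch, with hypothetical function names and a made-up five-point series, shows the two rules side by side:

```python
def tar_indicator(y, tau):
    """Equation (5): I_t = 1 if y_{t-1} >= tau, for t = 1, ..., len(y) - 1."""
    return [1 if y[t - 1] >= tau else 0 for t in range(1, len(y))]

def mtar_indicator(y, tau):
    """Equation (6): I_t = 1 if y_{t-1} - y_{t-2} >= tau, for t = 2, ..., len(y) - 1."""
    return [1 if (y[t - 1] - y[t - 2]) >= tau else 0 for t in range(2, len(y))]

# Hypothetical series used only to show how the two indicators differ.
y = [1.0, 2.0, 1.5, 1.5, 3.0]
level_regimes = tar_indicator(y, tau=1.6)       # [0, 1, 0, 0]
momentum_regimes = mtar_indicator(y, tau=0.0)   # [1, 0, 1]
```

    The M-TAR indicator loses one more initial observation than the TAR indicator because it needs two lagged values to form the first difference.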

    If $\tau$ is known, the estimation of the TAR and M-TAR models is straightforward. Simply form the variables $y^{+}_{t-i} = I_t y_{t-i}$ and $y^{-}_{t-i} = (1 - I_t) y_{t-i}$ and estimate Equation (4) using ordinary least squares (OLS). (5) The lag length p can be determined as in an AR model. When $\tau$ is unknown, Chan (1993) shows how to obtain a superconsistent estimate of the threshold parameter. For a TAR model, the procedure is to order the observations from smallest to largest such that

    $$y^{0}_{1} < y^{0}_{2} < y^{0}_{3} < \cdots < y^{0}_{T}. \tag{7}$$

    For each value of $y^{0}_{j}$, let $\tau = y^{0}_{j}$, set the Heaviside indicator according to Equation (5), and estimate an equation in the form of (4). The regression equation with the smallest residual sum of squares contains the consistent estimate of the threshold. In practice, the highest and lowest 10% of the $\{y^{0}_{j}\}$ values are excluded from the grid search to ensure an adequate number of observations on each side of the threshold. For the M-TAR model, (7) is replaced by the ordered first differences of the observations.
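    The grid search described above can be sketched for the simplest case, p = 1. Several choices here are assumptions made for illustration: the function names, the simulated data and its parameters, and the use of the lagged values as the ordered candidate set. Because every coefficient in Equation (4) is regime specific, OLS on the full equation gives the same fit as a separate simple regression on each regime's subsample, which is what the sketch exploits. It also assumes each regime contains at least two distinct lagged values.

```python
import random

def ols_line(x, z):
    """Simple OLS of z on a constant and x; returns (intercept, slope, ssr)."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    intercept = mz - slope * mx
    ssr = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    return intercept, slope, ssr

def fit_tar1(y, tau):
    """Fit Equation (4) with p = 1 for a given tau; returns (regime1, regime2, ssr)."""
    pairs = list(zip(y[:-1], y[1:]))                 # (y_{t-1}, y_t)
    upper = [(x, z) for x, z in pairs if x >= tau]   # observations with I_t = 1
    lower = [(x, z) for x, z in pairs if x < tau]    # observations with I_t = 0
    a10, a11, ssr1 = ols_line(*zip(*upper))
    a20, a21, ssr2 = ols_line(*zip(*lower))
    return (a10, a11), (a20, a21), ssr1 + ssr2

def chan_grid_search(y, trim=0.10):
    """Try each ordered value of y_{t-1} as tau and keep the one with the smallest SSR."""
    candidates = sorted(y[:-1])
    k = int(len(candidates) * trim)                  # exclude lowest and highest 10%
    best = None
    for tau in candidates[k:len(candidates) - k]:
        regime1, regime2, ssr = fit_tar1(y, tau)
        if best is None or ssr < best[3]:
            best = (tau, regime1, regime2, ssr)
    return best

# Hypothetical data: a two-regime TAR(1) process with true threshold 0.
rng = random.Random(42)
y = [0.0]
for _ in range(300):
    const = -1.0 if y[-1] >= 0.0 else 1.0
    y.append(const + 0.2 * y[-1] + rng.gauss(0.0, 0.5))

tau_hat, regime1, regime2, ssr = chan_grid_search(y)
```

    For an M-TAR version, the same search would be run over the ordered first differences, with the regime split based on the lagged change rather than the lagged level.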

    Exponential Autoregressive (EAR) Models

    EAR models were examined extensively by Ozaki and Oda (1978), Haggan and Ozaki (1981), and Lawrance and Lewis (1980). The form of the EAR model that we use in our Monte Carlo study is

    $$y_t = \alpha_0 + \sum_{i=1}^{p} \theta_i y_{t-i} + \epsilon_t, \tag{8}$$

    where $\theta_i = \alpha_i + \beta_i \exp(-\gamma y_{t-1}^{2})$ and $\gamma > 0$ is the smoothness parameter.
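    A short numerical sketch shows how the coefficient in Equation (8) responds to the lagged value of the series and to the smoothness parameter. The function name and the parameter values are hypothetical.

```python
import math

def ear_coefficient(alpha_i, beta_i, gamma, y_lag1):
    """State-dependent EAR coefficient: alpha_i + beta_i * exp(-gamma * y_{t-1}^2)."""
    return alpha_i + beta_i * math.exp(-gamma * y_lag1 ** 2)

# Hypothetical values alpha_i = 0.5 and beta_i = 0.4, evaluated at y_{t-1} = 3.
flat_at_zero = ear_coefficient(0.5, 0.4, gamma=0.0, y_lag1=3.0)    # alpha_i + beta_i
flat_at_large = ear_coefficient(0.5, 0.4, gamma=1e6, y_lag1=3.0)   # approaches alpha_i
varying = [ear_coefficient(0.5, 0.4, gamma=1.0, y_lag1=v) for v in (0.0, 0.5, 1.0, 2.0)]
```

    With an intermediate value of the smoothness parameter, the coefficient falls smoothly from alpha_i + beta_i toward alpha_i as the lagged value moves away from zero, which is the source of the nonlinearity.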

    In the limit as $\gamma \to 0$ or $\gamma \to \infty$, the EAR model becomes an AR(p) model because each $\theta_i$ is constant. Otherwise...
