Forecasting with unbalanced panel data

Published date: 01 August 2020
Authors: Badi H. Baltagi, Long Liu
DOI: http://doi.org/10.1002/for.2646
Received: 25 April 2019 Revised: 20 December 2019 Accepted: 21 December 2019
RESEARCH ARTICLE
Badi H. Baltagi¹, Long Liu²
¹ Department of Economics and Center for Policy Research, Syracuse University, Syracuse, New York
² Department of Economics, College of Business, University of Texas at San Antonio, San Antonio, Texas
Correspondence
Badi H. Baltagi, Department of Economics
and Center for Policy Research, 426
Eggers Hall, Syracuse University,
Syracuse, NY 13244-1020.
Email: bbaltagi@maxwell.syr.edu
Abstract
This paper derives the best linear unbiased prediction (BLUP) for an unbalanced
panel data model. Starting with a simple error component regression model with
unbalanced panel data and random effects, it generalizes the BLUP derived by
Taub (Journal of Econometrics, 1979, 10, 103–108) to unbalanced panels. Next
it derives the BLUP for an unequally spaced panel data model with serial
correlation of the AR(1) type in the remainder disturbances considered by Baltagi
and Wu (Econometric Theory, 1999, 15, 814–823). This in turn extends the BLUP
for a panel data model with AR(1) type remainder disturbances derived by
Baltagi and Li (Journal of Forecasting, 1992, 11, 561–567) from the balanced to the
unequally spaced panel data case. The derivations are easily implemented and
reduce to tractable expressions using an extension of the Fuller and Battese
(Journal of Econometrics, 1974, 2, 67–78) transformation from the balanced to
the unbalanced panel data case.
KEYWORDS
forecasting, BLUP, unbalanced panel data, unequally spaced panels, serial correlation
1 INTRODUCTION
Panel data are usually unbalanced or unequally spaced due
to lack of observations on households not interviewed in
certain years or firms not filing their data survey forms
for a particular period. Even daily stock price data have
no observations when the market is closed due to holidays
or weekends. The unequally spaced pattern is also
useful for repeated sales of houses that are not sold each
year but at irregularly spaced intervals. It is also a common
problem for longitudinal surveys and household surveys
in developed as well as developing countries; see examples
of these in table 1 of McKenzie (2001) as well as table 1 of
Millimet and McDonough (2017). Unbalanced panel data
estimation and testing have been studied in econometrics;
see chapter 9 of Baltagi (2013a) and references cited
therein. This paper focuses on forecasting with unbalanced
panel data. In particular, the paper starts by extending
the best linear unbiased predictor (BLUP) derived by
Taub (1979) for the random effects error component model
from balanced to unbalanced panel data models. Next,
the BLUP for the unequally spaced panel data with serial
correlation of the AR(1) type in the remainder disturbances,
considered by Baltagi and Wu (1999), is derived.
This extends the BLUP for the random effects model with
serial correlation of the AR(1) type derived by Baltagi
and Li (1992) from balanced panels to unequally spaced
panels. Unbalanced panel data can be messy. This paper
keeps the derivations simple and easily tractable, using the
Fuller and Battese (1974) transformation extended from
the balanced to the unbalanced panel data case.
2 THE BEST LINEAR UNBIASED PREDICTOR
Consider an unbalanced panel data regression model:
$$y_{it} = X_{it}'\beta + u_{it}, \tag{1}$$
for $i = 1, \ldots, N$; $t = 1, \ldots, T_i$. The subscript $i$ denotes, say, individuals in the cross-section dimension and $t$ denotes years in the time series dimension.
Journal of Forecasting. 2020;39:709–724. wileyonlinelibrary.com/journal/for © 2019 John Wiley & Sons, Ltd.
The panel data are unbalanced since there are $N$ unique individuals and individual $i$ is only observed over $T_i$ time periods.¹ The regressor $X_{it}$ is a $K \times 1$ vector of the explanatory variables and $\beta$ is a $K \times 1$ vector of coefficients. In an earnings equation in economics, for example, $y_{it}$ is log wage for the $i$th worker in the $t$th time period. $X_{it}$ may contain a set of variables
such as age, experience, tenure, and whether the worker
is male, black, etc. In most panel data applications, the disturbances follow a simple one-way error component model with
$$u_{it} = \mu_i + v_{it}, \tag{2}$$
where $\mu_i$ denotes the unobservable time-invariant individual specific effect, such as ability, and $v_{it}$ denotes the remainder disturbance that varies with individuals and time (see Baltagi, 2013a). Let $n = \sum_{i=1}^{N} T_i$. In vector notation, Equations (1) and (2) can be written as
$$y = X\beta + u \tag{3}$$
and
$$u = Z_\mu \mu + v, \tag{4}$$
where $y = (y_{11}, \ldots, y_{1T_1}, y_{21}, \ldots, y_{2T_2}, \ldots, y_{N1}, \ldots, y_{NT_N})'$ is an $n \times 1$ vector of observations stacked such that the slower index is over individuals and the faster index is over time.² Other vectors or matrices including $X$, $u$ and $v$ are similarly defined. $\mu = (\mu_1, \ldots, \mu_N)'$ is an $N \times 1$ vector. The selector matrix $Z_\mu = \operatorname{diag}(\iota_{T_i})$ is a matrix of ones and zeros, where $\iota_{T_i}$ is a vector of ones of dimension $T_i$. It is simply the matrix of individual dummies that one may include in the regression to estimate the $\mu_i$ if they are assumed to be fixed parameters. Define
$P = Z_\mu (Z_\mu' Z_\mu)^{-1} Z_\mu'$, which is the projection matrix on $Z_\mu$. In this case, $Z_\mu Z_\mu' = \operatorname{diag}(J_{T_i})$, where $J_{T_i}$ is a matrix of ones of dimension $T_i$. Let $\bar{J}_{T_i} = J_{T_i}/T_i$. Hence $P$ reduces to $\operatorname{diag}(\bar{J}_{T_i})$, which averages the observations across time for each individual over their $T_i$ observations. Similarly, $Q = I_n - P$ is a matrix that obtains the deviations from individual means. For example, if we regress $y$ on the matrix of dummy variables $Z_\mu$, the predicted values $Py$ have a typical element $\bar{y}_{i.} = \sum_{t=1}^{T_i} y_{it}/T_i$ repeated $T_i$ times for each individual. $Qy$ gives the residuals of this regression with typical element $y_{it} - \bar{y}_{i.}$.
¹ The data are assumed to be missing at random. This in turn allows the missingness of the data scheme to be ignorable, in the language of Little and Rubin (2002).
² This pattern of unbalancedness does not have to be from $1, 2, \ldots, T_i$. In fact, these $T_i$ observations can be for any subset of the observed time series period. This pattern is used to make the derivation easy and tractable and follow similar derivations for the balanced case. A more general pattern of unbalancedness can be used. In fact, Section 2 extends this to the unequally spaced panel data with serial correlation across time considered by Baltagi and Wu (1999). A two-way error component model with a general type of missing data is considered in Wansbeek and Kapteyn (1989).
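The roles of the selector matrix, $P$ and $Q$ can be checked numerically. The following is a minimal sketch, not the authors' code: the group sizes `T`, the data vector `y` and the helper name `selector_matrix` are hypothetical choices made only for illustration.

```python
import numpy as np

def selector_matrix(T):
    """Build Z_mu = diag(iota_{T_i}) for group sizes T = [T_1, ..., T_N]."""
    n, N = sum(T), len(T)
    Z = np.zeros((n, N))
    row = 0
    for i, Ti in enumerate(T):
        Z[row:row + Ti, i] = 1.0   # column i flags the T_i rows of individual i
        row += Ti
    return Z

T = [2, 3, 1]                          # hypothetical unbalanced panel: N = 3, n = 6
Z = selector_matrix(T)
P = Z @ np.linalg.inv(Z.T @ Z) @ Z.T   # projection on Z_mu, i.e. diag(J-bar_{T_i})
Q = np.eye(sum(T)) - P                 # I_n - P, deviations from individual means

y = np.array([1.0, 3.0, 2.0, 4.0, 6.0, 5.0])   # made-up observations
group_means = P @ y    # each individual's mean, repeated T_i times
within = Q @ y         # y_it minus the individual mean

print(group_means)
print(within)
```

With these made-up numbers, `group_means` repeats each individual's average over that individual's rows, and `within` holds the residuals from regressing `y` on the individual dummies.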
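For the random effects model, $\mu_i \sim \text{i.i.d.}(0, \sigma_\mu^2)$, $v_{it} \sim \text{i.i.d.}(0, \sigma_v^2)$ and the $\mu_i$ are independent of the $v_{it}$ and $X_{it}$ for all $i$ and $t$. The variance–covariance matrix of the disturbances is given by
$$\Omega = E(uu') = \sigma_\mu^2 \operatorname{diag}(J_{T_i}) + \sigma_v^2 \operatorname{diag}(I_{T_i}) = \operatorname{diag}\left(\omega_i^2 \bar{J}_{T_i} + \sigma_v^2 E_{T_i}\right), \tag{5}$$
where $\omega_i^2 = T_i\sigma_\mu^2 + \sigma_v^2$, and $E_{T_i} = I_{T_i} - \bar{J}_{T_i}$. Using the fact that $\bar{J}_{T_i}$ and $E_{T_i}$ are idempotent matrices that sum to the identity matrix $I_{T_i}$, it is easy to verify that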
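$$\Omega^{-1} = \operatorname{diag}\left(\frac{1}{\omega_i^2} \bar{J}_{T_i} + \frac{1}{\sigma_v^2} E_{T_i}\right) \tag{6}$$
and
$$\Omega^{-1/2} = \operatorname{diag}\left(\frac{1}{\omega_i} \bar{J}_{T_i} + \frac{1}{\sigma_v} E_{T_i}\right) \tag{7}$$
(see Wansbeek & Kapteyn, 1982). Now a generalized least squares (GLS) estimator can be obtained as a weighted least squares following Fuller and Battese (1974). In this case one premultiplies the regression model in Equation (3) by $\sigma_v\Omega^{-1/2} = \operatorname{diag}\left(\frac{\sigma_v}{\omega_i} \bar{J}_{T_i} + E_{T_i}\right) = \operatorname{diag}\left(I_{T_i} - \theta_i \bar{J}_{T_i}\right)$, where $\theta_i = 1 - (\sigma_v/\omega_i)$. GLS becomes ordinary least squares (OLS) on the resulting transformed regression of $y^*$ on $X^*$ with $y^* = \sigma_v\Omega^{-1/2}y$ having a typical element $y_{it}^* = y_{it} - \theta_i \bar{y}_{i.}$, and $X^* = \sigma_v\Omega^{-1/2}X$ defined similarly.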
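The transformation can be verified numerically. The sketch below is an illustration under assumed values, not the authors' implementation: the group sizes, the variance components `sigma_mu2` and `sigma_v2`, the data vector and the helper `block_diag` are all hypothetical. It checks that the block-diagonal matrix with blocks $I_{T_i} - \theta_i \bar{J}_{T_i}$ squares to $\sigma_v^2\Omega^{-1}$.

```python
import numpy as np

def block_diag(blocks):
    """Place square blocks along the diagonal (avoids a SciPy dependency)."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    r = 0
    for b in blocks:
        k = b.shape[0]
        out[r:r + k, r:r + k] = b
        r += k
    return out

T = [2, 3, 1]                     # hypothetical group sizes T_i
sigma_mu2, sigma_v2 = 2.0, 1.0    # hypothetical variance components

Jbar = [np.full((Ti, Ti), 1.0 / Ti) for Ti in T]           # averaging matrices
omega2 = [Ti * sigma_mu2 + sigma_v2 for Ti in T]           # omega_i^2
theta = [1.0 - np.sqrt(sigma_v2 / w2) for w2 in omega2]    # theta_i

# Omega as in Equation (5): sigma_mu^2 * J_{T_i} + sigma_v^2 * I_{T_i} per block
Omega = block_diag([sigma_mu2 * Ti * Jb + sigma_v2 * np.eye(Ti)
                    for Ti, Jb in zip(T, Jbar)])

# Transformation matrix: diag(I_{T_i} - theta_i * Jbar_{T_i})
FB = block_diag([np.eye(Ti) - th * Jb for Ti, th, Jb in zip(T, theta, Jbar)])

# If FB equals sigma_v * Omega^{-1/2}, then FB'FB equals sigma_v^2 * Omega^{-1}
lhs = FB.T @ FB
rhs = sigma_v2 * np.linalg.inv(Omega)
print(np.allclose(lhs, rhs))

y = np.array([1.0, 3.0, 2.0, 4.0, 6.0, 5.0])   # made-up observations
y_star = FB @ y                                 # y_it - theta_i * ybar_i
```

Because each block only needs the individual mean and $\theta_i$, the transformed data can be computed group by group without ever forming the full $n \times n$ matrix; the dense matrices here are only for checking the algebra.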
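For the $i$th individual, we want to predict $S$ periods ahead. As derived by Goldberger (1962), the best linear unbiased predictor (BLUP) of $y_{i,T_i+S}$ for the GLS model is
$$\hat{y}_{i,T_i+S} = X_{i,T_i+S}'\hat{\beta}_{GLS} + w'\Omega^{-1}\hat{u}_{GLS}, \tag{8}$$
for $S \geq 1$, where $\hat{\beta}_{GLS}$ is the GLS estimator of $\beta$ from Equation (3), $w = E(u_{i,T_i+S}\,u)$, $\Omega$ is the variance–covariance structure of the disturbances, and $\hat{u}_{GLS} = y - X\hat{\beta}_{GLS}$. Note that we have $u_{i,T_i+S} = \mu_i + v_{i,T_i+S}$ for period $T_i+S$ and hence $w' = \sigma_\mu^2(0, \ldots, \iota_{T_i}', 0, \ldots, 0)$. In this case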
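$$w'\Omega^{-1} = \sigma_\mu^2(0, \ldots, \iota_{T_i}', 0, \ldots, 0)\operatorname{diag}\left(\frac{1}{\omega_i^2} \bar{J}_{T_i} + \frac{1}{\sigma_v^2} E_{T_i}\right) = \frac{\sigma_\mu^2}{\omega_i^2}(0, \ldots, \iota_{T_i}', 0, \ldots, 0) \tag{9}$$
since $\iota_{T_i}'\bar{J}_{T_i} = \iota_{T_i}'$ and $\iota_{T_i}'E_{T_i} = 0$. The last term of BLUP becomes
$$w'\Omega^{-1}\hat{u}_{GLS} = \frac{T_i\sigma_\mu^2}{\omega_i^2}\,\bar{\hat{u}}_{i.,GLS}, \tag{10}$$
where $\bar{\hat{u}}_{i.,GLS} = T_i^{-1}\sum_{t=1}^{T_i}\hat{u}_{it,GLS}$. Therefore, the BLUP for $y_{i,T_i+S}$ corrects the GLS prediction by a fraction of the mean of the GLS residuals corresponding to that ith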
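
The closed form in Equation (10) can be compared with a direct evaluation of $w'\Omega^{-1}\hat{u}_{GLS}$. This is a rough sketch under assumed inputs: the group sizes, variance components and residual vector `u_hat` are made up, and the function names are hypothetical. In practice the residuals would come from a GLS fit and the variance components would be estimated.

```python
import numpy as np

T = [2, 3, 1]                     # hypothetical group sizes T_i
sigma_mu2, sigma_v2 = 2.0, 1.0    # hypothetical variance components
n = sum(T)
starts = np.concatenate(([0], np.cumsum(T)[:-1]))   # first row of each individual

# Selector matrix Z_mu and Omega = sigma_mu^2 * Z Z' + sigma_v^2 * I_n
Z = np.zeros((n, len(T)))
for i, (s, Ti) in enumerate(zip(starts, T)):
    Z[s:s + Ti, i] = 1.0
Omega = sigma_mu2 * Z @ Z.T + sigma_v2 * np.eye(n)

u_hat = np.array([0.5, -0.1, 0.3, 0.9, 0.6, -0.4])  # made-up GLS residuals

def blup_correction(i):
    """Closed form of Equation (10): (T_i sigma_mu^2 / omega_i^2) * mean residual."""
    Ti = T[i]
    omega2 = Ti * sigma_mu2 + sigma_v2
    u_bar = u_hat[starts[i]:starts[i] + Ti].mean()
    return Ti * sigma_mu2 / omega2 * u_bar

def blup_correction_direct(i):
    """Direct evaluation of w' Omega^{-1} u_hat with w = sigma_mu^2 * (column i of Z)."""
    w = sigma_mu2 * Z[:, i]
    return w @ np.linalg.solve(Omega, u_hat)

for i in range(len(T)):
    print(i, blup_correction(i), blup_correction_direct(i))
```

The two columns printed for each individual should agree up to floating-point error. The closed form needs only $T_i$, the two variance components and that individual's mean residual, so no $n \times n$ system has to be solved.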