Google's MIDAS Touch: Predicting UK Unemployment with Internet Search Data

DOI: http://doi.org/10.1002/for.2391
Date: 01 April 2016
Author: Paul Smith
Published date: 01 April 2016
Journal of Forecasting, J. Forecast. 35, 263–284 (2016)
Published online 14 February 2016 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/for.2391
ABSTRACT
Internet search data could be a useful source of information for policymakers when formulating decisions based
on their understanding of the current economic environment. This paper builds on earlier literature via a structured
value assessment of the data provided by Google Trends. This is done through two empirical exercises related to the
forecasting of changes in UK unemployment. Firstly, economic intuition provides the basis for search term selection,
with a resulting Google indicator tested alongside survey-based variables in a traditional forecasting environment.
Secondly, this environment is expanded into a pseudo-time nowcasting framework which provides the backdrop for
assessing the timing advantage that Google data have over surveys. The framework is underpinned by a MIDAS
regression which allows, for the first time, the easy incorporation of Internet search data at its true sampling rate into
a nowcast model for predicting unemployment. Copyright © 2016 John Wiley & Sons, Ltd.
KEY WORDS MIDAS; internet search; Google; nowcasting; macroeconomics
INTRODUCTION
In 2013, 36 million or 73% of UK adults accessed the Internet every day, some 20 million more than just 7 years
previously. With this increased penetration has come an associated rise in day-to-day usage for activities such as
finding information about goods and services (up to 66% from 58%), or as a tool to find a new job (in 2013, 67% of
unemployed adults looked online for a job or submitted a job application).1
Many of these users, it would seem, use a search engine as their portal into the online world. These services
essentially act as an intermediary by bringing together web users via terms entered into a query box. Google Inc.,
which is the dominant force in search engine provision, processes hundreds of millions of such terms and queries on
a daily basis. Such is the popularity of the eponymous search engine, the term ‘Google’ has now entered the Oxford
Dictionary as a verb (‘to Google’).
The data associated with online search activity offer a number of opportunities for the researcher. If an increased
number of people are, for example, searching online for flat-screen TVs, could this provide an indication that purchasing
of such goods will soon rise and offer an early insight into consumer spending activities? Or if there is an increase
in searches for benefits associated with unemployment, could this give an early indication of rises in joblessness?
And if such shifts in behaviour are observed ahead of more traditional sources such as surveys or backward-looking
official data, then timely Internet-based information could be used to make better decisions in areas
such as monetary policy or investment.
Investigations of search data have been applied across a number of fields such as predicting changes in tourism
numbers (Bangwayo-Skeete and Skeete, 2015; Yang et al., 2015), offering an advance warning of flu epidemics
(Ginsberg et al., 2009) or predicting exchange rate volatility (G. Smith, 2012).
In economics, Ettredge et al. (2005) provide one of the first examples of using web search data as a predictor of
macroeconomic statistics, particularly unemployment figures, but it was the release by Google of its freely available
service ‘Google Insights for Search’ in 2008, later superseded by ‘Google Trends’, that gifted researchers a readily
available platform to analyse search data.
Google’s own economists, Hyunyoung Choi and Hal Varian (Choi and Varian 2009a,b), provided early illustrations
of how Google Trends data can be used to give an advance indication of US retail and auto sales, new housing
starts, travel destinations and initial claims for unemployment benefits. When comparing one-step-ahead forecasting
errors, simple seasonal autoregressive and fixed-effects models that add search data as a regressor tend to
outperform those that exclude this variable. While in some cases the gains are only a few percent, for auto sales the
improvement was substantial at 18% and for new housing starts the gain was 12%. For initial claims data, the gain in
forecasting accuracy was as high as 16%.
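The kind of comparison described above can be illustrated with a minimal sketch. It is not Choi and Varian's code or data: the series are synthetic, the baseline is a plain AR(1) rather than a seasonal AR model, the window is expanding, and the names `ols`, `features` and `one_step_mae` are hypothetical helpers. It also assumes that the search index for period t is observed before the target for period t is published. Only the Python standard library is used, to keep the example dependency-free.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (Gauss-Jordan elimination)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * target for row, target in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic data (an assumption for illustration): the target depends on its
# own lag and on a contemporaneous search index.
random.seed(0)
T = 120
search = [random.gauss(0, 1) for _ in range(T)]
y = [0.0]
for t in range(1, T):
    y.append(0.5 * y[t - 1] + 0.8 * search[t] + random.gauss(0, 0.5))


def features(i, use_search):
    """Regressors for period i: constant, lagged target, optionally the search index."""
    row = [1.0, y[i - 1]]
    if use_search:
        row.append(search[i])
    return row


def one_step_mae(use_search, first_forecast=60):
    """Mean absolute one-step-ahead error, re-estimating on an expanding window."""
    errors = []
    for t in range(first_forecast, T):
        sample = range(1, t)  # only periods observed before t
        beta = ols([features(i, use_search) for i in sample], [y[i] for i in sample])
        forecast = sum(b * x for b, x in zip(beta, features(t, use_search)))
        errors.append(abs(y[t] - forecast))
    return sum(errors) / len(errors)


mae_base = one_step_mae(use_search=False)
mae_aug = one_step_mae(use_search=True)
gain = 1 - mae_aug / mae_base  # proportional reduction in forecast error
print(f"baseline MAE {mae_base:.3f}, augmented MAE {mae_aug:.3f}, gain {gain:.1%}")
```

Because the synthetic target is built to depend on the search index, the augmented model wins by construction here. With real data the size and even the sign of the gain is an empirical question.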
Choi and Varian’s papers spawned a number of related studies, with most applications generally based on predicting
some kind of variable that can be linked to the behaviour of households, such as the consumption of goods
or activity in the labour and housing markets (although recently Koop and Onorante, 2014, explored the possibility
of using Google search data to help improve nowcasts of macroeconomic variables using dynamic model
selection methods).
Correspondence to: Paul Smith, Department of Economics, University of Strathclyde, Sir William Duncan Building, 130 Rottenrow, Glasgow
G4 0GE, UK. E-mail: paul.smith@markt.com
1 Office for National Statistics: Internet Access - Households and Individuals (2013).
Askitas and Zimmermann (2009) show how keyword searches correlate strongly with monthly German unemployment
data and how a Google predictor can add value to an error correction model, while Fondeur and Karame
(2013) look at the usefulness of Google data in predicting youth unemployment in France. D’Amuri (2009) assesses
the power of augmenting standard time series models for quarterly Italian unemployment and concludes that the
data improve out-of-sample forecasting performance. McLaren and Shanbhogue (2011) perform similar exercises for
the UK labour and housing markets, comparing simple baseline AR specifications to those augmented with Internet
search variables. The authors go as far as suggesting that Google Trends data may contain information above and
beyond those provided by survey indicators. Consumption-based applications can be found in Kholodilin et al. (2010)
and Schmidt and Vosen (2009).
While there is a general consensus that the data are useful in various short-term forecasting (or ‘nowcasting’)
applications, equally there are a number of challenges and pitfalls to overcome, particularly around the selection of
search terms. Which terms are most relevant to predicting a target variable? What is the motivation of the user to
enter a search term? Lazer et al. (2014) highlight that there needs to be careful consideration of social and independent
searching. Is the user searching for their own purpose, or is the search more akin to some kind of herd behaviour (i.e.
because many others are doing the same)? Such issues have considerable implications for forecasting ability and have
been suggested as a key reason behind the persistent overestimation by Google search data in predicting the number
of flu cases in recent years (Bentley et al., 2014).
In this paper, the aim is to contribute to the debate through an empirical application that primarily considers the
respective abilities of Google and competing survey-based models to forecast changes in unemployment.
The subject has been touched upon by McLaren and Shanbhogue (2011), but several refinements need to be applied
to their approach, plus exploration of other avenues to gain a greater understanding of the role Google search data
can play in macroeconomic forecasting.
Firstly, the selection of search terms—McLaren and Shanbhogue (2011) used a single term: ‘JSA’ (Job Seekers’
Allowance). This approach has a number of flaws. For example, the reliance on a single search term seems a dangerous
strategy, especially as this acronym relates to a specific UK unemployment benefit, the name of which has been
subject to various changes over time. With this in mind, some alternative strategies to term selection are proposed in
the next section, based on economic intuition and a hybrid of ideas found within similar literature.
The quality of Google indicators relative to survey-based variables is analysed in the third section through traditional
linear regression equations that link the target variable (unemployment) with these explanatory variables.
Statistics and tests around model specification, coefficient stability and out-of-sample forecast performances are
conducted.
A larger extension of the literature on the applicability of Google Trends data in economics is given in the fourth
and fifth sections.
Google and survey indicators are analysed within a MIDAS regression framework designed to ‘nowcast’
unemployment on a weekly basis over an 8-week ‘nowcasting’ period.
Greater clarity on the purposes of nowcasting is provided later in the fourth section, but for now let nowcasting be
defined simply as an effort to understand what has happened in the very near past, what is happening today or what
is happening in the very near future. Such aims are generally achieved by linking a dependent variable to a dataset
containing various soft (e.g. qualitative survey data) and hard explanatory variables (e.g. quantitative data) via some
kind of econometric forecasting model. All of these indicators are assumed to be useful in predicting the dependent
variable and, as new information on these predictors is released, then econometric models can be updated.
Moreover, Google information is available on shorter timescales than survey variables. This could be an important
advantage. As shown by P. Smith (2015), the marginal predictive power of an indicator can be linked to its release
schedule, making it an important consideration when trying to understand its role and value to the economic forecaster.
Through the incorporation of MIDAS regressions into a pseudo-time nowcasting framework, there is an opportunity
to understand how weekly data and their associated timing advantage over other variables can reduce nowcast
uncertainty through time. MIDAS (mixed-data sampling) regressions are used because they offer a neat solution to
the problem of mixed time frequencies that exist within the Google model frameworks: Google data are available
weekly, yet unemployment data are released monthly. As far as is known, this is the first time that MIDAS
regressions have been used with Google data for an economics-specific application.
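To make the mixed-frequency idea concrete, the following is a minimal sketch of a MIDAS-style regression. The specification used in the paper is not shown in this excerpt, so everything here is an assumption for illustration: the weekly series is collapsed into one monthly regressor with normalised exponential Almon weights (a common choice in the MIDAS literature), each month is taken to contain exactly four weeks, the data are synthetic, the parameters are found by a coarse grid search rather than a proper non-linear least squares routine, and `almon_weights`, `simple_ols` and `fit_midas` are hypothetical helper names.

```python
import math
import random


def almon_weights(theta1, theta2, n_lags):
    """Normalised exponential Almon weights for lags j = 1..n_lags (they sum to 1)."""
    raw = [math.exp(theta1 * j + theta2 * j * j) for j in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def simple_ols(x, y):
    """Intercept, slope and sum of squared residuals for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return intercept, slope, ssr


def aggregate(weekly, weights):
    """Collapse each month's weekly observations into one weighted regressor."""
    return [sum(w * x for w, x in zip(weights, weeks)) for weeks in weekly]


def fit_midas(weekly, y, grid1, grid2):
    """Grid search over the weight parameters; intercept and slope come from OLS."""
    best = None
    for theta1 in grid1:
        for theta2 in grid2:
            weights = almon_weights(theta1, theta2, len(weekly[0]))
            intercept, slope, ssr = simple_ols(aggregate(weekly, weights), y)
            if best is None or ssr < best["ssr"]:
                best = {"theta": (theta1, theta2), "weights": weights,
                        "intercept": intercept, "slope": slope, "ssr": ssr}
    return best


# Synthetic data (an assumption): 120 months, 4 weekly readings per month,
# ordered so that position 0 (lag j = 1) is the most recent week.
random.seed(1)
weekly = [[random.gauss(0, 1) for _ in range(4)] for _ in range(120)]
true_weights = almon_weights(-1.0, 0.0, 4)  # recent weeks matter most
y = [0.2 + 1.5 * x + random.gauss(0, 0.1) for x in aggregate(weekly, true_weights)]

grid1 = [i * 0.25 for i in range(-8, 9)]  # theta1 from -2 to 2
grid2 = [i * 0.1 for i in range(-5, 6)]   # theta2 from -0.5 to 0.5
fit = fit_midas(weekly, y, grid1, grid2)

# Benchmark: equal weights, i.e. a plain monthly average of the weekly data.
_, _, flat_ssr = simple_ols(aggregate(weekly, almon_weights(0.0, 0.0, 4)), y)
print("estimated weights:", [round(w, 3) for w in fit["weights"]])
print("SSR with estimated weights:", round(fit["ssr"], 3), "| equal weights:", round(flat_ssr, 3))
```

The design point is that the weights are estimated rather than imposed. A flat monthly average is the special case where both weight parameters are zero, so the fitted model can never do worse in-sample than simple averaging and can let the most recent weeks carry more of the signal.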
INTERNET SEARCH DATA: TERM SELECTION
It is easy to be persuaded by the theoretical benefits of using Internet search data as a potential monitor of economic
behaviour. Compared to traditional survey-based sampling, it is quicker, timelier and cheaper. It is arguably a ‘purer’
form of monitoring behaviour: aforementioned surveys tend to be based on questionnaires and are reliant on the