Filtering and prediction of noisy and unstable signals: The case of Google Trends data
Livio Fenga
DOI: 10.1002/for.2626
Published: 01 March 2020
RESEARCH ARTICLE
Livio Fenga
ISTAT—Italian National Institute of Statistics, Rome, Italy
Correspondence
Livio Fenga, ISTAT—Italian National Institute of Statistics, Via Cesare Balbo 16, 00184 Rome, Italy.
Email: fenga@istat.it
Abstract
Google Trends data are increasingly employed in statistical investigations. However, care should be taken in handling this tool, especially when it is applied for quantitative prediction purposes. Being by design dependent on Internet users, estimators based on Google Trends data embody many sources of uncertainty and instability. These are related, for example, to technical factors (e.g., cross-regional disparities in computer literacy, the time dependency of Internet usage), psychological factors (e.g., emotionally driven spikes and other forms of data perturbation), and linguistic factors (e.g., noise generated by double-meaning words). Despite the stimulating literature available today on how to use Google Trends data as a forecasting tool, surprisingly, to the best of the author's knowledge, no articles specifically devoted to the prediction of these data have been published to date. In this paper, a novel forecasting method is presented, based on a denoiser of the wavelet type employed in conjunction with a forecasting model of the SARIMA (seasonal autoregressive integrated moving average) class. The wavelet filter is iteratively calibrated by a bounded search algorithm until a minimum of a suitable loss function is reached. Finally, empirical evidence is presented to support the validity of the proposed method.
KEYWORDS
denoising, forecast, Google Trends, SARIMA models, unstable time series, wavelet theory
1 | INTRODUCTION
Though they may be built on the basis of reliable and well-maintained archives, there is no doubt that none of the available real-life time series is ever error free, especially when it comes to the archives that the statistical units are extracted from. Consistency checks and updating procedures are routinely performed by statistical institutions (e.g., national statistical offices and central banks) in the attempt to control for bias and other systematic and nonsystematic errors, which can adversely affect the quality of the information disseminated. Thankfully, a large number of ad hoc statistical tools are generally available to statistical providers, as well as the technology to use them, both in the form of affordable hardware resources and freely available computer routines.
However, there are cases where statistical investigations are designed to extract valid information on the basis of unconventional "archives," for which concepts such as construction and maintenance are neither applicable nor conceivable. This is the case of the Google Trends (GT) dataset which, instead of being designed to meet the traditional inference requirements of the sample-to-population type, is indeed aimed at capturing the "system status" the population of reference is at—
Received: 31 October 2017 Revised: 8 July 2019 Accepted: 24 July 2019
Journal of Forecasting. 2020;39:281–295. © 2019 John Wiley & Sons, Ltd. wileyonlinelibrary.com/journal/for