Filtering and prediction of noisy and unstable signals: The case of Google Trends data

Author: Livio Fenga
DOI: http://doi.org/10.1002/for.2626
Published: 01 March 2020
Received: 31 October 2017 | Revised: 8 July 2019 | Accepted: 24 July 2019
Journal of Forecasting. 2020;39:281–295. © 2019 John Wiley & Sons, Ltd.
RESEARCH ARTICLE
Livio Fenga
ISTAT, Italian National Institute of Statistics, Rome, Italy
Correspondence
Livio Fenga, ISTAT, Italian National Institute of Statistics, Via Cesare Balbo 16,
00184 Rome, Italy.
Email: fenga@istat.it
Abstract
Google Trends data are increasingly employed in many statistical investigations. However, care should be taken in handling this tool, especially when it is applied for quantitative prediction purposes. Being, by design, dependent on Internet users, estimators based on Google Trends data embody many sources of uncertainty and instability. These are related, for example, to technical factors (e.g., cross-regional disparities in the degree of computer literacy, time dependency of Internet usage), psychological factors (e.g., emotionally driven spikes and other forms of data perturbation), and linguistic factors (e.g., noise generated by double-meaning words). Despite the stimulating literature available today on how to use Google Trends data as a forecasting tool, surprisingly, to the best of the author's knowledge, no articles specifically devoted to the prediction of these data have been published to date. In this paper, a novel forecasting method is presented, based on a denoiser of the wavelet type employed in conjunction with a forecasting model of the SARIMA (seasonal autoregressive integrated moving average) class. The wavelet filter is iteratively calibrated according to a bounded search algorithm until a minimum of a suitable loss function is reached. Finally, empirical evidence is presented to support the validity of the proposed method.
KEYWORDS
denoising, forecast, Google Trends, SARIMA models, unstable time series, wavelet theory
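To make the abstract's pipeline concrete, the following is a minimal sketch, not the paper's implementation: it pairs a soft-thresholding wavelet denoiser (PyWavelets) with a SARIMA model (statsmodels) and runs a bounded grid search over the decomposition level and threshold, keeping the combination that minimizes a hold-out forecast loss. The wavelet family (db4), the search grid, the SARIMA orders, and the MAE loss are all illustrative assumptions, standing in for the paper's calibration of "a suitable loss function."

```python
# Illustrative sketch only: wavelet family, grids, SARIMA orders, and MAE loss
# are assumptions, not taken from the paper.
import numpy as np
import pywt
from statsmodels.tsa.statespace.sarimax import SARIMAX

def wavelet_denoise(x, wavelet="db4", level=3, thresh=0.5):
    """Soft-threshold the detail coefficients of a discrete wavelet decomposition."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    coeffs[1:] = [pywt.threshold(c, thresh * np.max(np.abs(c)), mode="soft")
                  for c in coeffs[1:]]
    # waverec can return one extra sample for odd-length input; trim to match.
    return pywt.waverec(coeffs, wavelet)[: len(x)]

def calibrate_and_forecast(y, horizon=12, season=12):
    """Bounded search over wavelet settings; keep the setting whose denoised
    series yields the smallest SARIMA forecast error on a hold-out window."""
    train, valid = y[:-horizon], y[-horizon:]
    best_loss, best_params = np.inf, None
    for level in (2, 3, 4):                      # bounded search space (assumed)
        for thresh in (0.2, 0.4, 0.6, 0.8):
            smoothed = wavelet_denoise(train, level=level, thresh=thresh)
            fit = SARIMAX(smoothed, order=(1, 1, 1),
                          seasonal_order=(0, 1, 1, season)).fit(disp=False)
            loss = np.mean(np.abs(fit.forecast(horizon) - valid))  # MAE as loss
            if loss < best_loss:
                best_loss, best_params = loss, (level, thresh)
    # Refit on the full series with the calibrated filter and forecast ahead.
    level, thresh = best_params
    final = SARIMAX(wavelet_denoise(y, level=level, thresh=thresh),
                    order=(1, 1, 1),
                    seasonal_order=(0, 1, 1, season)).fit(disp=False)
    return final.forecast(horizon)
```

In the paper, the SARIMA specification and the stopping rule of the bounded search are part of the method itself; the fixed orders and the small grid above merely stand in for that procedure.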
1 | INTRODUCTION
Though they may be built on the basis of reliable and well-maintained archives, there is no doubt that none of the available real-life time series is ever error free, especially when it comes to the archives that the statistical units are extracted from. Consistency checks and updating procedures are routinely performed by statistical institutions (e.g., national statistical offices and central banks) in the attempt to control for bias and other systematic and nonsystematic errors, which can adversely affect the quality of the information disseminated. Thankfully, a large number of ad hoc statistical tools are generally available to statistical providers, as well as the technology to use them, both in the form of affordable hardware resources and freely available computer routines.
However, there are cases where statistical investigations are designed to extract valid information on the basis of unconventional archives, for which concepts such as construction and maintenance are neither applicable nor conceivable. This is the case of the Google Trends (GT) dataset which, instead of being designed to meet the traditional inference requirements of the sample-to-population type, is indeed aimed at capturing the system status; the population of reference is at …
