Sick of Data

Published date: 01 January 2018
© 2018 Wiley Periodicals, Inc.
Published online in Wiley Online Library (
DOI 10.1002/jcaf.22309
Tim Chartier
Winter is coming. Worried
about catching the
flu? At one time, the
Centers for Disease Control
and Prevention (CDC) was the
hub of information on possible
outbreaks of influenza. Their
warnings came from doctors
reporting new flu cases. By
the time the CDC’s analysis
indicated an emerging pandemic,
the news was already a week
or two old.
Today, we can access nearly
real-time information by, as we
often say, Googling it.
Google’s work in predicting
flu outbreaks began when
the company examined the 50
million most common search
terms that Americans type.
They also looked at CDC
data between 2003 and 2008.
They searched for correlations
between frequencies in search
queries and the spread of flu
over time and space. A
combination of 45 search terms led
to a strong predictive ability.
Like the CDC, Google could
pinpoint locations of flu
outbreaks. Google’s results were,
however, considerably faster
since their model could begin
to see a pattern before some
search engine users walked
into a doctor’s office or got the
results of a mouth swab. This
epidemiology work by Google
was published in the journal
Nature in 2009, with the claim
that the company could
“nowcast” the flu. Soon
after, the country entered an
H1N1 crisis. Google’s results
were more accurate than the
CDC’s and arrived essentially
in real time.
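For readers who like to see the idea in code, here is a minimal Python sketch of the correlation hunt described above. It is only an illustration, not Google's actual method: the search terms, the weekly counts, the flu rates, and the helper names pearson and rank_terms are all made up for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (assumes neither is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_terms(term_series, flu_series, top_k):
    """Return the top_k search terms whose weekly counts track the flu series best."""
    scored = [(pearson(series, flu_series), term)
              for term, series in term_series.items()]
    scored.sort(reverse=True)  # highest correlation first
    return [term for _, term in scored[:top_k]]


# Hypothetical weekly data, invented for illustration only.
cdc_flu_rate = [1.0, 1.5, 2.5, 4.0, 3.0, 1.5]
search_counts = {
    "flu symptoms": [10, 16, 24, 41, 29, 14],
    "fever remedy": [8, 11, 20, 30, 26, 12],
    "basketball scores": [30, 28, 31, 29, 30, 32],
}

top_terms = rank_terms(search_counts, cdc_flu_rate, top_k=2)
print(top_terms)
```

With these invented numbers, the two flu-related terms rank ahead of the sports query, whose counts do not rise and fall with the flu series.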
Google has since offered
the tool online, which has
been extended to more than
two dozen other countries.
Simply visit Google Flu
Trends. The site estimates
current outbreak levels in
the United States and can be
refined to examine data at the
state and some city levels.
Google’s ability to essentially
don an electronic lab
coat and diagnose H1N1 outbreak
levels in 2009 garnered
well-deserved attention. The
2012–2013 flu season sent
Google down a different path,
one of sharply overestimating
the amount of flu. Google
Flu Trends warned that nearly
11% of the population were
infected, when follow-up information
from the CDC found
no more than 6% had caught
the flu. What went wrong?
Media reports about influenza,
including a state of emergency
declared in New York, led to
more searches that year by
people who weren’t sick.
Google had already anticipated
how media coverage of
the flu could lead to spikes
in searches by healthy people
for three to seven days. But
this time, the flu season really
was worse than the previous
year, and searches stayed
at a higher level
throughout the season. More
media coverage and more flu
meant that even healthy people
did a lot more searching for
“flu” throughout that particular
season. The assumptions
that had served the model well
during the 2009 H1N1 crisis lay
at the core of the model’s
failure in 2013.
It’s instructive to focus on
Google’s response. Seeing the
challenges of their epidemiology
predictions in 2013, Google
adjusted their algorithm. They
first adjusted for spikes in
searches after media coverage.
They also refined how
their linear regression throws
out extreme values. With these
changes, their revised method
once again gave an accurate
estimate of flu, within 1% of
what CDC data reported.
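The second adjustment, a regression that throws out extreme values, can be sketched in a few lines of Python. This is an illustration built on assumptions, not Google's published procedure: it fits an ordinary least-squares line, drops the weeks whose residual sits more than a chosen multiple of the median absolute deviation away from the median residual, and then refits. The data, the cutoff of 3.0, and the function names are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_line_trimmed(xs, ys, cutoff=3.0):
    """Fit, discard points with extreme residuals, then fit again.

    The cutoff (in units of median absolute deviation) is an assumed value.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    center = statistics.median(residuals)
    mad = statistics.median([abs(r - center) for r in residuals])
    if mad == 0:
        return slope, intercept  # nothing stands out, keep the first fit
    kept = [(x, y) for x, y, r in zip(xs, ys, residuals)
            if abs(r - center) <= cutoff * mad]
    kept_xs = [x for x, _ in kept]
    kept_ys = [y for _, y in kept]
    return fit_line(kept_xs, kept_ys)


# Hypothetical weekly data: flu rate follows 2 * search_share + 1, except for
# the last week, where a media scare drives searches up without matching illness.
search_share = [1, 2, 3, 4, 5, 6, 7, 8]
flu_rate = [3, 5, 7, 9, 11, 13, 15, 5]

naive_slope, naive_intercept = fit_line(search_share, flu_rate)
robust_slope, robust_intercept = fit_line_trimmed(search_share, flu_rate)
print(naive_slope, naive_intercept)
print(robust_slope, robust_intercept)
```

In this made-up series, the single scare week drags the plain fit well away from the underlying trend, while the trimmed fit recovers it. A real system would need far more care in choosing the cutoff, since a genuinely bad flu week can look like an outlier too.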
The data analytics insight
for this article is less about
Google’s initial success and
much more about its response
to the method’s struggle.
Algorithms, like individuals, must be
