Gaussian processes for daily demand prediction in tourism planning

DOI: http://doi.org/10.1002/for.2644
Published date: 01 April 2020
Authors: Dries F. Benoit, Wai Kit Tsang
Received: 25 March 2019 Revised: 20 September 2019 Accepted: 2 December 2019
RESEARCH ARTICLE
Wai Kit Tsang, Dries F. Benoit
Faculty of Economics and Business
Administration, Ghent University, Ghent,
Belgium
Correspondence
Wai Kit Tsang, Faculty of Economics
and Business Administration, Ghent
University, Tweekerkenstraat 2, 9000
Ghent, Belgium.
Email: waikit.tsang@ugent.be
Funding information
Bijzonder Onderzoeksfonds
Abstract
This study proposes Gaussian processes to forecast daily hotel occupancy at
a city level. Unlike other studies in the tourism demand prediction literature,
the hotel occupancy rate is predicted on a daily basis and 45 days ahead of
time using online hotel room price data. A predictive framework is introduced
that highlights feature extraction and selection of the independent variables.
This approach shows that the dependence on internal hotel occupancy data can
be removed by making use of a proxy measure for hotel occupancy rate at a
city level. Six forecasting methods are investigated, including linear regression,
autoregressive integrated moving average and recent machine learning meth-
ods. The results indicate that Gaussian processes offer the best tradeoff between
accuracy and interpretation by providing prediction intervals in addition to
point forecasts. It is shown how the proposed framework improves managerial
decision making in tourism planning.
KEYWORDS
Gaussian processes, tourism demand forecast, feature extraction, daily data, prediction interval
1 INTRODUCTION
Due to the perishable nature of many tourism products and
services, accurately forecasting demand is important for
the tourism industry (Archer et al., 1987; Wandner et al.,
1980). Hotel managers assign employees and plan work-
ing schedules depending on the expected number of hotel
bookings. Tour guides and other tourism service providers
plan operations in anticipation of the expected demand.
Accurate tourism demand prediction is crucial for air-
line companies and car rental services to set their prices
optimally (Talluri & Van Ryzin, 2006).
Previous tourism demand studies primarily use data
from surveys or governmental statistics, which have a
low temporal frequency (Peng et al., 2014; Song et al.,
2010). Due to the significant time lag between registration
and publication of the data, the practical deployability of
models relying on such data is limited. In addition, such
statistics, often gathered offline, are highly aggregated,
which smooths out irregularities in the temporal pattern.
Tourism demand forecasting studies often use autoregres-
sive integrated moving average (ARIMA) and seasonal
ARIMA (SARIMA) models when there are no significant
structural breaks in the time series (Chu, 2008; Goh & Law,
2002).
For yearly and quarterly data a variety of tourism
demand models are used in the literature. Time series
models, such as the basic structural model (BSM)
by Greenidge (2001), are used to extract cyclical and
seasonal patterns (Wooldridge, 2015). Static econometric
models such as the gravity model, used by Wei (2007)
to predict inbound tourism demand to China, are ideal
for analyzing elasticities and explanatory variables, but
do not take short-run dynamics into consideration (Song
et al., 2008). Dynamic econometric models, such as the
time varying parameter (TVP) model, perform well in
1-year-ahead forecasting (Witt et al., 2003).
Journal of Forecasting. 2020;39:551–568. wileyonlinelibrary.com/journal/for © 2019 John Wiley & Sons, Ltd.
This study contributes to the existing literature by
modeling online hotel room price data to predict not
monthly or yearly, but daily hotel occupancy rates at a
city level using Gaussian processes. Hotel room price data
are publicly available on websites such as agoda.com,
expedia.com, trip.com and booking.com, the latter web-
site being the data source from which the hotel room price
data are scraped for this study. We show how these public
data can give governments and tourism businesses a solid
approach to estimating tourism demand. In the remainder
of the study we refer to these data as online data. Instead
of the true hotel occupancy rates, we use a proxy measure
for the hotel occupancy rate at a city level. The proxy is
provided by a Belgian hotel company operating in the online
travel agency field and relies on public hotel room price
data that are validated against internal hotel occupancy
data from 576 hotel accommodations in Brussels. This
approach removes the dependence on internal hotel occupancy
data, which are proprietary and not necessarily available.
In addition, we show in this study how feature preprocessing,
feature extraction, and feature selection methods significantly
boost the predictive performance, a step that has been
overlooked by existing approaches in the demand prediction
literature. Gaussian processes are used to predict the
hotel occupancy rate at a city level and are benchmarked
against linear regression, ARIMA and machine learning
models. Gaussian processes produce prediction intervals,
a measure to quantify uncertainty that policymakers can
use for reliable decision making. Note that, as outlined
in Section 4, these prediction intervals are more flexi-
ble and realistic than similar intervals produced by other
approaches.
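To illustrate how a Gaussian process yields prediction intervals alongside point forecasts, the following is a minimal sketch rather than the implementation used in this study. It assumes a zero-mean Gaussian process on centered data with a squared-exponential kernel and fixed hyperparameters; the function names `rbf_kernel` and `gp_predict`, the hyperparameter values, and the synthetic weekly occupancy series are all hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(x_train, y_train, x_test,
               length_scale=3.0, signal_var=0.01, noise_var=1e-4):
    """Posterior mean and 95% prediction interval at x_test.

    Hyperparameters are fixed, assumed values; in practice they would
    be estimated, for example by maximizing the marginal likelihood.
    """
    y_mean = y_train.mean()  # center targets so a zero-mean prior is reasonable
    K = rbf_kernel(x_train, x_train, length_scale, signal_var)
    K += noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length_scale, signal_var)

    # Solve with a Cholesky factor instead of inverting K directly.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = y_mean + K_s.T @ alpha

    # Predictive variance = prior variance - explained variance + noise.
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v**2, axis=0) + noise_var
    std = np.sqrt(np.maximum(var, 0.0))
    return mean, mean - 1.96 * std, mean + 1.96 * std

# Synthetic (made-up) daily occupancy rates with a weekly cycle.
days = np.arange(30.0)
occupancy = 0.7 + 0.1 * np.sin(2 * np.pi * days / 7)

# Forecast 1, 15, and 45 days after the last observed day (day 29).
future_days = np.array([30.0, 44.0, 74.0])
mean, lower, upper = gp_predict(days, occupancy, future_days)
for d, m, lo, hi in zip(future_days, mean, lower, upper):
    print(f"day {d:.0f}: forecast {m:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

In this sketch the interval is narrowest for the day immediately after the training window and widens toward the prior variance as the forecast horizon moves away from the observed data, which is the kind of horizon-dependent uncertainty a decision maker can act on.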
In Section 2 the literature is discussed. In Section 3 the
data are explored. Section 4 describes the methodological
framework. In Section 5 the results and findings of the
research study are presented. Finally, in Section 6 the contributions
are summarized, pointers for decision makers
are offered, and future research avenues are explored.
2 LITERATURE REVIEW
2.1 Temporal granularity
Tourism demand predictions on a daily basis are not com-
mon in the tourism forecasting literature (Peng et al.,
2014). Predictive models in the past were primarily built
using data from offline sources, such as tourism bureaus
that provide explanatory variables aggregated over a given
time period (Song & Li, 2008). While annual and quarterly
data statistics are quite prevalent, more recent research
also uses monthly data for predictions (Gunter & Önder,
2015).
Tourism demand, however, also fluctuates on a daily
basis, as the number of mid-week bookings differs from the
weekend. These variations in daily tourism demand make
effective policy planning more difficult for tourist businesses
and governments. When tourism demand exceeds
supply, decision makers should react adequately to prevent
considerable strain on tourism infrastructure and services;
or when there is a period of lower tourism demand, miss-
ing the time window to invest can lead to loss in profitabil-
ity (Karamustafa & Ulama, 2010; Pegg et al., 2012). Many
tourism products are perishable, such as hotel rooms, air-
line seats, and car rentals. The moment the deadline passes
and a seat remains unfilled, the product, and in conse-
quence its revenue, is permanently lost (Dharmaratne,
1995). Moreover, the revenue is impacted by differing
demands across weekdays (Malasevska & Haugom, 2018)
and price elasticities that change from week to weekend
(Morlotti et al., 2017).
2.2 Tourism demand forecasting
methods
A wide variety of tourism forecasting models have been studied,
from early time series models to more recent ones
such as machine learning models. Time series methods,
such as the autoregressive moving average (ARMA) pro-
cess (Box et al., 2015), predict how the future trend will
behave and require no more than one historical data series
(Hamilton, 1994). Time series models, such as Brown's double
exponential smoothing (DES) model (Brown, 2004), are well
suited to linear trends; the DES model has been used to
predict tourist arrivals in Hawaii (Geurts &
Ibrahim, 1975). These models rely on historical demand
only and do not directly link to explanatory variables and,
in consequence, cannot analyze tourist behavior (Song &
Li, 2008). A second group of methods, which is more useful
for decision makers, are econometric models (Song et al.,
2008). The error correction model (ECM), for example,
is capable of assessing both long-run equilibrium and
short-run disequilibrium relationships, and has already
been used to analyze Canadian and Tunisian tourism
(Ouerfelli, 2008; Veloce, 2004). At an international level,
the dynamic almost-ideal demand system (AIDS) model
has been shown to make reliable forecasts for European
tourism demand by UK citizens (De Mello & Fortuna,
2005). A third and final category in the literature is the
group of artificial intelligence (AI) methods (Wang & Hsu,
2008). As machine learning methods are rapidly devel-
oping and big data are becoming more available, these
models are more frequently used for tourism prediction
(Claveria et al., 2016). Chen and Wang (2007) have com-
bined a genetic algorithm (GA) with support vector regres-
sion (SVR) to predict tourist arrivals in China. Although
AI models are specialized in achieving a good predictive
performance (Claveria et al., 2016), trained AI models are
complex and not always interpretable in terms of uncer-
tainty (Burger et al., 2001).
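As a concrete instance of the time series category described above, the following minimal sketch implements the standard textbook form of Brown's double exponential smoothing. It is not taken from any of the cited studies; the function name `brown_des_forecast`, the initialization of both smoothed series at the first observation, and the synthetic arrivals series are assumptions made for illustration.

```python
def brown_des_forecast(series, alpha, horizon):
    """Forecast `horizon` steps ahead with Brown's double exponential smoothing.

    The series is smoothed twice with the same factor alpha; the gap
    between the two smoothed values estimates the linear trend.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not series:
        raise ValueError("series must contain at least one observation")

    # Assumed initialization: both smoothed series start at the first value.
    s1 = s2 = series[0]
    for y in series[1:]:
        s1 = alpha * y + (1 - alpha) * s1    # first smoothing pass
        s2 = alpha * s1 + (1 - alpha) * s2   # second pass, applied to s1

    level = 2 * s1 - s2                       # lag-corrected current level
    trend = alpha / (1 - alpha) * (s1 - s2)   # estimated slope per period
    return [level + h * trend for h in range(1, horizon + 1)]

# Synthetic (made-up) arrivals that grow by 5 per period from a base of 100.
arrivals = [100.0 + 5.0 * t for t in range(60)]
forecast = brown_des_forecast(arrivals, alpha=0.5, horizon=3)
print(forecast)
```

On a purely linear series such as this one, the forecasts continue the same slope once the start-up transient has decayed. The sketch also makes the limitation noted in the text visible: the only input is the historical series itself, with no explanatory variables.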
While feature extraction—the conversion of raw data
into a set of useful features—and feature selection—the