Tehran stock exchange prediction using sentiment analysis of online textual opinions

Published date: 01 January 2020
Authors: Mehrnoush Shamsfard, Arezoo Hatefi Ghahfarrokhi
DOI: http://doi.org/10.1002/isaf.1465
RESEARCH ARTICLE
Arezoo Hatefi Ghahfarrokhi | Mehrnoush Shamsfard
Faculty of Computer Engineering and Science,
Shahid Beheshti University, Tehran, Iran
Correspondence
Arezoo Hatefi Ghahfarrokhi, Faculty of
Computer Engineering and Science, Shahid
Beheshti University, Tehran, Iran.
Email: arezuhatefi88@gmail.com
Summary
We investigate, for the first time, the impact of social media data on predicting
Tehran Stock Exchange variables. We consider the closing price and daily return
of three different stocks for this investigation. We collected our social media data
from Sahamyab.com/stocktwits over about 3 months. To extract information from
online comments, we propose a hybrid sentiment analysis approach that combines
lexicon-based and learning-based methods. Since the lexicons available for the
Persian language are not practical for sentiment analysis in the stock market domain,
we built a dedicated sentiment lexicon for this domain. After designing daily
sentiment indices and calculating them from the sentiment of the comments, we
examine their impact on baseline models that use only historical market data and
propose new predictor models using multiple regression analysis. In addition to
the sentiments, we also examine comment volume and users' reliability. We conclude
that the predictability of stocks in the Tehran Stock Exchange differs depending
on their attributes. Moreover, we show that only comment volume could be useful
for predicting the closing price, whereas both the volume and the sentiment of the
comments could be useful for predicting the daily return. We also demonstrate that
users' trust coefficients behave differently for the three stocks.
KEYWORDS
natural language processing, sentiment analysis, social media, stock market prediction
1 | INTRODUCTION
Stock market prediction has long been a goal of researchers and
investors. If they can predict the future behaviour of stock prices,
they can act quickly on this prediction and earn more profit. This
desire has led to many approaches to market analysis.
Many theories have been suggested to explain stock market move-
ments. Some of them focus on the underlying business behind a stock's
price (fundamental analysis; Greig, 1992; Mahmoud & Sakr, 2012),
some focus on historical price movements (technical analysis;
Cervelló-Royo, Guijarro, & Michniuk, 2015; Xiao & Enke, 2017), and others
focus on the human behavioural aspects of the market (behavioural
finance; Keynes, 1936; Shleifer, 2000; Gao, 2008; Bollen, Mao, & Zeng,
2011). One area of behavioural finance revolves around the sentiment
of market participants: in addition to historical prices, the current
stock market is affected by the mood of society and of investors. Since
the rapid growth of the Internet has led investors to share their opinions
about the market on social media, forums, blogs, and so on, stock market
prediction based on online sentiment tracking has recently drawn a lot
of attention (Antweiler & Frank, 2004; Bollen et al., 2011; Nguyen,
Shirai, & Velcin, 2015; O'Hare et al., 2009; Oliveira, Cortez, & Areal,
2017; Wu, Zheng, & Olson, 2014). In this regard, microblogs are one of
the most promising online resources for accessing investors' opinions.
Mao, Counts, and Bollen (2011) found that Twitter has strong predictive
power, even greater than that of survey sentiment and news media analysis.
Received: 27 September 2018 Revised: 27 January 2020 Accepted: 3 February 2020
© 2020 John Wiley & Sons, Ltd. Intell. Sys. Acc. Fin. Mgmt. 2020;27:22–37. wileyonlinelibrary.com/journal/isaf
To the best of our knowledge, although there are several studies
on predicting movement trends on the Tehran Stock Exchange (TSE)
(Ahangar, Yahyazadehfar, & Pournaghshband, 2010; Ebrahimpour,
Nikoo, Masoudnia, Yousefi, & Ghaemi, 2011; Fasanghari & Montazer,
2010; Zahedi & Rounaghi, 2015), none of them has considered
sentiment. These studies are classified as technical analyses: they
use only historical prices and volume, together with data mining
techniques such as neural networks and genetic algorithms. This study
appears to be the first to investigate the effect of incorporating
sentiment into TSE prediction models. We investigated three symbols,
Vebmellat, Shabandar, and Khodro, which belong to three different
industries. We gathered users' comments about these stocks from
Sahamyab.com/stocktwits over about 3 months. After extracting the
sentiment of these comments with our proposed sentiment analysis
method and constructing sentiment indices, we examined the impact of
these indices on the baseline models. Since the reliability of users
affects the importance of their sentiments, we calculated a trust
coefficient for each user based on his/her historical comments and
incorporated these coefficients into several indices.
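The exact index definitions are not given in this part of the paper. As a rough illustration only, the sketch below assumes one possible design: each comment carries a polarity in {-1, 0, +1}, each user has a precomputed trust coefficient, and the daily index is the trust-weighted mean polarity, reported alongside the daily comment volume. The function name `daily_sentiment_index`, the default weight of 1.0 for users without a coefficient, and the weighting scheme itself are hypothetical, not the authors' definitions.

```python
from collections import defaultdict
from datetime import date


def daily_sentiment_index(comments, trust, default_trust=1.0):
    """Return {day: (trust-weighted mean polarity, comment volume)}.

    Hypothetical design: `comments` is an iterable of (day, user, polarity)
    tuples with polarity in {-1, 0, +1}; `trust` maps user -> coefficient.
    Users absent from `trust` get `default_trust` (an assumption).
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    volume = defaultdict(int)
    for day, user, polarity in comments:
        w = trust.get(user, default_trust)
        weighted_sum[day] += w * polarity
        weight_total[day] += w
        volume[day] += 1
    index = {}
    for day in sorted(volume):
        # Guard against days on which every commenter has zero trust.
        if weight_total[day] > 0:
            mean = weighted_sum[day] / weight_total[day]
        else:
            mean = 0.0
        index[day] = (mean, volume[day])
    return index


if __name__ == "__main__":
    comments = [
        (date(2018, 1, 1), "u1", +1),
        (date(2018, 1, 1), "u2", -1),
        (date(2018, 1, 2), "u1", -1),
    ]
    trust = {"u1": 0.8, "u2": 0.2}
    for day, (score, n) in daily_sentiment_index(comments, trust).items():
        print(day, round(score, 2), n)
```

Reporting volume next to the weighted mean keeps both signals that the Summary says were examined (comment volume and sentiment), while the trust weights let reliable users dominate a day's score.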
Sentiment classification techniques can be divided into machine-
learning methods, lexicon-based methods, and hybrid methods.
Machine-learning methods apply well-known algorithms, such as
support vector machines (SVM) and naive Bayes, using syntactic and
linguistic features. These methods require labelled training data,
which are often difficult to obtain. Lexicon-based methods rely on
generic or domain-dependent lexicons or keywords. Hybrid approaches
combine both methods, and the lexicon usually plays a key role.
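To make the lexicon-based family concrete, the sketch below scores a comment by summing the polarity scores of the lexicon entries it contains and maps the sum to positive, negative, or neutral. The toy English lexicon `TOY_LEXICON`, its scores, the regular-expression tokenizer, and the zero threshold are illustrative assumptions; a system for Persian stock comments would need a Persian, domain-specific lexicon and would also have to deal with negation and other phenomena ignored here.

```python
import re

# Hypothetical toy lexicon: term -> polarity score. A real stock-domain
# lexicon would be built from domain comments (in this setting, Persian ones).
TOY_LEXICON = {
    "buy": 1.0, "rise": 1.0, "profit": 1.0,
    "sell": -1.0, "fall": -1.0, "loss": -1.0,
}


def lexicon_polarity(text, lexicon=TOY_LEXICON, threshold=0.0):
    """Classify text as +1 (positive), -1 (negative) or 0 (neutral)."""
    # \w is Unicode-aware in Python 3, so Persian letters match too.
    tokens = re.findall(r"\w+", text.lower())
    # Unknown tokens contribute 0; the comment score is the sum of term scores.
    score = sum(lexicon.get(tok, 0.0) for tok in tokens)
    if score > threshold:
        return 1
    if score < -threshold:
        return -1
    return 0


if __name__ == "__main__":
    for comment in ["Time to buy, expecting a rise", "Sell before the fall", "No idea"]:
        print(comment, "->", lexicon_polarity(comment))
```

No training data is needed, which is the main attraction of this family; the price is that the result depends entirely on how well the lexicon fits the domain.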
Several studies have been conducted on sentiment analysis in the
Persian language (Alimardani & Aghaei, 2015; Basiri, Naghsh-Nilchi, &
Ghassem-Aghaee, 2014; Saraee & Bagheri, 2013; Shams, Shakery, &
Faili, 2012). Some of them have produced a lexicon, either for the
general domain or for domains other than the stock market. Through a
comparison, we will show that generic lexicons are not appropriate
for sentiment analysis in the stock market domain; Oliveira, Cortez,
and Areal (2016) obtained a similar result in their study. In this
paper, we propose a hybrid method for sentiment analysis in the stock
market domain. First, we build a sentiment lexicon from comments in
this domain, and then we use the lexical items of the lexicon as
classification features for machine-learning classification algorithms.
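One way to read this hybrid design is that the lexicon fixes the feature space and a learner estimates the weights. The sketch below assumes that reading: it keeps only tokens that appear in the lexicon, counts them, and trains a multinomial naive Bayes classifier with Laplace smoothing, written from scratch so that it has no dependencies. Naive Bayes is used only because it is one of the algorithms named earlier; the class `LexiconNaiveBayes`, the toy labelled comments, and the count-based features are hypothetical and need not match the classifiers or feature weighting used in the paper.

```python
import math
import re
from collections import Counter, defaultdict


def lexicon_features(text, lexicon_terms):
    """Bag of lexicon terms: count only tokens that appear in the lexicon."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tok for tok in tokens if tok in lexicon_terms)


class LexiconNaiveBayes:
    """Multinomial naive Bayes over lexicon-term counts (Laplace smoothing)."""

    def __init__(self, lexicon_terms, alpha=1.0):
        self.vocab = set(lexicon_terms)
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.term_counts[label].update(lexicon_features(text, self.vocab))
        self.total_terms = {c: sum(self.term_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        feats = lexicon_features(text, self.vocab)
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # log P(c) + sum over features of count * log P(term | c)
            score = math.log(n_c / n_docs)
            denom = self.total_terms[c] + self.alpha * len(self.vocab)
            for term, count in feats.items():
                score += count * math.log((self.term_counts[c][term] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


if __name__ == "__main__":
    terms = {"buy", "rise", "profit", "sell", "fall", "loss"}
    texts = ["buy now, strong profit ahead", "expect a rise, buy",
             "sell, big loss coming", "prices will fall, sell"]
    labels = ["pos", "pos", "neg", "neg"]
    clf = LexiconNaiveBayes(terms).fit(texts, labels)
    print(clf.predict("I will buy before the rise"))
```

Compared with the purely lexicon-based scorer, the learner can discover that a lexicon term is weaker or stronger evidence in this domain than its prior score suggests, at the cost of needing labelled comments.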
The rest of the paper is organized as follows. Section 2 reviews
the relevant literature, concentrating on previous approaches to
sentiment analysis for stock market prediction and on sentiment
analysis in the Persian language. Section 3 describes our data set and
our proposed method. Section 4 evaluates the results of the experi-
ments. Section 5 discusses our results and summarizes our
contributions. Finally, future work is presented in Section 6.
2 | RELATED WORK
The use of sentiment in financial markets was popularized in the
1930s by Keynes's beauty contest analogy, which argued that investors
select the most "beautiful" (i.e., the most favoured) stock to invest
in because they care more about what other investors think of that
stock than about its real value (Keynes, 1936). Different investors
use the concept of sentiment differently. For example, when a
contrarian investor recognizes that market sentiment is very
negative, they may buy more stocks than usual because they believe a
large market movement is coming (Brown & Cliff, 2004).
At first, various surveys, such as the regular reports of the
National Association of Active Investment Managers and the American
Association of Individual Investors (AAII), were used to evaluate
investors' and market sentiment. Many investors used these surveys to
understand the overall sentiment of the market, the economy, and
industries in order to adjust their portfolios, either to take
advantage of, or to protect themselves from, changes in market
sentiment (Mian & Sankaraguruswamy, 2012). Despite their popularity,
these surveys are resource-intensive and expensive. In addition, they
may suffer from unreliable respondents, individual biases, social
bias, and groupthink (Da, Engelberg, & Gao, 2010; Singer, 2002).
In recent years, researchers have used a variety of methods to
compute sentiment indicators from bulk online data. This approach has
two advantages over surveys. First, computational analysis of
sentiment and public mood is faster, more precise, and less costly
than conducting large-scale surveys. Second, there is strong support
for the claim that sentiment obtained in this way is a valid indicator
of public opinion, since it has been used to predict many
socio-economic phenomena, such as presidential elections (Burnap,
Gibson, Sloan, Southern, & Williams, 2016; Tumasjan, Sprenger,
Sandner, & Welp, 2010; White, 2016) and commercial sales (Choi &
Varian, 2012; Liu, Ding, Chen, Chen, & Guo, 2016; Mishne & Glance,
2006).
As far as we know, three distinct groups of online data sources
have been used for financial forecasting. First, news media content
has been shown to be an effective factor in investors' sentiment and
preferences. Tetlock (2007), for example, found that a high level of
pessimism in the Wall Street Journal was followed by a decline in
market returns on the following day.
Second, web search query data have been shown to be related to,
and even predictive of, fluctuations in the stock market. The search
volume for stock names reveals investors' interests: a high volume of
searches for a stock's name is followed by a price increase in the
short term and a price reversal over the long term (Da et al., 2010).
Search volume is also strongly correlated with the volume of traded
shares, such that a peak in search volume anticipates the peak in
trading volume by 1 day or more (Bordino et al., 2012).
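A lead-lag claim like this can be checked with a lagged correlation: correlate search volume on day t with trading volume on day t + k, and see which lag k gives the strongest correlation. The sketch below is a generic illustration, not the methodology of Bordino et al. (2012); the helper names `pearson`, `lagged_correlation`, and `best_lag`, and the synthetic series in which trading volume simply repeats search volume one day later, are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / (sx * sy)


def lagged_correlation(search, volume, lag):
    """Correlate search[t] with volume[t + lag] over the overlapping days."""
    if not 0 <= lag < len(search) - 1:
        raise ValueError("lag must leave at least two overlapping days")
    return pearson(search[: len(search) - lag], volume[lag:])


def best_lag(search, volume, max_lag):
    """Lag in [0, max_lag] with the highest search/trading-volume correlation."""
    return max(range(max_lag + 1), key=lambda k: lagged_correlation(search, volume, k))


if __name__ == "__main__":
    # Synthetic data: trading volume repeats search volume one day later.
    search = [1, 5, 2, 8, 3, 9, 4]
    volume = [0] + search[:-1]
    for k in range(3):
        print(k, round(lagged_correlation(search, volume, k), 3))
    print("best lag:", best_lag(search, volume, 2))
```

On real data the correlation at the best lag would be far from perfect, and a formal test (for example, Granger causality) would be needed before calling search volume a predictor rather than a correlate.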
Finally, social media content has become an important data
source for measuring the sentiment of society and investors. In early
research, online stock message boards were used to predict market
volatility and trading volume (Antweiler & Frank, 2004). In recent
years, sentiment indicators extracted from social networks, such as
Facebook (Karabulut, 2013), LiveJournal (Gilbert & Karahalios, 2010),