Tehran stock exchange prediction using sentiment analysis of online textual opinions

Published date	01 January 2020
Author	Mehrnoush Shamsfard,Arezoo Hatefi Ghahfarrokhi
DOI	http://doi.org/10.1002/isaf.1465

RESEARCH ARTICLE

Tehran stock exchange prediction using sentiment analysis of online textual opinions

Arezoo Hatefi Ghahfarrokhi | Mehrnoush Shamsfard

Faculty of Computer Engineering and Science, Shahid Beheshti University, Tehran, Iran

Correspondence

Arezoo Hatefi Ghahfarrokhi, Faculty of Computer Engineering and Science, Shahid Beheshti University, Tehran, Iran.

Email: arezuhatefi88@gmail.com

Summary

We investigate, for the first time, the impact of social media data on predicting Tehran Stock Exchange variables. We consider the closing price and daily return of three different stocks. We collected our social media data from Sahamyab.com/stocktwits over about 3 months. To extract information from online comments, we propose a hybrid sentiment analysis approach that combines lexicon-based and learning-based methods. Since the lexicons available for the Persian language are not practical for sentiment analysis in the stock market domain, we built a dedicated sentiment lexicon for this domain. After designing and calculating daily sentiment indices from the sentiment of the comments, we examine their impact on baseline models that use only historical market data and propose new predictor models using multi-regression analysis. In addition to the sentiments, we also examine the comment volume and the users' reliability. We conclude that the predictability of stocks in the Tehran Stock Exchange differs depending on their attributes. Moreover, we show that only comment volume could be useful for predicting the closing price, whereas both the volume and the sentiment of the comments could be useful for predicting the daily return. We demonstrate that users' trust coefficients behave differently across the three stocks.

KEYWORDS

natural language processing, sentiment analysis, social media, stock market prediction

1 | INTRODUCTION

Stock market prediction has long been a goal of researchers and investors. If they can predict the future behaviour of stock prices, they can act quickly on this prediction and gain more profit. This desire has led them to many approaches for market analysis. Many theories have been suggested to explain stock market movements. Some of them focus on the underlying business behind a stock's price (fundamental analysis; Greig, 1992; Mahmoud & Sakr, 2012), some focus on historical price movements (technical analysis; Cervelló-Royo, Guijarro, & Michniuk, 2015; Xiao & Enke, 2017), and some others focus on the human behavioural aspects of the market (behavioural finance; Keynes, 1936; Shleifer, 2000; Gao, 2008; Bollen, Mao, & Zeng, 2011). One of the areas of behavioural finance revolves around the idea of the sentiment of the market participants. It means that, in addition to historical prices, the current stock market is affected by the mood of society and investors. Since the rapid growth of the Internet has led investors to share their opinions about the market in social media, forums, blogs, and so on, stock market prediction based on online sentiment tracking has drawn a lot of attention recently (Antweiler & Frank, 2004; Bollen et al., 2011; Nguyen, Shirai, & Velcin, 2015; O'Hare et al., 2009; Oliveira, Cortez, & Areal, 2017; Wu, Zheng, & Olson, 2014). In this regard, microblogs are one of the most promising online resources for accessing investors' opinions. Mao, Counts, and Bollen (2011) found that Twitter has strong predictive power, even more than that of survey sentiment and news media analysis.

Received: 27 September 2018 Revised: 27 January 2020 Accepted: 3 February 2020

To the best of our knowledge, although there are several studies related to the prediction of the Tehran Stock Exchange (TSE) movement trends (Ahangar, Yahyazadehfar, & Pournaghshband, 2010; Ebrahimpour, Nikoo, Masoudnia, Yousefi, & Ghaemi, 2011; Fasanghari & Montazer, 2010; Zahedi & Rounaghi, 2015), none of them has considered sentiment. These studies are classified as technical analyses, and they have only used historical prices and volume. In addition, these studies have used data mining techniques such as neural networks and genetic algorithms. This study appears to be the first to investigate the effect of incorporating sentiment into TSE prediction models. We investigated three symbols, Vebmellat, Shabandar, and Khodro, that belong to three different industries. We gathered users' comments about these stocks from Sahamyab.com/stocktwits over about 3 months. After extracting the sentiment of these comments with our proposed sentiment analysis method and constructing sentiment indices, we examined the impact of these indices on the baseline models. Since the reliability of the users affects the importance of their sentiments, we calculated a trust coefficient for each user based on his/her historical comments and incorporated these coefficients in several indices.
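As a rough illustration of how per-comment sentiment and per-user trust could be combined into a daily index, the sketch below computes a trust-weighted mean of comment labels for each day. The aggregation formula, the function name `daily_sentiment_index`, the label encoding (+1, 0, -1), and the default weight of 1.0 for unknown users are all assumptions made for this example; the text above does not specify how the indices are defined.

```python
from collections import defaultdict


def daily_sentiment_index(comments, trust=None):
    """Aggregate comment sentiment into one index value per day.

    comments: iterable of (day, user, label), label in {+1, 0, -1}
              (assumed encoding: positive, neutral, negative).
    trust:    optional dict mapping user -> weight in [0, 1];
              users without an entry get weight 1.0 (assumption).
    Returns a dict mapping day -> weighted mean sentiment in [-1, 1].
    """
    trust = trust or {}
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for day, user, label in comments:
        w = trust.get(user, 1.0)
        weighted_sum[day] += w * label
        weight_total[day] += w
    # Days whose total weight is zero carry no usable signal and are skipped.
    return {
        day: weighted_sum[day] / weight_total[day]
        for day in weighted_sum
        if weight_total[day] > 0
    }


# Hypothetical comments for one symbol over two days.
comments = [
    ("day1", "user_a", 1),
    ("day1", "user_b", -1),
    ("day1", "user_c", 1),
    ("day2", "user_a", -1),
]
plain_index = daily_sentiment_index(comments)
trusted_index = daily_sentiment_index(
    comments, trust={"user_a": 0.5, "user_b": 1.0, "user_c": 0.5}
)
print(plain_index, trusted_index)
```

With equal weights, the day1 index is mildly positive; once the single negative comment comes from the most trusted user, the same day nets out to neutral, which is the kind of effect a trust coefficient is meant to capture.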

Sentiment classification techniques can be divided into machine-learning methods, lexicon-based methods, and hybrid methods. Machine-learning methods apply well-known machine-learning algorithms, such as support vector machine (SVM) and naive Bayes, and use syntactic and linguistic features. These methods require labelled training data, which are often difficult to obtain. The lexicon-based methods rely on generic or domain-dependent lexicons or keywords. The hybrid approaches combine both methods, and the lexicon mostly plays a key role.

Several studies have been conducted on sentiment analysis in the Persian language (Alimardani & Aghaei, 2015; Basiri, Naghsh-Nilchi, & Ghassem-Aghaee, 2014; Saraee & Bagheri, 2013; Shams, Shakery, & Faili, 2012). Some of them have produced a lexicon that is either for the general domain or for domains other than the stock market. In a comparison, we will show that generic lexicons are not appropriate for sentiment analysis in the stock market domain. Oliveira, Cortez, and Areal (2016) obtained a similar result in their study. In this paper, we propose a hybrid method for sentiment analysis in the stock market domain. First, we build a sentiment lexicon from the comments of this domain, and then we use the lexical items of the lexicon as the classification features of the machine-learning classification algorithms.
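The hybrid idea described above, in which lexicon entries serve as the feature set of a learned classifier, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the lexicon is a hypothetical list of English placeholder terms (the actual lexicon is Persian and domain specific), the classifier is a small multinomial naive Bayes written from scratch, and the training comments are invented.

```python
import math
from collections import Counter

# Hypothetical domain lexicon; English placeholders stand in for Persian terms.
LEXICON = ["buy", "rise", "profit", "sell", "fall", "loss"]


def featurize(text, lexicon=LEXICON):
    """Count how often each lexicon term occurs; other words are ignored."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in lexicon]


def train_nb(samples, lexicon=LEXICON):
    """Fit multinomial naive Bayes with Laplace smoothing.

    samples: list of (text, label) pairs.
    Returns (log_prior, log_likelihood) keyed by label.
    """
    labels = sorted({label for _, label in samples})
    doc_counts = Counter(label for _, label in samples)
    term_counts = {label: [0] * len(lexicon) for label in labels}
    for text, label in samples:
        for i, count in enumerate(featurize(text, lexicon)):
            term_counts[label][i] += count
    log_prior = {y: math.log(doc_counts[y] / len(samples)) for y in labels}
    log_like = {}
    for y in labels:
        total = sum(term_counts[y]) + len(lexicon)  # add-one smoothing
        log_like[y] = [math.log((c + 1) / total) for c in term_counts[y]]
    return log_prior, log_like


def predict(text, model, lexicon=LEXICON):
    """Return the label with the highest posterior log score."""
    log_prior, log_like = model
    x = featurize(text, lexicon)
    scores = {
        y: log_prior[y] + sum(c * l for c, l in zip(x, log_like[y]))
        for y in log_prior
    }
    return max(scores, key=scores.get)


# Invented labelled comments, for illustration only.
train = [
    ("buy now price will rise", "pos"),
    ("good profit expected buy", "pos"),
    ("sell before the fall", "neg"),
    ("heavy loss sell today", "neg"),
]
model = train_nb(train)
print(predict("i will buy for profit", model))
print(predict("sell now big loss", model))
```

Restricting the features to lexicon terms keeps the feature space small and domain focused, which is one plausible reason to prefer this design when labelled data are scarce.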

The rest of the paper is organized as follows. Section 2 reviews the relevant literature, concentrating on previous approaches to sentiment analysis for stock market prediction and on sentiment analysis in the Persian language. Section 3 describes our data set and our proposed method. Section 4 evaluates the results of the experiments. Section 5 discusses our results and summarizes our contributions. Finally, future work is presented in Section 6.

2 | RELATED WORK

Using sentiment in financial markets was popularized in the early twentieth century with the introduction of the Keynes beauty contest analogy, which argued that investors select the most beautiful (i.e. the most favoured) stock to invest in because they care more about what other investors think of that stock than about its real value (Keynes, 1936). Different investors use the concept of sentiment differently. As an example, when a contrarian investor recognizes that sentiment about the market is very negative, they may buy more stocks than usual because they believe great movements are coming to the market (Brown & Cliff, 2004).

At first, various surveys, such as the regular reports of the National Association of Active Investment Managers and the American Association of Individual Investors (AAII), were used to evaluate investors' and market sentiment. These surveys were used by many investors to understand the overall sentiment of the market, economy, and industries in order to make the necessary adjustments to their portfolios to take advantage of, or to protect themselves from, changes in market sentiment (Mian & Sankaraguruswamy, 2012). Despite their popularity, these surveys need a lot of resources and are expensive. In addition, they may face the problems of unreliable respondents, individual biases, social bias, and group thinking (Da, Engelberg, & Gao, 2010; Singer, 2002).

In recent years, researchers have used a variety of methods to compute sentiment indicators using bulk online data. This approach is more appropriate than using surveys. First, computational analysis of sentiment and public mood is faster, more precise, and less costly than conducting large-scale surveys. Second, there is strong support for the claim that the sentiment obtained from this approach is a valid indicator of public opinion, given that it has been used to predict many socio-economic phenomena, such as presidential elections (Burnap, Gibson, Sloan, Southern, & Williams, 2016; Tumasjan, Sprenger, Sandner, & Welp, 2010; White, 2016) and commercial sales (Choi & Varian, 2012; Liu, Ding, Chen, Chen, & Guo, 2016; Mishne & Glance, 2006).

As far as we know, three distinct groups of online data sources have been used for financial forecasting. First, it has been shown that the content of the news media is an effective factor in investors' sentiment and desire. Tetlock (2007), for example, found that a high level of pessimism in the Wall Street Journal led to a decline in market returns on the following day.

Second, it has been shown that web search query data are related to, and even predictive of, fluctuations in the stock market. The search volume of stock names reveals the interests of investors, and therefore a high volume of searches for the name of a share signals a price increase in the short term and a price reversal over the long term (Da et al., 2010). Also, the search volume has a strong correlation with the volume of traded shares, so that the peak of the search volume precedes the peak of the trading volume by 1 day or more (Bordino et al., 2012).

Finally, social media content has become an important data source for measuring the sentiment of society and investors. In an early study, online stock message boards were used to predict market volatility and trading volume (Antweiler & Frank, 2004). In recent years, sentiment indicators extracted from social networks, such as Facebook (Karabulut, 2013), LiveJournal (Gilbert & Karahalios, 2010),
