In search of meaning: Lessons, resources and next steps for computational analysis of financial discourse

Date: 1 March 2019
DOI: 10.1111/jbfa.12378
Mahmoud El-Haj¹, Paul Rayson¹, Martin Walker², Steven Young³, Vasiliki Simaki⁴

¹School of Computing and Communications, Lancaster University, UK
²Alliance Manchester Business School, Manchester University, UK
³Lancaster University Management School, Lancaster University, UK
⁴Department of Linguistics and English Language, and Centre for Corpus Approaches in Social Science (CASS), Lancaster University, UK
Correspondence
Steven Young, Lancaster University Management School, Lancaster University, UK.
Email: s.young@lancaster.ac.uk
Funding information
Financial support was provided by the Economic and Social Research Council (ESRC) (contracts ES/J012394/1, ES/K002155/1, ES/R003904/1 and ES/S001778/1) and the Research Board of the Institute of Chartered Accountants in England and Wales.
Abstract
We critically assess mainstream accounting and finance research
applying methods from computational linguistics (CL) to study finan-
cial discourse. We also review common themes and innovations in
the literature and assess the incremental contributions of studies
applying CL methods over manual content analysis. Key conclusions
emerging from our analysis are: (a) accounting and finance research
is behind the curve in terms of CL methods generally and word
sense disambiguation in particular; (b) implementation issues mean
the proposed benefits of CL are often less pronounced than propo-
nents suggest; (c) structural issues limit practical relevance; and (d)
CL methods and high quality manual analysis represent complemen-
tary approaches to analyzing financial discourse. We describe four
CL tools that have yet to gain traction in mainstream AF research but which we believe offer promising ways to enhance the study of meaning in financial discourse. The four tools are named entity recognition (NER), summarization, semantics and corpus linguistics.
KEYWORDS
10-K, annual reports, computational linguistics, conference calls,
corpus linguistics, earnings announcements, machine learning, NLP,
semantics
1 INTRODUCTION
Information is the lifeblood of financial markets, and the amount of data available to decision-makers is increasing exponentially. The Bank of England (2015) estimates that 90% of global information has been created during the last decade,
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2019 The Authors. Journal of Business Finance & Accounting published by John Wiley & Sons Ltd
J Bus Fin Acc. 2019;46:265–306. wileyonlinelibrary.com/journal/jbfa
the vast majority of which is unstructured data (e.g., free-form text).¹ The dramatic growth in written and spoken data is clearly evident in financial markets. For example, Dyer, Lang, and Stice-Lawrence (2017) find a 113% increase in the median length of US registrants' 10-K annual reports over the period 1996–2013, and Lewis and Young (2019) report similar results for UK annual reports. For many applications, the volume of unstructured financial data exceeds the capacity of humans to process the content manually. Users of financial market data are therefore turning increasingly to computational linguistics to assist with the task of processing large volumes of unstructured data.² Academic research in accounting and finance is mirroring this trend.
Our paper has three objectives. First, we critically assess mainstream accounting and finance research that applies methods from computational linguistics (CL) to study written and spoken language (discourse) in financial markets. Our critique views extant research through the following three lenses: consistency with the core principles in CL; performance against the advantages of automated textual analysis proposed by Li (2010a); and practical relevance. Second, we review common themes and innovations in the literature and assess the incremental contributions of studies applying CL methods over manual content analysis. Third, we describe a suite of CL tools that are yet to gain traction in mainstream accounting and finance research but which we believe offer promising ways to enhance the study of meaning in financial discourse.
A number of studies review aspects of CL research in accounting and finance (AF). Li (2010a) evaluates the benefits of CL methods over manual content analysis and reviews the first wave of studies using automated methods to examine accounting disclosures. Loughran and McDonald (2016) extend Li's (2010a) work by combining an updated review of the literature with a more focused survey and description of methods that characterize extant studies in AF. In particular, they critique studies on readability, highlight the importance of transparency when describing the process of converting raw text to quantitative measures, and reiterate Li's (2010a) call for economic theory to drive the choice of CL methods rather than vice versa. Kearney and Liu (2014) narrow the focus further by reviewing studies on sentiment analysis published in finance before 2013. Finally, Fisher, Garnsey, and Hughes (2016) synthesize the stream of natural language processing (NLP) research utilizing AF data and identify paths for future research. Their review suggests a disconnect between mainstream AF research employing CL methods and the broader computer science literature using accounting and finance datasets.
Prior surveys start from the premise that the motivation for advocating CL analysis of financial text is compelling. This view is not universally accepted, however, although the basis for scepticism is not clearly articulated. We motivate our review by critiquing four reasons underpinning such suspicion: (a) doubt over the value of studying narrative disclosures; (b) distrust of CL approaches to scoring text; (c) cynicism about the validity of applying CL methods to financial market disclosures; and (d) concern over the way methods are applied and the relevance of the research questions examined. We conclude that the final explanation represents the most credible argument against using CL tools to analyze financial text. We proceed to evaluate research in light of this concern using three lenses.
Our first evaluation lens compares the application of methods in AF research to four core principles and practices that underpin the CL approach (corpus building, annotation, NLP and evaluation). Our approach differs from previous surveys that define the textual analysis landscape according to the state-of-the-art in AF. We conclude that while beacons of best practice are evident, mainstream AF research appears to be behind the curve in terms of CL sophistication generally, and word sense disambiguation in particular, when judged against computer science and even specialist subfields within AF. Our second evaluation lens is the advantages of CL analysis (over manual coding) proposed by Li (2010a). Predicted benefits include lower scoring costs, wider generalizability, greater objectivity, improved replicability, enhanced statistical power, and scope for identifying 'hidden' linguistic features. We conclude that these benefits
¹ In 1998, Merrill Lynch projected that available data would expand to 40 zettabytes (one zettabyte equals one trillion gigabytes) by 2020 and estimated somewhere between 80–90% of all potentially useable business information may originate in unstructured form. Reinsel, Gantz, and Rydning (2018) forecast that the global datasphere will expand to 175 zettabytes by 2025. Although the estimate includes video and image data as well as structured data in databases, the majority comprises plain text.
² Throughout this paper we use "computational linguistics" as shorthand for the areas of natural language processing (NLP), text-focused artificial intelligence (AI) and information retrieval from computer science, plus the smaller group of empirical methods developed in the field of corpus linguistics involving frequency-based approaches to studying language.
are often less pronounced than AF research portrays. Our third evaluation lens assesses the relevance of extant work to debates in policy and practice regarding the role and value of financial discourse. We argue that relevance is constrained by at least two factors. First, the majority of CL research in AF operates at an aggregate level such as the entire 10-K or the complete Management Discussion and Analysis (MD&A), whereas practitioners, standard setters and regulators are often interested in more granular issues such as the format and content of specific disclosures, placement of content within the overall reporting package, limits on the use of jargon concerning particular topics, etc. Second, it is not immediately obvious how commonly employed empirical proxies for discourse quality such as readability (Fog index), tone (word-frequency counts) and text re-use (cosine similarity) map into the practical properties of effective communication identified by financial market regulators.
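To make the proxies concrete, the sketch below shows minimal implementations of two of them: the Gunning Fog readability index and unigram cosine similarity for text re-use. This is an illustrative simplification, not the exact procedure of any study discussed here; in particular, the syllable counter is a crude vowel-group heuristic, whereas published work typically uses dictionary-based syllabification.

```python
import re
from collections import Counter
from math import sqrt

def fog_index(text):
    """Gunning Fog index: 0.4 * (average sentence length + % complex words).
    Syllables are approximated by counting vowel groups (a rough heuristic)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    def syllables(w):
        return max(1, len(re.findall(r"[aeiouy]+", w.lower())))
    complex_words = [w for w in words if syllables(w) >= 3]
    avg_sentence_len = len(words) / max(1, len(sentences))
    pct_complex = 100 * len(complex_words) / max(1, len(words))
    return 0.4 * (avg_sentence_len + pct_complex)

def cosine_similarity(a, b):
    """Cosine similarity between two documents represented as unigram
    count vectors -- the text re-use proxy mentioned above."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

Note that both measures operate purely on surface word forms, which is precisely why their mapping into regulators' notions of effective communication is unclear: a disclosure can score as "readable" or "unchanged" without being informative.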
With these caveats in mind, we proceed to review common themes and innovations in the literature and assess the
incremental contributions of work applying CL methods over manual content analysis. The median AF study examines
10-K filings using basic content analysis methods such as readability algorithms and keyword counts. The degree of
clustering is consistent with the initial phase of the research lifecycle, with agendas shaped as much by ease of data
access and implementation as by research priorities. Nevertheless, closer inspection reveals how relatively basic
word-level methods have been used to provide richer insights into the properties and effects of financial discourse.
Refinements to standard readability metrics, development of domain-specific wordlists, and the use of weighting
schemes and text filtering to improve word-sense disambiguation represent welcome advances on naïve unigram
word counts. We also acknowledge a move towards the use of more NLP technology in the form of machine learning
and topic modeling, although the trend is characterized by a narrow methodological focus that lags best practice. We
conclude that the main weakness of AF research is its continuing reliance on bag-of-words methods that fail to reflect
context and meaning.
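The bag-of-words weakness is easy to demonstrate. The sketch below computes a standard net-tone measure from keyword counts; the two wordlists are tiny illustrative stubs standing in for the thousands-of-entries domain-specific lists (such as Loughran and McDonald's) that actual studies use.

```python
import re
from collections import Counter

# Illustrative stub lists only; real research uses large
# domain-specific wordlists, not these few entries.
NEGATIVE = {"loss", "impairment", "decline", "adverse", "litigation"}
POSITIVE = {"growth", "improvement", "strong", "achieved", "profitable"}

def tone(text):
    """Net tone = (positive count - negative count) / total words.
    A pure bag-of-words measure: word order and context are ignored,
    so 'decline' in 'declined to comment' or a negated phrase like
    'no decline in sales' is still scored as negative."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return (pos - neg) / max(1, len(words))
```

The comment in `tone` illustrates the disambiguation problem the review highlights: without sense- or context-aware processing, the same surface form is always scored the same way.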
Analysis of financial discourse using manual content methods has a long tradition in AF (Merkl-Davies & Brennan,
2007). Establishing incremental contribution for studies adopting a CL lens is not a straightforward task. A significant
fraction of CL-focused studies appears to re-examine broadly similar issues to those previously explored using manual methods and arrives at broadly similar conclusions. We argue that emerging CL research should take greater care to ensure that the evidence from extant manual content analysis studies is afforded appropriate recognition. We also note that some important discourse-related research questions and settings in AF do not lend themselves naturally to the CL treatment. A key conclusion to emerge from our review is that CL methods and high-quality manual analysis represent symbiotic approaches to analyzing financial discourse. Both approaches are associated with comparative advantages and comparative weaknesses. The challenge for researchers choosing the CL route is to ensure that research questions align closely with the fundamental comparative advantages of scalability and latent feature detection.
In addition to taking a more critical and dispassionate approach to evaluating the contribution of automated textual analysis research in AF, we extend Li (2010a) and Loughran and McDonald (2016) by adopting a forward-looking perspective on CL methods and their applicability. Specifically, we review the following four tools from the CL field that offer significant potential for AF researchers: named entity recognition (NER), summarization, semantic analysis and corpus methods. As well as helping to extend research horizons in financial discourse, the discussion speaks to the question posed by Loughran and McDonald (2016, p. 1223) regarding the potential benefits of parsing more deeply for contextual meaning in a business context. Their concern is that using more complex methods beyond simple word counts that ignore the sequence in which words are presented (i.e., meaning) may add more noise than signal to the empirical construct. We discuss tools from CL specifically designed to improve the signal-to-noise ratio by disambiguating word meaning.
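As a flavour of the first of these tools, the toy tagger below shows the kind of output a named entity recognition system produces: text spans labelled with entity types such as PERSON or ORG. Production NER uses trained statistical models rather than a lookup table; the gazetteer here is a hypothetical stand-in built purely for illustration.

```python
import re

# Hypothetical gazetteer for illustration; real NER systems learn to
# recognise unseen entities from context rather than matching a list.
GAZETTEER = {
    "Lancaster University": "ORG",
    "Bank of England": "ORG",
    "Steven Young": "PERSON",
}

def tag_entities(text):
    """Return (matched span, entity label) pairs, scanning longer
    gazetteer entries first so that sub-strings are not mislabelled."""
    spans = []
    for name, label in sorted(GAZETTEER.items(), key=lambda kv: -len(kv[0])):
        for match in re.finditer(re.escape(name), text):
            spans.append((match.group(), label))
    return spans
```

Even this crude version hints at why NER aids disambiguation: once "Young" is recognised as part of the PERSON span "Steven Young", it should not be counted as the adjective "young" by a tone dictionary.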
2 AUTOMATED ANALYSIS OF NARRATIVE DISCLOSURES IN ACCOUNTING AND FINANCE RESEARCH
While qualitative disclosures have long attracted the interest of AF researchers, the need to hand collect and manually score content constrained work in this area. Early work by Abrahamson and Amir (1996), Antweiler and Frank (2004),