Textual Analysis in Accounting and Finance: A Survey

Published date: 01 September 2016
DOI: http://doi.org/10.1111/1475-679X.12123
Authors: TIM LOUGHRAN, BILL MCDONALD
Journal of Accounting Research
Vol. 54 No. 4 September 2016
Printed in U.S.A.
Received 20 January 2015; accepted 15 March 2016
ABSTRACT
Relative to quantitative methods traditionally used in accounting and finance,
textual analysis is substantially less precise. Thus, understanding the art is of
equal importance to understanding the science. In this survey, we describe
the nuances of the method and, as users of textual analysis, some of the
tripwires in implementation. We also review the contemporary textual analysis
literature and highlight areas of future research.
JEL codes: D82; D83; G14; G18; G30; M40; M41
Keywords: textual analysis; sentiment analysis; bag of words; readability;
word lists; Zipf’s law; cosine similarity; Naïve Bayes
Mendoza College of Business, University of Notre Dame.
Accepted by Christian Leuz. We thank Brad Badertscher, Peter Easton, Diego Garcia, two
anonymous referees, and seminar participants at Columbia Business School’s News and
Finance Conference for helpful comments.
Copyright ©, University of Chicago on behalf of the Accounting Research Center, 2016
1. Introduction
Textual analysis, in some form, resides across many disciplines under various
aliases, including computational linguistics, natural (or statistical) language
processing, information retrieval, content analysis, or stylometrics.
The notion of parsing text for patterns has a long history. In the 1300s,
friars of the Dominican Order produced concordances of the Latin Vulgate
(Biblical translations) to provide indexes of common phrases (Catholic
Encyclopedia [1908]). In 1901, T.C. Mendenhall used textual analysis to
examine whether some works attributed to Shakespeare might have been
written by Bacon (see Williams [1975]). During the world wars, the method
was increasingly adapted to political speech, where carefully scripted
rhetorical choices were interpreted as signals of diplomatic trends (e.g.,
Burke [1939]). In the sixties, the systematic analysis of text increased in
popularity with Mosteller and Wallace’s [1964] purported resolution of
authorship for the Federalist Papers. In the past few decades, the release of a
large annotated corpus from the Wall Street Journal (WSJ) led to significant
increases in the accuracy of statistical parsing (see Marcus, Santorini, and
Marcinkiewicz [1993]).
More recently, with the exponential increase in computing power over
the past half century and the increased focus on textual methods driven by
the requirements of Internet search engines, the application of this
technique has permeated most disciplines in one way or another. In accounting
and finance, the online availability of news articles, earnings conference
calls, Securities and Exchange Commission (SEC) filings, and text from
social media provides ample fodder for applying the technology.
Can we tease out sentiment from mandated company disclosures and
contextualize quantitative data in ways that might predict future valuation
components? Can we computationally read news articles and trade before
humans can read and assimilate the information? If Twitter’s tweets
provide the pulse of information, can we monitor these messages in real time
to gain an informational edge? Do textual artifacts provide an additional
attribute that predicts bankruptcies? Are there subtle cues in managements’
earnings conference calls that computers can discern better than analysts?
More broadly, can we examine textual artifacts to measure the quantity and
quality of information in a collection of text, including both the intended
message and, importantly, any unintended revelations? These are all
interesting questions potentially answered by the technology of textual
analysis.
Textual analysis is an emerging area in accounting and finance and, as
a result, the corresponding taxonomies are still somewhat imprecise.
Textual analysis can be considered a subset of what is sometimes labeled
qualitative analysis, with textual analysis most frequently falling into the
categories of targeted phrases, sentiment analysis, topic modeling,
or measures of document similarity. Readability is another aspect of textual
analysis, which is differentiated from some of the prior methods in that it
attempts to measure the ability of the reader to decipher the intended
message, whereas the other methods typically focus on computationally
extracting meaning from a collection of text. Other examples of the more
general topic of qualitative analysis include Coval and Shumway [2001], who
consider the information conveyed by noise levels in the Treasury Bond
Futures trading pit at the Chicago Board of Trade, or Mayew and
Venkatachalam [2012], who examine the audio from earnings conference calls to
determine managerial affective states.
Following the pioneering papers by Frazier, Ingram, and Tennyson
[1984], Antweiler and Frank [2004], Das and Chen [2007], Tetlock [2007],
and Li [2008], accounting and finance researchers have actively examined
the impact of qualitative information on equity valuations. The words
selected by managers to describe their operations and the language used by
media to report on firms and markets have been shown to be correlated
with future stock returns, earnings, and even future fraudulent activities
of management. Clearly, stock market investors incorporate more than just
quantitative data in their valuations, but as the accounting and finance
disciplines embrace this new technology, we must proceed carefully to ensure
that what we purport to measure is in fact so.
The burgeoning literature in textual analysis is already summarized well
in other papers, although the increasing popularity of the method quickly
dates any attempt to distill research on the topic. Li [2010a], in a survey of
the literature, provides details on earlier manual-based examples of textual
analysis, discusses the modern literature by topical area (e.g., information
content, earnings quality, market efficiency), and itemizes a prescient list
of potential research topics. His conclusions echo a theme of this paper;
that is, the literature needs to be less centered on finding ways to apply
off-the-shelf textual methods borrowed from highly evolved technologies
in computational linguistics and instead be more motivated by hypotheses
“closely tied to economic theories” (Li [2010a, p. 158]).
Kearney and Liu [2014] provide a more recent survey of methods and
literature with a focus on textual sentiment. Their table 3 provides a useful
annotated bibliography of most sentiment-related papers published prior
to 2013. Das’s [2014] monograph, in addition to reviewing the academic
literature, provides an excellent user’s guide for someone just approaching
the subject, including code snippets for some of the basic tools used in
textual analysis.
In what follows, we will fold a more selective and focused survey of the
accounting, finance, and economics literature on textual analysis into a
description of some of its methods. We add value beyond simply offering
an updated literature review by also underscoring the methodological
tripwires for those approaching this relatively new technique. Qualitative data
require the additional step of translating text into quantitative measures,
which are then used as inputs into either traditional or text-based methods.
We emphasize the importance of exposition and transparency in this
transformation process because this is where much of the imprecision of
textual analysis is introduced. More generally, we emphasize the importance
of replicability in the less-structured methods used in textual analysis.
Regarding the topic of readability, we underscore the importance of carefully
specifying what is meant by the concept in the context of business
documents, where the traditional hallmarks of readability (polysyllabic words
and long sentences) are rarely distinguishing characteristics in the
interpretation of financial text.
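As a rough illustration of the step of translating text into a quantitative
measure, the following minimal Python sketch computes the proportion of words
in a passage that appear on a word list. It is only one possible
bag-of-words measure, not a method prescribed by this survey. The word list,
the tokenization rule, and the names NEGATIVE_WORDS, tokenize, and
negative_proportion are all hypothetical, chosen for the example.

```python
import re
from collections import Counter

# Hypothetical, illustrative word list; NOT a published sentiment dictionary.
NEGATIVE_WORDS = {"loss", "losses", "impairment", "litigation", "decline"}


def tokenize(text):
    """Lowercase the text and split it into purely alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def negative_proportion(text, word_list=NEGATIVE_WORDS):
    """Return the share of tokens found in word_list (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)  # bag of words: word order is discarded
    hits = sum(counts[word] for word in word_list)
    return hits / len(tokens)


# Assumed sample sentence, written for this example only.
sample = "The impairment led to a loss. Litigation costs may cause further losses."
score = negative_proportion(sample)
print(f"Proportion of listed words: {score:.3f}")
```

Even in a sketch this small, the choice of tokenization rule and word list
changes the resulting number, which is why each choice in the transformation
should be reported explicitly.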
The remainder of our survey is organized as follows. In section 2, before
examining those methods intended to extract meaning from text collections,
we consider the broader topic of information content and document
