Using long short‐term memory neural networks to analyze SEC 13D filings: A recipe for human and machine interaction

Date: 01 October 2019
Authors: David Louton, Murat Aydogdu, Hakan Saraoglu
Published date: 01 October 2019
DOI: http://doi.org/10.1002/isaf.1464
RESEARCH ARTICLE
Murat Aydogdu¹ | Hakan Saraoglu² | David Louton²
¹ Rhode Island College, Providence, RI, USA
² Bryant University, Smithfield, RI, USA
Correspondence
David Louton, Bryant University, 1150
Douglas Pike, Smithfield, RI 02917-1284, USA.
Email: dlouton@bryant.edu
Summary
We implement an efficient methodology for extracting themes from Securities
and Exchange Commission 13D filings using aspects of human-assisted active learning
and long short-term memory (LSTM) neural networks. Sentences from the "Purpose
of Transaction" section of each filing are extracted and a randomly chosen subset is
labelled based on six filing themes that the existing literature on shareholder activism
has shown to have an impact on stock returns. We find that an LSTM neural network
that accepts sentences as input performs significantly better, with precision of 77%,
than an alternately specified neural network that uses the common bag-of-words
approach. This indicates that both sentence structure and vocabulary are important
in classifying SEC 13D filings. Our study has important implications, as it addresses
the recent cautions raised in the literature that analysis of finance and accounting-
related text sources should move beyond bag-of-words approaches to alternatives
that incorporate the analysis of word sense and meaning reflecting context.
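
The point that sentence structure matters alongside vocabulary can be illustrated with a minimal sketch. The two sentences below are invented for illustration and are not drawn from any filing, and the helper names `bag_of_words` and `token_sequence` are hypothetical. The sketch only shows that a bag-of-words representation cannot tell the two sentences apart, whereas an ordered token sequence, the kind of input a sequence model such as an LSTM consumes, can.

```python
from collections import Counter


def bag_of_words(sentence):
    """Unordered word counts: word order is discarded."""
    return Counter(sentence.lower().split())


def token_sequence(sentence):
    """Ordered list of tokens: word order is preserved."""
    return sentence.lower().split()


# Invented example sentences (assumption, not taken from a 13D filing).
a = "the fund may seek board representation but will not sell shares"
b = "the fund may sell shares but will not seek board representation"

# Same vocabulary and counts, so the bag-of-words views are identical.
same_bag = bag_of_words(a) == bag_of_words(b)

# Different word order, so the sequence views differ.
same_sequence = token_sequence(a) == token_sequence(b)

print(same_bag, same_sequence)
```

The two sentences state opposite intentions, yet only the ordered representation preserves the information needed to distinguish them.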
KEYWORDS
active learning, computational linguistics, neural networks, shareholder activism
1 | INTRODUCTION
The field of finance has long been rich in both numerical and textual
data. Historically, research focused primarily on the available numerical
data and text was viewed as too intractable and labour intensive
to provide attractive opportunities for researchers. With the advent
of high-quality open-source machine learning and natural language
processing libraries, combined with the increase in power (and
decrease in costs) of processors and the commoditization of data
storage, it has become feasible for researchers to begin to investigate
the enormous amount of textual data available in finance. However,
two obstacles have consistently presented difficulties: (1) the cost and
labour intensiveness of labelling sufficiently large data sets for training,
testing, and cross-validating machine-learning classification
models; and (2) the relatively low signal-to-noise ratio in lengthy official
documents, such as Securities and Exchange Commission (SEC) filings.
Together, these issues have made it difficult for researchers to effectively
scale their studies to analyse the large data sets that are available.
For example, we find 326,745 13D and 13D/A filings between
1994 and 2018, but the largest data set used in the existing literature
on these filings consists of approximately 10,000 observations (see
Lim, 2017). The main contribution of our study is to propose a set of
methodological alternatives aimed at breaking this data access barrier.
We address the first challenge by introducing a human-assisted
active learning approach to efficiently label a data set consisting of
sentences drawn from SEC 13D filings. These filings are a primary
source of information on changes in the balance of ownership in publicly
traded companies, but they tend to be verbose and inconsistently
formatted, so researchers have generally relied on relatively small,
manually labelled data sets.
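
As a rough illustration of how active learning can reduce labelling effort, the sketch below uses least-confidence sampling, one common strategy for choosing which unlabelled items a human should label next. This excerpt does not state which selection strategy the authors use, so the strategy, the function name `least_confident`, the sentence ids, and the probabilities are all assumptions made for illustration. Three themes are shown for brevity; the study uses six.

```python
from typing import Dict, List


def least_confident(probabilities: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k sentence ids whose top predicted probability is lowest."""
    # Confidence of a prediction = probability of its most likely theme.
    confidence = {sid: max(p) for sid, p in probabilities.items()}
    # Sort ascending so the least confident predictions come first;
    # ties are broken by sentence id to keep the result deterministic.
    ranked = sorted(confidence, key=lambda sid: (confidence[sid], sid))
    return ranked[:k]


# Hypothetical model output: one probability per theme for each
# unlabelled sentence (values invented for illustration).
predicted = {
    "s1": [0.90, 0.05, 0.05],
    "s2": [0.40, 0.35, 0.25],
    "s3": [0.55, 0.30, 0.15],
    "s4": [0.34, 0.33, 0.33],
}

# Sentences the human annotator would be asked to label next.
to_label = least_confident(predicted, k=2)
print(to_label)
```

In a full loop, the newly labelled sentences would be added to the training set, the classifier retrained, and the selection repeated until the labelling budget is used up.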
In addressing the second challenge, we start from the premise
that text with a low signal-to-noise ratio, such as that found in SEC filings
and other complex documents, can be more effectively analysed
using methods that move beyond the bag-of-words approach commonly
seen in the literature. Our approach looks at each document as
a collection of sentences rather than as a collection of words. We use
a set of labelled sentences from SEC form 13D filings to train an
ensemble of neural networks and then evaluate their performance in
Received: 13 October 2019 Revised: 5 December 2019 Accepted: 10 December 2019
Intell Sys Acc Fin Mgmt. 2019;26:153–163. wileyonlinelibrary.com/journal/isaf © 2020 John Wiley & Sons, Ltd.
