Detecting Accounting Fraud in Publicly Traded U.S. Firms Using a Machine Learning Approach

Author	BIN LI, YANG BAO, Y. JULIA YU, JIE ZHANG, BIN KE
Published date	01 March 2020
DOI	http://doi.org/10.1111/1475-679X.12292

Journal of Accounting Research

Vol. 58 No. 1 March 2020

Printed in U.S.A.

YANG BAO,∗ BIN KE,† BIN LI,‡ Y. JULIA YU,§ AND JIE ZHANG

Received 7 October 2015; accepted 1 October 2019

ABSTRACT

We develop a state-of-the-art fraud prediction model using a machine learning approach. We demonstrate the value of combining domain knowledge and machine learning methods in model building. We select our model input based on existing accounting theories, but we differ from prior accounting research by using raw accounting numbers rather than financial ratios. We employ one of the most powerful machine learning methods, ensemble learning, rather than the commonly used method of logistic regression. To assess the performance of fraud prediction models, we introduce a new performance evaluation metric commonly used in ranking problems that is more appropriate for the fraud prediction task. Starting with an identical set of theory-motivated raw accounting numbers, we show that our new fraud prediction model outperforms two benchmark models by a large margin: the Dechow et al. logistic regression model based on financial ratios, and the Cecchini et al. support-vector-machine model with a financial kernel that maps raw accounting numbers into a broader set of ratios.

JEL codes: C53; M41

Keywords: fraud prediction; machine learning; ensemble learning

∗Antai College of Economics and Management, Shanghai Jiao Tong University; †Department of Accounting, NUS Business School, National University of Singapore; ‡Department of Finance, Economics and Management School, Wuhan University; §McIntire School of Commerce, University of Virginia; School of Computer Engineering, Nanyang Technological University.

Accepted by Christian Leuz. We wish to thank an anonymous reviewer, Mark Cecchini, Luo Zuo, and workshop participants at the Singapore Tri-Uni Accounting Research Conference, the Inaugural Conference on Intelligent Information Retrieval in Accounting and Finance at CUHK (Shenzhen), and HKUST for helpful comments. Part of this research is funded by a Singapore Ministry of Education Tier 2 grant (No. MOE2012-T2-1-045). Yang Bao acknowledges the financial support from an NSFC grant (No. 71601116) and Shanghai Pujiang Program (No. 16PJC045). Ke Bin acknowledges the financial support from an MOE start-up grant (No. R-521-000-032-133). Bin Li acknowledges the financial support from National Natural Science Foundation of China (71971164, 91646206). An online appendix to this paper can be downloaded at http://research.chicagobooth.edu/arc/journal-of-accounting-research/online-supplements. The code and data used for our best model RUSBoost are available at the Github repository: https://github.com/JarFraud/FraudDetection.

© University of Chicago on behalf of the Accounting Research Center, 2019

1. Introduction

Accounting fraud is a worldwide problem. If not detected and prevented on a timely basis, it can cause significant harm to the stakeholders of fraudulent firms (e.g., Enron and WorldCom) as well as the stakeholders of many nonfraudulent firms indirectly (Gleason, Jenkins, and Johnson [2008], Goldman, Peyer, and Stefanescu [2012], Hung, Wong, and Zhang [2015]). Unfortunately, accounting fraud is difficult to detect. Moreover, even if it is detected, serious damage has usually already been done (Dyck, Morse, and Zingales [2010]). Hence, efficient and effective methods of corporate accounting fraud detection would offer significant value to regulators, auditors, and investors.

The objective of this study is to develop a new model that predicts accounting fraud out of sample using readily available financial statement data from publicly traded U.S. firms. Following Cecchini et al. [2010] and Dechow et al. [2011], we use the detected material accounting misstatements disclosed in the SEC’s Accounting and Auditing Enforcement Releases (AAERs) as our accounting fraud sample. Although there are useful nonfinancial predictors of accounting fraud (e.g., an executive’s personal behavior), we use only readily available financial data for two reasons. First, fraud prediction models based on publicly available financial data can be applied to any publicly traded firm at low cost. Second, most of the fraud prediction models in the existing accounting literature also rely on publicly available financial data (e.g., Green and Choi [1997], Summers and Sweeney [1998], Beneish [1999], Cecchini et al. [2010], Dechow et al. [2011]). By limiting the predictors to financial data only, the performance of our fraud prediction models can be compared with the performance of such existing models.

There is a fairly large accounting literature on the determinants of accounting fraud (e.g., Entwistle and Lindsay [1994], Beasley [1996], Dechow, Sloan, and Sweeney [1996], Beneish [1997, 1999], Summers and Sweeney [1998], Efendi, Srivastava, and Swanson [2007], Brazel, Jones, and Zimbelman [2009], Dechow et al. [2011], Schrand and Zechman [2012]), but the primary objective of most of these studies is to explain fraud within sample, and they often emphasize causal inference. Our objective is different: We wish to develop a model that can accurately predict accounting fraud out of sample (i.e., a prediction problem). Shmueli [2010] shows that the problems of causal inference and prediction, although related, are fundamentally different. Specifically, the objective of causal inference modeling is to minimize the bias resulting from model misspecification to obtain the most accurate representation of the underlying theory. In contrast, the objective of predictive modeling is to minimize out-of-sample prediction error, that is, the combination of the bias and the estimation variance that results from using a sample to estimate model parameters.¹ Although causal inference represents the mainstream of existing social science research, Kleinberg et al. [2015] show that there are many interesting prediction problems that are neglected in the extant business and economics literatures.
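
The contrast between explanation and prediction can be made concrete with the standard decomposition of expected squared prediction error. The following is a sketch under assumed notation that does not appear in the text: the outcome is $Y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, $\hat{f}$ is the model estimated from a sample, and the noise at the new point $x$ is independent of that sample.

```latex
\begin{align*}
\mathbb{E}\big[(Y - \hat{f}(x))^2\big]
  &= \underbrace{\sigma^2}_{\text{irreducible noise}}
   + \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{squared bias}}
   + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{estimation variance}}
\end{align*}
```

Under this reading, causal inference modeling concentrates on the bias term, whereas predictive modeling can accept some bias if doing so lowers the variance enough to reduce the total.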

We use two types of fraud prediction models from the extant literature as benchmarks. The first is ratio-based logistic regression, commonly used in the accounting literature (e.g., Beneish [1997, 1999], Summers and Sweeney [1998], Dechow et al. [2011]). Such models typically use financial ratios as predictors; the ratios are often identified by human experts based on theories (e.g., the motivation-ability-opportunity framework from the criminology literature). Among these models, the model in Dechow et al. [2011] is generally regarded as the most comprehensive fraud prediction model in the accounting literature. Accordingly, we adopt a similar logistic regression model as our first benchmark model (referred to as the Dechow et al. model). The second benchmark model is a fraud prediction model developed by Cecchini et al. [2010] based on a more advanced machine learning method (hereafter referred to as the Cecchini et al. model). Rather than using the financial ratios identified by human experts alone, Cecchini et al. [2010] develop a new fraud prediction model based on support vector machines (SVM) with a financial kernel that maps raw financial data into a broader set of ratios within the same year and changes in ratios across different years. Cecchini et al. [2010] find that the SVM with a financial kernel outperforms the traditional fraud prediction models in accounting, including the Dechow et al. model.²
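
To illustrate the idea of expanding raw financial data into same-year ratios and changes in ratios across years, the following is a minimal Python sketch. It is not the financial kernel of Cecchini et al. [2010]: the function name `expand_to_ratios`, the item names, the numbers, and the use of a simple difference to measure a year-over-year change are all assumptions made for illustration.

```python
from itertools import permutations


def expand_to_ratios(current, previous):
    """Map raw items to same-year ratios and year-over-year ratio changes.

    Both arguments are dicts of raw accounting items (hypothetical names).
    """
    features = {}
    # Every ordered pair (numerator, denominator) of distinct items.
    for num, den in permutations(sorted(current), 2):
        if current[den] == 0:
            continue  # ratio is undefined this year
        ratio_now = current[num] / current[den]
        features[f"{num}/{den}"] = ratio_now
        # A change needs both items last year and a non-zero denominator.
        if num in previous and previous.get(den):
            ratio_prev = previous[num] / previous[den]
            # Assumption: "change" is measured as a simple difference.
            features[f"d({num}/{den})"] = ratio_now - ratio_prev
    return features


# Hypothetical raw numbers for one firm in two consecutive years.
current = {"receivables": 30.0, "sales": 120.0, "total_assets": 200.0}
previous = {"receivables": 20.0, "sales": 100.0, "total_assets": 160.0}

features = expand_to_ratios(current, previous)
for name in sorted(features):
    print(f"{name}: {features[name]:.4f}")
```

In this toy example, three raw items already yield six ratios and six ratio changes, which shows how quickly such a mapping broadens the predictor set relative to a short list of expert-chosen ratios.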

Our proposed fraud prediction model differs from both of these benchmark models in two key ways. First, we use ensemble learning, a state-of-the-art machine learning paradigm, to predict fraud. Most prior fraud prediction research in accounting uses logistic regression (see Dechow et al. [2011] for a review). Although ensemble learning has been successfully

¹ See the online appendix for a more detailed discussion on the differences between causal inference and prediction.

² It is important to note that the performance results of our Dechow et al. model and Cecchini et al. model are not directly comparable to those of Dechow et al. [2011] and Cecchini et al. [2010] because of a few crucial research design differences, explained in section 3.
