Identifying and treating outliers in finance

Authors: Vincenzo Verardi, Sattar Mansi, Darren Hayunga, David Reeb, John Adams
Date: 01 June 2019
DOI: http://doi.org/10.1111/fima.12269
ORIGINAL ARTICLE
John Adams¹, Darren Hayunga², Sattar Mansi³, David Reeb⁴, Vincenzo Verardi⁵
¹ Department of Finance and Real Estate, University of Texas at Arlington, Arlington, TX, USA
² Department of Insurance, Legal Studies, and Real Estate at the University of Georgia, Athens, GA, USA
³ Department of Finance, Law, and Insurance, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
⁴ Departments of Accounting and Finance, National University of Singapore, Singapore
⁵ FNRS Department of Economics, Université de Namur, Namur, Belgium
Correspondence
John Adams, Department of Finance and Real Estate, University of Texas at Arlington, 701 S. West Street,
Arlington, TX 76019, USA.
Email: jcadams@uta.edu
Abstract
Outliers represent a fundamental challenge in empirical finance research. We investigate whether the
routine techniques used in finance research to identify and treat outliers are appropriate for the data
structures we observe in practice. Specifically, we propose a multivariate identification strategy that
can effectively detect outliers. We also introduce an estimator that minimizes the bias outliers cause
in both cross-sectional and panel regressions and provide outlier mitigation guidance. Using replications
of four recently published studies in premier finance journals, we show how adjusting for multivariate
outliers can lead to significantly different results.
1 INTRODUCTION
Outliers represent a persistent concern in empirical finance research. We are all aware that outliers, or
observations that deviate markedly from the rest of the data, potentially lead to biased coefficient
estimates in least-squares regressions (Edgeworth, 1887).¹ Researchers often seek to identify these
potential outliers by examining descriptive statistics for the variables of interest (Dittmar & Duchin,
2016), effectively examining observations three standard deviations from the mean. After identifying these
influential observations, the econometrician typically relies on mitigation techniques to remedy the
outlier problem (Henry & Koski, 2017). A review of recent articles that identify outliers in prominent
finance journals indicates that almost all studies rely on univariate identification.² Table 1 indicates
that the vast majority of these studies winsorize the data or perform some sort of listwise deletion. Yet,
this identification and treatment of outliers implicitly relies on outliers arising in a univariate context.
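The two univariate treatments named above can be sketched in a few lines. This is a minimal illustration, not the procedure of any study cited here: the function names, the `tail` parameter, and the sample values are assumptions made for the example, and the rule of clipping or dropping `int(tail * n)` observations in each tail is only one of several conventions in use.

```python
def _tail_bounds(values, tail):
    """Return the (low, high) values that bound the untouched middle of the sample."""
    ordered = sorted(values)
    n = len(ordered)
    k = int(tail * n)  # number of observations treated in EACH tail (assumed convention)
    return ordered[k], ordered[n - 1 - k]


def winsorize(values, tail=0.05):
    """Replace the k most extreme values in each tail with the nearest retained value."""
    low, high = _tail_bounds(values, tail)
    return [min(max(v, low), high) for v in values]


def trim(values, tail=0.05):
    """Drop values that fall outside the retained range (ties at the bounds are kept)."""
    low, high = _tail_bounds(values, tail)
    return [v for v in values if low <= v <= high]


# Hypothetical sample: nineteen ordinary values and one extreme value.
sample = list(range(1, 20)) + [100]
winsorized = winsorize(sample)  # sample size is preserved, extremes are pulled in
trimmed = trim(sample)          # extremes are deleted, sample size shrinks
```

Both functions look at one variable at a time, which is exactly the property the paragraph above questions: neither uses any information about how a variable relates to the other regressors.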
© 2019 Financial Management Association International
¹ While definitions vary, outliers describe observations that deviate so much from other observations as to
arouse suspicions about the mechanism generating the data (Hawkins, 1980). We use the term bias to mean the
difference between an estimator's expected value and the value of the parameter as determined by the bulk
of the data.
² We examine articles in the Journal of Finance, the Journal of Financial Economics, Review of Financial
Studies, and the Journal of Financial and Quantitative Analysis. We find that 66% use OLS as the primary
statistical technique.
Financial Management. 2019;48:345–384. wileyonlinelibrary.com/journal/fima
TABLE 1 Incidence of articles in historical finance journals with outlier mention and treatments

Panel A. Outlier treatments in articles using OLS

Year     % Winsorize  % Trim  % Drop  % Winsorize, trim,  % All other
                                      and/or drop         treatments
2008              35      11      31                  75           38
2009              46      14      21                  80           25
2010              41      24      10                  75           29
2011              53      12      12                  78           29
2012              64      20       6                  91           21
2013              54      15      39                 109           11
2014              35      13      30                  78           35
2015              69       7      17                  93           14
2016              56      26      21                 103           12
2017              72      10       8                  90            5
Average           52      16      17                  85           24

Panel B. Incidence of articles using OLS and mentioning outliers

Year  All papers in JF,  % All papers  All papers     % All papers   % OLS papers
      JFE, RFS, JFQA     mentioning    utilizing OLS  utilizing OLS  mentioning
                         outliers                                    outliers
1988                194            7%             64            33%           17%
1989                209            5%             71            34%           13%
1990                228            7%             89            39%           15%
1991                195            5%             58            30%           10%
1992                201            6%             66            33%           15%
1993                207            7%             77            37%            9%
1994                182           13%             68            37%           21%
1995                201           12%             58            29%           28%
1996                199           14%             86            43%           26%
1997                223           11%            106            48%           19%
1998                201           14%             85            42%           28%
1999                208            8%             96            46%           23%
2000                216            9%             90            42%           23%
2001                220           16%            102            46%           26%
2002                243           13%            110            45%           25%
2003                241           18%            118            49%           34%
2004                250           15%            105            42%           28%
2005                248           23%            137            55%           35%
2006                259           24%            163            63%           32%
2007                288           23%            185            64%           30%
2008                298           27%            200            67%           37%
2009                379           25%            252            66%           35%
2010                381           24%            234            61%           38%
2011                400           26%            279            70%           34%
2012                365           27%            231            63%           27%
2013                364           30%            178            49%           44%
2014                316           28%            152            48%           33%
2015                328           30%            137            42%           36%
2016                355           31%            163            46%           36%
2017                386           32%            195            51%           37%

Notes: This table provides the number and percentage of articles published each year in the historical
finance journals [Journal of Finance (JF), Journal of Financial Economics (JFE), Review of Financial
Studies (RFS), and the Journal of Financial and Quantitative Analysis (JFQA)]. Panel A reports the outlier
mitigation methods used in the historical finance journal articles from 2008 to 2017 using hand collection.
Percentages total more than 100% due to multiple treatments in some papers. Panel B presents the incidences
of articles with outlier mention, with OLS mention, and with OLS and outlier mentions from 1988 to 2017
using keyword searches. The data are from the EBSCO database for JF, RFS, and JFQA and the Science Direct
database for JFE.

We ask a fundamental but simple question: Are the techniques we commonly use in finance to identify and
treat outliers appropriate for the data structures we observe in practice? Many of the methods to identify
and treat outliers, such as winsorizing, trimming, or dropping the affected observations, arose in a period
with limited data sets and computing power. By necessity, these methods focused on identifying and treating
outliers in a univariate setting, but studies in finance almost always require multivariate analysis. A
simple example provides a useful illustration. Table 2 displays a small data set, where descriptive
statistics indicate that none of the observations contain univariate outliers. Yet, two of the observations
include outliers in a multivariate setting, dramatically influencing the coefficient estimates in an
ordinary least squares (OLS) regression framework.³ Intuitively, if multivariate (i.e., regression)
outliers arise in a nonrandom fashion, trimming and dropping potentially introduce sample selection
problems and biased coefficient estimates (Heckman, 1979). Table 2 demonstrates that neither winsorizing
nor trimming mitigates the influence of the multivariate outliers. Instead, these univariate outlier
mitigation strategies actually exacerbate the multivariate outlier problem (our example is consistent with
Bollinger & Chandra, 2005). The example provided in Table 2 clearly demonstrates that despite being the
best linear unbiased estimator of the conditional expectation function from a purely statistical
standpoint, naively using OLS can lead to incorrect economic inferences when there are multivariate
outliers in the data.
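The distinction between univariate and multivariate outliers can be made concrete with a small numerical sketch. The data below are hypothetical and are not the paper's Table 2: ten points lie close to the line y = x, and an eleventh point, (9, 2), has coordinates that each fall inside the range of the other observations. The code uses classical (non-robust) Mahalanobis distances purely for illustration, and the cutoff of 7.378 is the 97.5% quantile of a chi-square distribution with two degrees of freedom, a conventional choice assumed here.

```python
from statistics import mean, stdev

# Hypothetical data (NOT the paper's Table 2). The last point is the planted outlier.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9]
ys = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5, 7.5, 7.5, 9.5, 9.5, 2.0]

CHI2_975_DF2 = 7.378  # 97.5% quantile of chi-square with 2 degrees of freedom


def z_scores(values):
    """Classical z-scores: distance from the mean in sample standard deviations."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def covariance(a, b):
    """Sample covariance with the n - 1 divisor."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def mahalanobis_sq(a, b):
    """Squared Mahalanobis distance of each (a, b) pair from the sample centroid."""
    ma, mb = mean(a), mean(b)
    saa, sbb, sab = covariance(a, a), covariance(b, b), covariance(a, b)
    det = saa * sbb - sab ** 2  # determinant of the 2x2 covariance matrix
    return [
        (sbb * (u - ma) ** 2 - 2 * sab * (u - ma) * (v - mb) + saa * (v - mb) ** 2) / det
        for u, v in zip(a, b)
    ]


def ols_slope(a, b):
    """Slope of a simple OLS regression of b on a."""
    return covariance(a, b) / covariance(a, a)


max_abs_z = max(abs(z) for z in z_scores(xs) + z_scores(ys))  # univariate screen
d2 = mahalanobis_sq(xs, ys)                                   # multivariate screen
slope_all = ols_slope(xs, ys)                                 # with the planted point
slope_clean = ols_slope(xs[:-1], ys[:-1])                     # without it
```

In this constructed sample no coordinate comes near the three-standard-deviation screen, yet the planted point has by far the largest Mahalanobis distance, exceeds the cutoff, and pulls the OLS slope well below the value obtained from the other ten points. With several outliers, classical distances of this kind can themselves be distorted, which is the motivation for robust alternatives.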
Outliers arise in a variety of ways, including data errors, variable construction, omitted variables,
sampling errors, nonnormality, or chance. Outliers can also be the most important data in a sample when
they reflect some unusual fact that will lead to an improvement in economic theory or model specification
(Zellner, 1981). Therefore, identifying multivariate outliers is a key step in evaluating their impact in
empirical finance research. Traditional methods, such as studentized residuals or Cook's D, while simple to
implement and easy to evaluate, suffer from a masking problem that occurs when specifying too few outliers
in the test. For example, if we are testing for a single outlier when there are, in fact, two (or more)
outliers, these additional outliers may influence the value of the test statistic enough so that no points
are identified as outliers. Traditional methods also rely heavily on their assumptions of normality. In
this paper, we propose an identification strategy using outlier-robust estimation as in Rousseeuw and van
Zomeren (1990). We find this method effectively identifies the outliers and tests for their influence. We
further propose an outlier
³ To illustrate with a more real-world example, consider a panel data set of Minnesota employees containing
information on natural hair color, height, weight, eye color, and ethnicity. In this sample, neither a 5'2"
person nor an employee with blue eyes or an employee with blond hair would likely register as univariate
outliers. Similarly, neither an observation regarding a Chinese male employee nor an employee weighing 235
pounds would appear as outliers. However, if all of these characteristics describe a single employee, then
we might suspect this observation is an outlier.
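
The masking problem described in the text can be reproduced in a deliberately simplified, univariate form. The sketch below is an analogue only: it replaces the robust multivariate estimator of Rousseeuw and van Zomeren (1990) with the median and the median absolute deviation (MAD), and the data, function names, and threshold of three are assumptions made for the illustration. The constant 1.4826 rescales the MAD so that it is comparable to a standard deviation under normality.

```python
from statistics import mean, median, stdev


def classical_flags(values, k=3.0):
    """Indices of observations more than k sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) > k * s]


def robust_flags(values, k=3.0):
    """Indices of observations more than k scaled MADs from the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # consistency factor for normally distributed data
    if scale == 0:
        raise ValueError("MAD is zero; the robust scale is undefined for this sample")
    return [i for i, v in enumerate(values) if abs(v - med) > k * scale]


# Hypothetical data: eleven ordinary values, then one or two extreme values.
clean = [9, 10, 11, 10, 9, 11, 10, 10, 9, 11, 10]
one_outlier = clean + [30]
two_outliers = clean + [30, 30]

flags_single = classical_flags(one_outlier)    # the lone extreme value is caught
flags_masked = classical_flags(two_outliers)   # two extremes inflate the SD and hide each other
flags_robust = robust_flags(two_outliers)      # median and MAD are not distorted by them
```

A single extreme value is flagged by the classical rule, but adding a second one inflates the mean and standard deviation enough that neither is flagged. The median-based rule, whose location and scale estimates are barely affected by the extreme values, still flags both.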