Robust estimates of insurance misrepresentation through kernel quantile regression mixtures

Published date: 01 September 2021
Authors: Hong Li, Qifan Song, Jianxi Su
DOI: http://doi.org/10.1111/jori.12358
J Risk Insur. 2021;88:625–663. wileyonlinelibrary.com/journal/jori
Received: 15 September 2020 | Revised: 13 May 2021 | Accepted: 28 June 2021
ORIGINAL ARTICLE
Hong Li¹ | Qifan Song² | Jianxi Su²

¹Department of Economics and Finance, University of Guelph, Guelph, Ontario, Canada
²Department of Statistics, Purdue University, West Lafayette, Indiana, USA
Correspondence
Hong Li, Department of Economics and Finance, University of Guelph, 50 Stone Road East, Guelph, ON N1G 2W1, Canada.
Email: lihong@uoguelph.ca
Funding information
National Science Foundation, Grant/Award Number: DMS-1811812; Natural Sciences and Engineering Research Council of Canada, Grant/Award Numbers: DGECR-2020-00347, RGPIN-2020-05387
Abstract
This paper develops a class of nonparametric methods for studying misrepresentation in insurance applications. To this end, we employ mixture models based on quantile regression in reproducing kernel Hilbert spaces. Compared with existing parametric approaches, the proposed framework has a more flexible statistical structure, which can alleviate the risk of model misspecification, and it is also more robust to outliers in the data. The framework not only estimates the prevalence of misrepresentation in the data but also helps identify the most suspicious individuals for validation. By embedding state-of-the-art machine learning techniques, we present a novel statistical procedure that efficiently estimates the proposed misrepresentation model on massive data. Applying the methodology to the Medical Expenditure Panel Survey data, we find a significant degree of misrepresentation in self-reported insurance status.
KEYWORDS
big data, insurance claim models, misrepresenter identification, misrepresentation risk assessment, nonparametric regression mixtures
© 2021 American Risk and Insurance Association
1 | INTRODUCTION
Insurance companies collect policyholders' information, summarized by the so-called insurance rating factors, to calculate risk-adjusted premiums. In practice, policyholders often intentionally make untrue statements about key rating factors in order to alter their insurance eligibility and/or lower their premiums. In actuarial parlance, this type of fraudulent behavior is referred to as insurance misrepresentation. Rating factors subject to misrepresentation are typically self-reported; examples include smoking status in health insurance and the mileage and use of vehicles in auto insurance. Misrepresentation can also be found in insurance-related survey data when revealing certain true information carries a cost, such as a social desirability cost or a financial cost. Ignoring misrepresentation may degrade the quality of actuarial models, leading to poor business decisions and/or unfair premium schemes. Misrepresentation risk is also of particular interest to insurance regulators, who seek to understand how fraudulent behavior by a group of insured individuals might affect the welfare of others (Gabaldón, 2014). Hence, assessing and managing misrepresentation risk is an indispensable component of modern insurance practice.
There are two strands of literature on misrepresentation risk. The first strand focuses on qualitative research, proposing superior policy designs and proactive oversight processes to deter misrepresentation (see Hamilton, 2009, for a comprehensive review). The other strand focuses on the quantitative side of misrepresentation management and aims to quantify the extent of misrepresentation risk using statistical models. Our paper sits squarely in the latter strand.
Arguably, modeling insurance misrepresentation is a challenging task. From a statistical standpoint, what complicates it is the unobservable nature of fraudulent activity: whether misrepresentation actually occurred cannot be known until a formal investigation is undertaken. Therefore, traditional statistical methods (e.g., discriminant analysis and logistic regression), which require a sample frame containing the variable of concern (i.e., the misrepresentation status at the individual level), cannot be applied directly to study misrepresentation, and a new set of statistical tools is called for. In particular, we are keenly interested in developing a rigorous statistical framework that delivers scientifically sound answers to the following two questions, which are of fundamental importance in quantitative misrepresentation risk management:
Q1. Based on a given set of insurance claims data, how can the level of misrepresentation activity in the data be assessed before the fraudulent behavior is actually observed?
Q2. If a significant level of misrepresentation activity is discovered, how should the most suspicious individuals be selected for validation?
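The two questions can be made concrete with a deliberately simplified sketch. It assumes a one-directional setup (individuals who truly belong to a higher-risk group may report the lower-risk status, but not the reverse), looks only at claims from people who reported the lower-risk status, and, unlike the nonparametric framework of this paper, treats both claim distributions as known Gaussians. The function names `normal_pdf` and `estimate_misrep_rate`, and all numbers, are hypothetical. The EM-estimated mixing weight addresses Q1, and ranking individuals by their posterior probability addresses Q2.

```python
import numpy as np


def normal_pdf(y, mean, sd):
    """Gaussian density; a hypothetical stand-in for each claim distribution."""
    z = (y - mean) / sd
    return np.exp(-0.5 * z ** 2) / (sd * np.sqrt(2.0 * np.pi))


def estimate_misrep_rate(y, f_truthful, f_risky, theta_init=0.5, tol=1e-10, max_iter=1000):
    """EM estimate of theta, the share of misrepresenters among those who
    reported the low-risk status, assuming both component densities are KNOWN.

    Returns (theta_hat, post), where post[i] is the posterior probability
    that individual i truly belongs to the high-risk group.
    """
    d0, d1 = f_truthful(y), f_risky(y)
    theta = theta_init
    for _ in range(max_iter):
        # E-step: posterior membership probabilities under the current theta
        post = theta * d1 / (theta * d1 + (1.0 - theta) * d0)
        # M-step: the updated mixing weight is the average posterior
        new_theta = post.mean()
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    post = theta * d1 / (theta * d1 + (1.0 - theta) * d0)
    return theta, post


# Hypothetical data: claims of individuals who REPORTED the low-risk status.
rng = np.random.default_rng(0)
n, true_theta = 5000, 0.2
is_misrep = rng.random(n) < true_theta  # unobservable in practice
y = np.where(is_misrep, rng.normal(3.0, 1.0, n), rng.normal(0.0, 1.0, n))

theta_hat, post = estimate_misrep_rate(
    y,
    f_truthful=lambda v: normal_pdf(v, 0.0, 1.0),
    f_risky=lambda v: normal_pdf(v, 3.0, 1.0),
)

# Q1: estimated prevalence of misrepresentation.
# Q2: flag the k most suspicious individuals for validation.
k = 500
flagged = np.argsort(post)[::-1][:k]
precision = is_misrep[flagged].mean()
print(f"theta_hat = {theta_hat:.3f}; share of true misrepresenters among top {k}: {precision:.2f}")
```

Because the component densities are fixed here, only the mixing weight is learned. In a realistic setting the relationship between claims and covariates is itself unknown, which is precisely what the nonparametric approach of this paper is meant to estimate; the sketch does not attempt that.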
Despite its practical importance, misrepresentation modeling received little attention from insurance researchers until recently, so the related literature is quite limited. In particular, the existing misrepresentation models all use parametric regressions to model insurance claim records, and misspecifying the parametric structure of the claim data may substantially degrade the accuracy of the misrepresentation assessment. In this paper, we extend the present parametric misrepresentation models to a nonparametric framework by deploying kernel quantile regression (KQR). The research problem is of great relevance to insurance practice in its own right. In addition to the methodological contribution, our careful numerical investigation reveals that improper use of parametric misrepresentation models may seriously misestimate misrepresentation behavior. In the simulation examples, we find that misspecified parametric misrepresentation models may underestimate the true misrepresentation rate by 90% and also cause excessively large error rates in the misrepresenter identification process. Furthermore, in a real data application based on the Medical Expenditure Panel Survey (MEPS) data, we find a 50% difference between the misrepresentation rate estimated by the proposed nonparametric model and the parametric result reported in the earlier literature. Thus, a practical message we aim to convey to insurance analysts is that parametric models must be used with extra caution when modeling insurance misrepresentation. If there is no prior knowledge about the true regression structure of the data generating process (DGP), insurance analysts should at least try the nonparametric framework proposed in this paper.
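Kernel quantile regression is built on the check (pinball) loss rather than squared error. As a minimal sketch, which is not the paper's KQR estimator (KQR additionally fits a regression function in a reproducing kernel Hilbert space), minimizing the check loss over a constant recovers an empirical τ-quantile. With τ = 0.5 that is the median, which a single corrupted claim record cannot move, whereas the sample mean shifts dramatically. The function names and the simulated lognormal claim amounts are hypothetical.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile ("check" or pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0).astype(float))


def fit_constant_quantile(y, tau):
    """Minimize sum_i rho_tau(y_i - q) over a constant q.

    The objective is convex and piecewise linear with kinks at the data points,
    so searching over the observed values is enough to find a minimizer.
    """
    losses = np.array([check_loss(y - q, tau).sum() for q in y])
    return y[np.argmin(losses)]


rng = np.random.default_rng(1)
claims = rng.lognormal(mean=7.0, sigma=1.0, size=101)  # hypothetical claim amounts

corrupted = claims.copy()
corrupted[np.argmax(corrupted)] = 1e9  # one grossly wrong record

print("median fit, clean:    ", fit_constant_quantile(claims, 0.5))
print("median fit, corrupted:", fit_constant_quantile(corrupted, 0.5))
print("mean, clean vs corrupted:", claims.mean(), corrupted.mean())
```

Other values of τ target other parts of the claim distribution (for example, τ = 0.9 for the upper tail), and the same insensitivity to a few extreme records carries over.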
The rest of this paper proceeds as follows. Starting with a literature review in Section 2, we argue that there are two major limitations inherent in parametric misrepresentation models. In Section 3, we introduce a nonparametric alternative to address these issues. In Section 4, an innovative learning algorithm is proposed to implement the misrepresentation model. The algorithm not only produces nonparametric estimates of the misrepresentation probability (which answers question Q1) and of the regression structure between insurance claims and relevant covariates, but also helps identify potential misrepresenters (which answers question Q2). Based on extensive simulation studies in Section 5, we show that the proposed framework accurately estimates the misrepresentation probability under various DGPs, ranging from univariate linear models to multivariate nonlinear models, and that misrepresenters are detected with very low error rates. The proposed framework therefore appears to deliver satisfactory performance across a range of challenging situations. Section 6.1 presents a further numerical example, based on a synthetic non-life insurance portfolio, that illustrates the big data application of the proposed model. Finally, in Section 6.2, the proposed model is applied to the MEPS data. In line with the results of Akakpo et al. (2019), our model suggests that a significant percentage of respondents misrepresented their self-reported insurance status in 2014 to avoid the potential tax penalty. However, our estimated percentage is substantially lower than that reported in Akakpo et al. (2019). This discrepancy may be attributed to the proposed KQR method coping better with the nonlinear dependence structure inherent in the MEPS data, which leads to a more reliable estimate of the misrepresentation probability.
2 | PRESENT LITERATURE AND MOTIVATION
To the best of our knowledge, only a few existing works address misrepresentation modeling, including Akakpo et al. (2019), Xia (2018), Xia and Gustafson (2016, 2018), and Xia et al. (2018). In particular, one of the most recent attempts, made in Akakpo et al. (2019), significantly inspires our work. In that paper, the authors built on the theoretical groundwork established by Xia and Gustafson (2016) and proposed a mixture of parametric regressions to model insurance misrepresentation. Before placing the present paper into perspective, the following paragraphs give a brief overview of the misrepresentation model in Akakpo et al. (2019). Some standard notations for describing the misrepresentation problem of