An analysis of 5.3 million housing sales suggests that there are fundamental shortcomings in how automated valuation model (AVM) vendors currently calculate their AVM performance metrics, in particular the forecast standard deviation. The analysis demonstrates that the methodology used to calculate performance metrics meaningfully impacts an AVM's credibility. This article proposes consistent methodologies for calculating AVM performance metrics that comply with well-established appraisal principles and allow consistent evaluation and comparison of AVM performance. A case study using a research AVM empirically illustrates that failing to follow these principles yields overly optimistic AVM performance metric values.
An automated valuation model (AVM) (1) is a computer software program that produces an estimate of the market value of a subject property, called an AVM valuation, given the subject property's address together with property sales and characteristics data. AVM vendors blend many property transactions, acquired from public sources or data aggregators, with one or more valuation models, acquired from academic and professional publications or developed by their own analysts, into a product called an AVM, the details of which are a closely guarded trade secret. An AVM produces a valuation along with certain statistics, called AVM performance metrics, that assess the validity, accuracy, and precision of the AVM valuation. The focus of this article is AVM performance metrics.
Two recent events have made evaluation of overall AVM performance increasingly important. First, the Interagency Appraisal and Evaluation Guidelines require, among other things, that lending institutions independently assess the reliability of the AVMs they use. (2) Second, the Federal Deposit Insurance Corporation (FDIC), the Board of Governors of the Federal Reserve System, and the Office of the Comptroller of the Currency have jointly increased the de minimis threshold, from $250,000 to $400,000, for residential real estate transactions that do not require an appraisal with a physical inspection of the property and neighborhood. (3) As a result, lenders will be allowed to make more residential mortgages secured by properties that are valued utilizing an AVM rather than a traditional appraisal.
Due to the proprietary intellectual property contained within an AVM, assessing AVM credibility, i.e., its validity, accuracy, and precision, is accomplished through an examination of the AVM's performance metrics. (4) Typically, users of AVMs are dependent upon AVM vendors to provide reliable performance metrics, for example, the forecast standard deviation (FSD). (5) However, as Kane, Linne, and Johnson state, "Third-party verification is critical." (6) These third parties, including credit rating agencies (such as Fitch, Standard and Poor's, and Moody's) and independent AVM testing firms (such as AVMetrics), assess AVM reliability using performance metrics.
The purpose of this study is first to demonstrate that the calculation of performance metrics is not standardized across the AVM industry, which can result in AVM vendors underreporting their FSDs. Second, five best-practice principles are recommended for AVMs, and a supporting statistical procedure is presented to implement these principles. The discussion explains how these steps would bring AVMs into better alignment with current appraisal practices. Moreover, if these principles are respected, then the values of the performance metrics associated with any model are directly comparable to those of another model. The case study demonstrates that not following the valuation principles can result in an overly optimistic assessment of an AVM's performance. Consequently, it is recommended that AVM vendors adopt the valuation principles and that users of AVMs request conformity with these principles.
Review of the Literature
Most of the literature regarding AVM performance metrics appears in unpublished manuscripts, (7) self-published books, (8) industry websites, (9) or recent trade publications. (10) Exhibit 1 contains a list of common performance metrics, along with a glossary of abbreviations and definitions related to AVM performance metrics. For example, Gayler et al. recognize mean percentage sales error, mean absolute percentage sales error, FSD, and hit rate as important metrics for the evaluation of the performance of an AVM. (11) The Collateral Risk Management Consortium suggests using percentage sales errors, mean percentage sales error, and error buckets to assess AVMs. (12) CoreLogic recommends evaluating AVMs using the mean percentage sales error, median percentage sales error, FSD, and error buckets. (13) AVMetrics advocates that no more than 10% of AVM valuations should be more than 20% higher than their corresponding selling prices, suggesting a right tail 20% performance metric. (14) Kirchmeyer and Staas state that median absolute percentage (sales) errors (MAPEs) of less than 10% "are indicative of a strong AVM, while those ranging from 11% to 15% might also be acceptable for some lending programs." (15)
Error buckets, also called percent (predicted) error (PE) buckets, count the number of sales that are deemed accurate (i.e., the success rate of the AVM) at a given level of precision, typically [+ or -] 5%, 10%, 15%, and 20%. (16) In the study presented in this article, the notation PExx is used to refer to a specific error bucket, at a given [+ or -] (xx) percentage. For example, PE10 represents the [+ or -] 10% error bucket. Kirchmeyer originally suggested that at least 50% of AVM valuations should fall within [+ or -] 10% of selling prices. (17) That is, the (percentage) success rate of an AVM at PE10 should be at least 50%. More recently, the Mortgage Bankers Association reported that "[a]lmost all counties in the United States experience [PE10] rates north of 70 percent," (18) suggesting a success rate at PE10 of 70% or more.
An AVM's failure rate in a given error bucket is the complement of the AVM's success rate within that error bucket. The failure rate is a concept common in engineering, where it is defined as the frequency with which a component fails. (19) The failure rate concept is also found in other fields where a process fails to perform well, such as the percent of small business failures, (20) the percent of students failing a computer programming course, (21) hotel failures, (22) and commercial bank insolvencies. (23) In the study presented here, which focuses on sales where the AVM fails to accurately predict selling prices, the failure rate of an AVM in a particular error bucket (e.g., PE10) is defined as the frequency (percentage) with which an AVM fails to predict the value of a target property within the tolerance given by the error bucket (e.g., [+ or -] 10%). (24)
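As a minimal sketch of these definitions, the success and failure rates for an error bucket can be computed from percentage sales errors. The function names and the four-sale data set below are illustrative only and do not reflect any vendor's implementation:

```python
def percentage_sales_error(valuation, price):
    """Percentage sales error: relative difference between the AVM valuation
    and the actual selling price."""
    return (valuation - price) / price

def bucket_rates(valuations, prices, tolerance=0.10):
    """Success rate for a PExx error bucket: the share of sales whose absolute
    percentage sales error is within the tolerance (e.g., 0.10 for PE10).
    The failure rate is the complement of the success rate."""
    errors = [percentage_sales_error(v, p) for v, p in zip(valuations, prices)]
    hits = sum(1 for e in errors if abs(e) <= tolerance)
    success_rate = hits / len(errors)
    return success_rate, 1.0 - success_rate

# Illustrative data: four sales, one of which the AVM misses by more than 10%.
success, failure = bucket_rates([105, 95, 120, 100], [100, 100, 100, 100], 0.10)
print(success, failure)  # 0.75 0.25
```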
In addition, AVM vendors typically provide a confidence score, "which is often interpreted as meaning that the AVM estimate is within plus or minus 10% of the 'true' market value of the property with a high degree of confidence." (25) However, the definition and use of a confidence score are not standardized across AVM vendors. (26) For example, Veros describes its confidence score as a measure of accuracy between zero and 100 for which each decile generally corresponds to a 5% variance. (27) Realtors Property Resources uses an RVM confidence score of zero to five stars. (28) CoreLogic's PASS produces a confidence score between 60 and 100 that measures how well "sales data, property information, and comparable sales support the property valuation process." (29) Gordon states that a confidence score may or may not be related to the FSD and that "[s]uch a confusion of [confidence] scores and lack of connection to statistical performance in actual use forces lenders to guess at their risk management." (30)
For each individual target property being valued, AVM vendors may also report the target property's FSD. (31) Gayler et al. define an FSD as "the standard deviation of the percentage error, where the percentage error describes the relative difference between [AVM] valuation and price." (32) Freddie Mac qualifies the value of the FSD generated from its Home Value Explorer (HVE) AVM as high, medium, or low confidence. "High confidence" requires an FSD of 13 or less. "Medium confidence" arises from an FSD between 13 and 20, while "low confidence" occurs for valuations with an FSD greater than 20. (33)
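The Freddie Mac HVE bands cited above can be expressed as a simple classifier. This is a sketch of the quoted thresholds only; the function name is hypothetical, and the handling of a value exactly at a boundary (13 or 20) is an assumption, since the source describes the medium band as "between 13 and 20" without specifying inclusivity:

```python
def hve_confidence(fsd):
    """Classify an FSD (in percentage points) into the high/medium/low
    confidence bands quoted for Freddie Mac's HVE AVM.
    Boundary handling at exactly 13 and 20 is an assumption."""
    if fsd <= 13:
        return "high"
    elif fsd <= 20:
        return "medium"
    return "low"

print(hve_confidence(10))  # high
print(hve_confidence(15))  # medium
print(hve_confidence(25))  # low
```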
Reporting of the FSD by AVM providers is ubiquitous; however, the FSD description is not standardized across the industry. For example, CoreLogic states that "[t]he FSD is a statistic that measures the likely range or dispersion an AVM estimate will fall within, based on the consistency of the information available to the AVM at the time of estimation." (34) Matysiak writes that the FSD is an "estimate of the amount of variation that can occur between the actual sales price and the forecast (the most probable market value) made by the AVM." (35) Gordon offers another definition, describing the FSD as "an AVM value's expected (forecasted) proportional standard deviation around actual subsequent sales price for the given property value estimate." (36)
The clearest mathematical definition of the FSD is that it is the standard deviation of the percentage sales errors for a collection of valuations. (37) However, the method of calculating an FSD for an individual target property is not consistent, meaning that it is not clear how an AVM provider is using the sampling distribution and/or parsing a data set to provide a unique FSD value for any one particular target property.
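Under that mathematical definition, an FSD for a collection of valuations can be sketched as follows. The choice of a population (rather than sample) standard deviation is an assumption, as the literature quoted here does not specify which is used, and the data are illustrative:

```python
from statistics import pstdev

def forecast_standard_deviation(valuations, prices):
    """FSD per the definition in the text: the standard deviation of the
    percentage sales errors for a collection of valuations.
    Uses the population standard deviation (an assumption)."""
    errors = [(v - p) / p for v, p in zip(valuations, prices)]
    return pstdev(errors)

# Illustrative data: percentage sales errors of +5%, -5%, +20%, and 0%.
fsd = forecast_standard_deviation([105, 95, 120, 100], [100, 100, 100, 100])
print(round(fsd * 100, 1))  # 9.4 (FSD expressed in percentage points)
```

Note that this yields one FSD for a collection of sales; as discussed above, how a vendor maps such a distribution onto a unique FSD for a single target property is not standardized.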
An AVM report typically contains a high/low range of value based on a [+ or -] 1 x FSD confidence interval around the AVM valuation. (38) This 1 x FSD interval is often interpreted by assuming that the underlying sales errors are normally...