How do we design a better grading system? We now articulate policy implications of our study that may apply to grading jurisdictions specifically and inspection systems generally.
First, our study underscores the need for transparency about transparency. The availability of rich inspection microdata empowers information intermediaries to rigorously examine how well food safety programs function and to convey that information more persuasively to consumers. (308) As Sam Issacharoff argues, "What is needed is a regulatory regime that would promote a market for intermediaries." (309) The Obama Administration's emphasis on microdata disclosure potentially facilitates such intermediation. (310) Indeed, the bulk of this Article can be considered a form of information intermediation that sheds light on restaurant grades. New York--one of only a few major metropolitan areas that make microdata readily available (see Table 1)--is a model jurisdiction in that sense. All jurisdictions should follow New York's lead and release full health-inspection data in machine-readable form. The disclosure should be comprehensive, including inspector identification codes, specific violations and point scores, types of violations, and data from restaurants that no longer exist. Even New York falls short of this goal, which makes a comprehensive assessment of its grading system much more difficult.
The benefits of wholesale disclosure extend beyond policy evaluation. Wholesale disclosures empower intermediaries to deliver information to consumers in more direct and effective ways. Inspection microdata, for example, would enable Yelp, a website that aggregates ratings of local businesses and reaches roughly 66 million unique visitors per month, (311) to include health inspection data in its restaurant characteristics. Similarly, the website Scorecard compiles data from over four hundred government and scientific websites to provide environmental information about localities. (312) Disclosure of real property records by state and local government agencies empowers intermediaries like Zillow, a website that uses fine-grained information on 100 million homes, (313) to deliver simplified, useful information, such as local home-value trends that are based on housing-price models, directly to home buyers. Smartphones permit dissemination at the immediate time and place of decisionmaking.
Second, inspection criteria should be simplified to reduce variability across inspectors. The same behavioral insight of simplifying information for consumption should also apply to information generation. New York, for example, could adopt a scoring worksheet closer to San Diego's, which would likely increase consistency across inspections. Ideally, agencies would conduct experiments to choose violation items and to determine the optimal level of inspection worksheet complexity. (314) A complementary approach would be to conduct more frequent, but shorter, inspections of a random subset of violations (weighted by risk). Such an approach might enable more objective measurement because inspectors could focus on a smaller set of more easily measurable violations (e.g., food temperature of three randomly chosen items) and restaurateurs would have little time to clean up during the inspection. Removing inspector discretion by design (i.e., by random selection of objectively measurable indicators) may greatly improve the accuracy of inspection scores. Modern survey measurement relies on the same principle: random sampling of respondents removes surveyors' discretion to choose respondents. (315)
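The random-subset idea can be illustrated with a minimal Python sketch. It draws a short checklist of distinct violation items, with each item's chance of selection proportional to a risk weight. The item names, the weights, and the function name `sample_checklist` are hypothetical and chosen only for illustration; an agency would need to derive its own items and weights from evidence.

```python
import random

def sample_checklist(risk_weights, k, rng=None):
    """Draw k distinct items, each pick proportional to its risk weight."""
    rng = rng or random.Random()
    remaining = dict(risk_weights)  # copy so the caller's dict is untouched
    chosen = []
    for _ in range(min(k, len(remaining))):
        items = list(remaining)
        weights = [remaining[item] for item in items]
        pick = rng.choices(items, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen

# Hypothetical items and weights, for illustration only.
RISK_WEIGHTS = {
    "cold_holding_temperature": 5.0,
    "hot_holding_temperature": 5.0,
    "evidence_of_pests": 4.0,
    "handwashing_station_stocked": 3.0,
    "food_contact_surface_clean": 2.0,
    "thermometer_available": 1.0,
}

checklist = sample_checklist(RISK_WEIGHTS, k=3, rng=random.Random(0))
print(checklist)
```

Because the draw, not the inspector, determines which items are checked, two inspectors visiting the same restaurant would differ only in how they measure the selected items, not in which items they choose to examine.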
Overly complex criteria appear to undermine inspections in other regulatory fields. As John and Valerie Braithwaite convincingly demonstrate, the complexity and specificity of criteria undermine the consistency of nursing home inspections. (316) Similarly, inspections by the Mine Safety and Health Administration (MSHA) and the Nuclear Regulatory Commission (NRC), which have no formalized score sheets (317) despite a large number of possible violations, (318) have been sharply criticized for inconsistency. (319) The Braithwaites argue that simplification in particular promotes consistency by fostering deliberation and a form of peer review among inspectors. (320) Our findings corroborate the view that simplification on the information-supply side may improve inspections in other regulatory areas.
While our evidence suggests that reforms would reduce the impact of the inspector lottery, the major remaining limitation lies in inspection resources. Without sufficient supervision and training of inspectors, (321) it may not be possible to achieve satisfactory uniformity across inspections. From that perspective, the more difficult policy decision may be whether to increase the budgets and salaries of health departments.
Third, inspections should take place at truly random intervals to eliminate short-term changes made solely in anticipation of the inspection. (322) A pernicious feature of existing regimes is the relative predictability of when inspections will occur. In San Diego and Los Angeles, restaurants can pay for a next-day reinspection. In New York, the July 2010 reforms spelled out in concrete terms when to expect inspectors--seven days to roughly a month for reinspection, and ninety to 150 days for the next initial inspection for restaurants receiving 28 or more points. (323) Such certainty enables restaurateurs to devote resources to a temporary cleanup in advance of the inspection. Greater randomness would make such strategic cleanups far more difficult. Increasing the randomness in timing of inspections takes real political will, but making inspection scoring more consistent may reduce restaurant hostility toward grades, making such reform more feasible.
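One way to make inspection timing unpredictable, offered here only as an illustrative sketch and not as a description of any jurisdiction's practice, is to draw the gap between inspections from an exponential distribution. Under that assumption the time already elapsed since the last visit carries no information about when the next one will occur, while the average inspection frequency stays fixed. The function name `next_inspection_date` and the 120-day average gap are hypothetical.

```python
import random
from datetime import date, timedelta

def next_inspection_date(last_inspection, mean_gap_days, rng=None):
    """Draw the next inspection date with an exponentially distributed gap."""
    rng = rng or random.Random()
    # expovariate takes a rate, i.e. 1 / (mean gap in days).
    gap_days = rng.expovariate(1.0 / mean_gap_days)
    # Round to whole days and never schedule on the same day.
    return last_inspection + timedelta(days=max(1, round(gap_days)))

rng = random.Random(1)
last_visit = date(2010, 7, 1)
print(next_inspection_date(last_visit, mean_gap_days=120, rng=rng))
```

A fixed window such as "ninety to 150 days" tells the restaurateur when a cleanup will pay off; a draw of this kind does not, because an inspector is as likely to arrive tomorrow on day 10 as on day 100.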
Fourth, to battle grade inflation, jurisdictions like San Diego should consider changing the thresholds for letter grades to generate meaningful distinctions. For instance, if San Diego employed a threshold of 95 points to receive an 'A,' consumers would receive more information about the relative risk of establishments. At minimum, the overall proportions of restaurants receiving each grade should be disclosed on the grade placard.
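The effect of moving the threshold can be seen in a small Python sketch that computes the share of restaurants receiving each grade under two cutoffs for an 'A.' The scores below are invented for illustration and are not San Diego data; the 90-point baseline for an 'A' and the 80- and 70-point cutoffs for 'B' and 'C' are assumptions, as are the function names.

```python
from collections import Counter

def grade(score, a_cutoff, b_cutoff=80, c_cutoff=70):
    """Map a 0-100 score (higher is better) to a letter grade."""
    if score >= a_cutoff:
        return "A"
    if score >= b_cutoff:
        return "B"
    if score >= c_cutoff:
        return "C"
    return "below C"

def grade_shares(scores, a_cutoff):
    """Return the proportion of restaurants receiving each grade."""
    counts = Counter(grade(s, a_cutoff) for s in scores)
    return {g: counts[g] / len(scores) for g in sorted(counts)}

# Hypothetical scores, for illustration only.
scores = [100, 98, 97, 96, 94, 93, 92, 91, 90, 85]

shares_at_90 = grade_shares(scores, a_cutoff=90)
shares_at_95 = grade_shares(scores, a_cutoff=95)
print(shares_at_90)
print(shares_at_95)
```

With these invented scores, nine of ten restaurants earn an 'A' at a 90-point cutoff but only four of ten do at 95 points, so the letter conveys more about relative standing. The same proportions are what a placard would disclose if it reported the overall grade distribution.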
Last, health departments (or information intermediaries armed with more comprehensive data) should apply well-known statistical adjustments for differences across inspectors and inspections. (324) The intuition behind such models is that good scores by tough inspectors are more meaningful than good scores by easy inspectors. Statistical models can adjust for inter-inspector differences so that the numerical score is comparable across restaurants, regardless of what the grade threshold may be. (Insights from such models could also be applied to adjust for the time of day.) Moreover, any disclosure to consumers should convey uncertainty in the scores. (325) For example, one simple proposal would be to disclose the (model-based) probability that a restaurant would receive an 'A' if inspected on a future day. Such adjustments would calibrate the strength of the disclosure to consumers to the actual uncertainty in distinguishing the sanitation levels of restaurants. New York's grades aim to cure an information deficit but, if anything, may overcompensate by creating a false sense of certainty.
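The intuition can be made concrete with a deliberately simple Python sketch. It is not the class of models cited above; it substitutes plain inspector-mean centering, which is defensible only if restaurants are assigned to inspectors roughly at random, so that an inspector's average reflects stringency and not the mix of restaurants visited. The records are invented, scores are treated as violation points (lower is better), and the 13-point cutoff for an 'A' and the five-point day-to-day standard deviation are assumptions for illustration.

```python
from collections import defaultdict
from statistics import NormalDist, mean

def adjust_for_inspector(records):
    """Remove each inspector's average deviation from the overall mean.

    records: list of (restaurant, inspector, points) tuples.
    """
    overall = mean(points for _, _, points in records)
    by_inspector = defaultdict(list)
    for _, inspector, points in records:
        by_inspector[inspector].append(points)
    # Positive offset = inspector assigns more points than average (tougher).
    offset = {i: mean(p) - overall for i, p in by_inspector.items()}
    return [(r, i, p - offset[i]) for r, i, p in records]

def probability_of_a(adjusted_points, day_to_day_sd, a_cutoff=13):
    """Chance a future score is at or below the cutoff, under a normal model."""
    return NormalDist(mu=adjusted_points, sigma=day_to_day_sd).cdf(a_cutoff)

# Hypothetical records, for illustration only.
records = [
    ("R1", "tough", 20), ("R2", "tough", 24), ("R3", "tough", 28),
    ("R4", "easy", 8), ("R5", "easy", 12), ("R6", "easy", 16),
]

adjusted = adjust_for_inspector(records)
for restaurant, inspector, points in adjusted:
    print(restaurant, inspector, points, round(probability_of_a(points, 5.0), 2))
```

In this invented example the raw scores suggest that every restaurant seen by the tough inspector is worse than every restaurant seen by the easy one, yet after centering the two groups are indistinguishable. Reporting a probability in place of a bare letter also makes visible how little separates a restaurant just above the cutoff from one just below it.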
Beyond these specific design elements, this Article raises profound questions about mandated disclosure and targeted transparency. First, given that the poster child of targeted transparency is itself susceptible to ineffective implementation, this study raises questions about the design of disclosure policies far beyond food safety. It calls into question the design, implementation, and administration of disclosure in a myriad of regulatory areas. (326)
Second, while behaviorally informed regulation is an extraordinarily promising approach, the contextual nature of behavioral effects also makes it difficult to extend findings from one arena to the next. Nudges are contextually dependent. A yellow 'C' grade, for example, may have quite different effects from a red 'C.' New York already had a means of publicly indicating positive sanitation results prior to July 2010--the Golden Apple--but one that apparently did not function effectively. What our findings underscore, then, is the increasingly recognized need to evaluate empirically the efficacy of such design elements, with field experimentation being the most credible assessment tool. (327) Fortunately, the changing evidentiary base of government, combined with the increasing availability of rich microdata about and from administrative agencies, facilitates the systematic assessment, understanding, and, ultimately, improvement of the regulatory state in ways previously unimaginable. (328)
Third, nudges cannot compensate for underlying problems in regulatory design. Slapping a grade onto a score from a faulty inspection system provides the imprimatur of transparency, without a public health basis. If the simplified grade or score is merely a proxy (that is, if it reflects but does not directly measure the concept of interest, namely the risk of foodborne illness), it can be strategically gamed by restaurants and inspectors, thereby losing validity. (329)
Fourth, the broader desirability of grading (and nudging) depends on a normative theory of the regulatory regime. Is the purpose of such systems, for example, to identify sanitation outliers? In that respect, San Diego's system actually performs far better than New York's: a 'B' is truly informative and heightens the expected penalty of noncompliance. Or is the purpose of the system to incentivize restaurants to improve across the board? In that case, we might favor more grade discrimination between restaurants, as in New York. Given fixed resources, however, the latter comes at a considerable cost--a reinspection...