Election forecasting as a field of political science has come of age, doing so with a roar during the 2012 U.S. presidential election contest. The more established approaches are digging deep, while the less established approaches are giving them a good challenge. The synergy has moved the science of election forecasting, not to mention public attention to it, to a new plane. There are multiple approaches, and the number continues to increase. The leading ones are still opinion polls, trading markets, and statistical models. (For a full contemporary account of such strategies, see the symposium by Lewis-Beck and Stegmaier [2014a].) With respect to statistical models, two basic strategies have emerged: state-level modeling and national-level modeling. Our work follows the latter strategy, and during the 2012 U.S. presidential election campaign we developed proxy models, using them as nowcasts to predict the contest.
In brief, our proxy model sought to forecast the 2012 national result as a function of an empirical surrogate variable of the vote itself. Starting one year before the election, we updated our prediction each month, in a nowcast regularly released on the blog The Monkey Cage (http://www.washingtonpost.com/blogs/monkey-cage/). Our final proxy model nowcast, appearing well before the election and based on data gathered six months before, predicted the election almost exactly, at 52.7% (versus the actual 52.0% popular two-party vote share for Obama).
In the article at hand, we begin with a discussion of how different forecasting approaches fared when trying to predict the 2012 U.S. presidential election. Next we develop the proxy model idea, drawing on our earlier examples from other democracies, before applying it to U.S. presidential elections. Once explained, we submit the proxy model for these U.S. elections to various diagnostics, including the out-of-sample forecast of the 2012 contest. Given certain attractions of the proxy model, we attempt to account for why it works. Then, we explore its use in nowcasting, again drawing on our earlier nowcasting work, before testing it against the 2012 election. By way of conclusion, we offer an evaluation of proxy model performance along four traditional criteria (accuracy, lead, parsimony, and reproducibility) plus a new one, currency. By these standards, the proxy model seems, perhaps deceptively, a simple solution to the complex task of forecasting U.S. presidential elections.
Different Forecasting Approaches: Accuracy in 2012
In the study of U.S. election forecasting, there are different approaches. Several contemporary reviews of this material have appeared (Lewis-Beck and Tien 2011; Lewis-Beck and Stegmaier 2014b; Stegmaier and Norpoth 2013). Setting aside the identification of clever "coincidences" that seem to predict presidential election outcomes, such as how well the Redskins do, or how good the current Beaujolais harvest tastes, there are a variety of more solid approaches, each occupying a different place on the scientific continuum. (On "coincidences," see Lewis-Beck and Rice 1992, chap. 2.) With respect to the continuum, at one end are studies of predictive "keys" to election success and the naming of special "bellwether" counties or states that seem to track the presidential outcome, as has been alleged most recently for Ohio. (On "keys" and "bellwethers," see Nadeau and Lewis-Beck 2012.)
At the other end of the spectrum are opinion polls, political trading markets, and statistical models, which represent the bulk of the scientific election forecasting enterprise. Opinion polling organizations, led at least until recently by Gallup, have long provided vote intention surveys that are widely used to predict election outcomes. Many current rivals to Gallup exist, perhaps foremost Pew for individual surveys and Real Clear Politics for aggregations of those surveys. Political trading markets are mostly represented by Intrade and the Iowa Electronic Market, although the former appears to be dropping out of the game. A major question concerns whether the trading markets offer more accuracy than the polls. (On this point, see especially the work by Erikson and Wlezien 2012.) The statistical modeling approach has a developed tradition within the discipline of political science, as represented in the American Political Science Association panels that have offered presidential forecasts from different teams of scholars since the 1980s.
Somewhere in this mix go the forecasting efforts of commentators in print and electronic media, which have recently captured much interest (e.g., Charlie Cook of the Cook Political Report, Ezra Klein of the Washington Post, Nate Silver formerly at the New York Times). In addition to this journalistic commentary, certain political scientists have made scholarly offerings on blog sites (e.g., John Sides at the Monkey Cage or Drew Linzer at Votamatic [http://votamatic.org/]). These media efforts have sharpened differences in modeling approaches, forcing choices between the state and the nation as unit of analysis. For example, Klarner (2012) attempts to forecast the presidential outcome by predicting the result in each state, whereas Norpoth and Bednarczuk (2012) attempt to forecast the result from the entire nation.
The American state as unit of analysis has standing in the political science modeling of presidential election forecasts, going back to the early work of Campbell (1992), Holbrook (1991), and Rosenstone (1983). This state-level approach was all but abandoned, largely because of measurement error issues, but recently has come back strong. Across the period, the national-level approach has endured and continues among certain statistical modelers in the structural tradition, despite challenges. (See, e.g., the regularly cited Abramowitz model.) Looking at the political science modeling teams offering 2012 forecasts at the American Political Science Association meeting, 10 focused on the national level, while three focused on the state level (Campbell 2012, 612). The proxy approach we advocate here emphasizes a new approach to U.S. presidential election forecasting but remains within the modeling framework employing the nation as unit of analysis. (Jackman, in a contemporary discussion of the unit-of-analysis problem, gives a "maybe" to the argument that the state approach dominates the nation approach. His argument stems from the strong link between national and state results, based on evidence from the notion of "uniform swing.") We have addressed the argument that the usual national-level forecasts are limited in value because presidential elections are decided by the Electoral College vote, which is derived from the state-level results combined. However, we have shown that the national popular vote and the Electoral College vote are tightly linked, currently r = 0.97 (Lewis-Beck and Rice 1992, 24; Lewis-Beck and Tien 2004, 2012).
How did the competing approaches to election forecasting do in 2012? There are different ways of summarizing the results. Recall that the actual national result, in terms of total vote share, was 51.1% for Obama, 47.2% for Romney, and 1.7% for other candidates. Compare some leading individual polls to those numbers. Gallup called it (November 1-4) for Romney (49% to 48%). Pew also declared (October 31-November 3) for Romney (50% to 47%). However, Real Clear Politics, which regularly averages a number of polls, finally picked Obama (48.8% to 48.1%). PollyVote, which averaged five poll-averaging organizations (including Real Clear Politics), came up (on November 6) with 50.6% of the two-party popular vote for Obama. By way of contrast, in political stock trading, the Iowa Electronic Market gave a final (November 6) two-party forecast for Obama, at 51.0%.
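Because these forecasts are stated on different bases (total vote versus two-party vote), a quick conversion makes the comparison concrete. The arithmetic below is a sketch using only the figures quoted above: it converts Obama's total share to a two-party share and measures each final forecast's absolute error against it.

```python
# 2012 totals quoted in the text: Obama 51.1%, Romney 47.2%.
obama_total, romney_total = 51.1, 47.2

# Obama's two-party share: his vote as a fraction of the major-party vote.
actual_two_party = 100 * obama_total / (obama_total + romney_total)

# Final forecasts, each expressed as Obama's two-party share.
forecasts = {
    "Real Clear Politics": 100 * 48.8 / (48.8 + 48.1),  # converted from its averages
    "PollyVote": 50.6,
    "Iowa Electronic Market": 51.0,
}

errors = {name: abs(f - actual_two_party) for name, f in forecasts.items()}
for name, err in sorted(errors.items(), key=lambda kv: kv[1]):
    print(f"{name}: off by {err:.1f} points")
```

On these quoted figures, the actual two-party share works out to about 52.0%, so each of the final forecasts understated Obama's margin, with the Iowa Electronic Market closest.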
In contrast to the polls and the markets, the statistical modelers released their forecasts much earlier. In the major example, 13 teams of modelers gave a median forecast of 50.6% of the two-party vote to Obama, issuing that prediction a median of 99 days before the contest itself (Campbell 2012, 612). We thus observe that the modelers, on balance, forecast about as well as the other leading approaches. What distinguishes them from the pack is not their accuracy, which is equivalent, but their lead time. They put forward forecasts well before the election itself, rendering predictions at some distance from the contest, predictions that reflect the substantive meaning of the word "forecast." As we shall see below, strong lead time constitutes a major feature of our proxy model.
What is a Proxy Model?
In election forecasting, there are two main kinds of statistical models: explanatory and predictive. Virtually...