Why Are Polls Destined To Be Wrong?
Spoiler: the polls are going to be wrong again. Here's why.
No matter what happens tomorrow, the polls are very likely going to be “wrong”.
The current RCP national average of Trump +0.1 will likely miss one way or another by enough to inspire at least a little mocking and derision. And if it misses low (i.e., underestimates Trump), then most (maybe all) of the RCP swing-state averages are likely to miss low as well. If this happens, derision will give way to outrage about how the polls underestimated Trump again. Many will conclude that Trump supporters simply respond to polls at a lower rate and that even diligent pollsters are powerless to adjust for it. Others will wonder if the polls are mostly conducted by anti-Trump organizations that underestimate Trump’s chances on purpose.
On the other hand, if the RCP average misses high, meaning Harris wins the popular vote and/or outperforms in battleground states, then she will very likely outperform across the board. If this happens, the inevitable narrative will be that the polls “over-adjusted” for the misses in 2016 and 2020. But would that necessarily be the case? Or might there now be a silent Harris voter, as some have speculated?
None of the above.
Maybe the reason behind this predictable phenomenon has nothing to do with partisan bias, hidden voters, or inherent flaws of polling, and instead just reflects the fundamental mechanics underlying how polls work and the factors that drive electoral outcomes.
To understand this, note that there are three general ways a campaign or candidate can secure an advantage in an election, relative to a prior election or other baseline:
1. Pull votes from the other side. This occurs when voters who voted for Biden in 2020 switch their votes to Trump in 2024, or Trump 2020 voters switch to Harris in 2024. As of right now, about 5-6% of Biden 2020 voters are moving to Trump in 2024, while 2-3% of Trump 2020 voters are moving to Harris, a net gain of roughly 3 points for Trump (see the sketch after this list).
2. Motivate more of your own supporters to vote. Self-explanatory.
3. Deter more of your opponent’s supporters from voting. Ditto.
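To make that crossover arithmetic concrete, here is a minimal sketch using the official 2020 popular-vote totals and the midpoints of the ranges quoted in (1); the switch rates are the estimates above, and everything else is purely illustrative:

```python
# A rough sketch of the crossover arithmetic, using the official 2020 popular-
# vote totals and the midpoints of the ranges quoted above. The switch rates
# are estimates; everything else is illustrative.
biden_2020 = 81_283_501
trump_2020 = 74_223_975

biden_to_trump = 0.055 * biden_2020    # midpoint of the 5-6% range
trump_to_harris = 0.025 * trump_2020   # midpoint of the 2-3% range

net_switchers = biden_to_trump - trump_to_harris
total_votes = biden_2020 + trump_2020

# Each net switcher moves the margin by two votes: one lost from Harris's
# column, one gained by Trump's.
margin_swing = 2 * net_switchers / total_votes

print(f"net switchers to Trump: {net_switchers / 1e6:.2f}M")  # ~2.61M
print(f"margin swing:           {margin_swing:+.1%}")         # ~+3.4 points
```

Note that each net switcher swings the margin by two votes, which is why a net flow of well under 2% of all voters translates into a roughly 3-point swing in the margin.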
It turns out that polls are often good at estimating (1) but not so good at estimating (2) or (3). That’s because polls are designed to measure relative numbers, i.e., the percentage of support each candidate receives among a sampled group of likely voters, not absolute numbers, i.e., the total number of voters in each group. The former corresponds to vote share; the latter to turnout (the sketches below make this concrete).
Voter turnout is influenced by a complex web of factors that many polls don’t even try to measure (but still typically make assumptions about, if only implicitly). These factors can include voter enthusiasm, logistical challenges (e.g. weather and wait times), and other external events close to election day. To compensate for potential sampling biases, pollsters usually reweight their data based on demographic and turnout models derived from historical elections. This reweighting process often assumes that the composition of the electorate will resemble that of recent elections. For the most part, this estimate is a good starting point. But it fails to capture the fact that the side with more motivated voters will often see big gains in turnout, the so-called “enthusiasm gap”.
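To make the reweighting mechanics concrete, here is a minimal sketch of post-stratification with invented numbers (real pollsters weight on many more variables than a single education split):

```python
# A minimal sketch of post-stratification reweighting, with invented numbers.
# Suppose a poll oversamples college graduates relative to the electorate
# composition the pollster assumes from past elections.
sample = {  # respondents per group, and each group's support for candidate X
    "college":    {"n": 600, "support": 0.58},
    "noncollege": {"n": 400, "support": 0.44},
}
assumed_electorate = {"college": 0.40, "noncollege": 0.60}  # turnout model

total_n = sum(g["n"] for g in sample.values())

# Unweighted estimate: each respondent counts equally, so the oversampled
# group dominates. Weighted estimate: each group counts at its assumed share.
raw = sum(g["n"] / total_n * g["support"] for g in sample.values())
weighted = sum(assumed_electorate[k] * g["support"] for k, g in sample.items())

print(f"raw estimate:      {raw:.1%}")       # 52.4%
print(f"weighted estimate: {weighted:.1%}")  # 49.6%
```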
So while reweighting improves polling accuracy overall, it also exposes a vulnerability: the reweighting assumes a baseline for each category that might not capture shifts in voter behavior unique to the current election cycle. This means that if turnout or engagement disproportionately changes for one party due to a motivating event or trend, polls may fail to anticipate the change. It happened in both 2016 and 2020, and we’re likely to see it again in 2024, not because polls are missing the “silent Trump” or “silent Harris” voter, and not because polls are intentionally biased in one direction or the other. The reason is much simpler: the factors driving outperformance in one state tend to drive similar outperformance not just in other states within that region, but around the country. So if the polls significantly underestimate Trump (or Harris) in one state, they are likely to systematically underestimate Trump (or Harris) nationwide.
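And here is the vulnerability itself, continuing the invented numbers from the sketch above: even if the poll nails preferences within every group, a turnout shift away from the assumed composition produces a miss with zero sampling error:

```python
# Same invented numbers as above, but now a cycle-specific motivating event
# pushes noncollege turnout above the historical baseline the pollster assumed.
support = {"college": 0.58, "noncollege": 0.44}  # true support for candidate X
assumed = {"college": 0.40, "noncollege": 0.60}  # pollster's turnout model
actual  = {"college": 0.36, "noncollege": 0.64}  # who actually shows up

poll   = sum(assumed[g] * support[g] for g in support)
result = sum(actual[g] * support[g] for g in support)

print(f"poll:   {poll:.1%}")    # 49.6%
print(f"result: {result:.1%}")  # 49.0% -- a miss with zero sampling error
```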
Correlation of State-Level Outcomes
One of the most critical insights in understanding why polling errors tend to move in the same direction across states is the correlation of motivating factors. Unusual economic conditions, a major Supreme Court decision, or even a candidate’s charisma can energize or demoralize one side or the other’s voters nationwide.
Thus, when polls broadly misjudge turnout (especially when it’s due to failing to incorporate a new or cycle-specific motivating factor), they tend to miss in a way that affects most or all polls in the same direction. This produces what appears to be a systematic partisan bias. But importantly, this is for the most part due neither to deliberate bias nor to polling error per se; rather, it is an unavoidable consequence of relying on historical data to predict future outcomes, magnified by shared, partisan-aligned motivational factors that carry across states and regions.
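A toy simulation illustrates the consequence. Model each state’s polling error as a shared national component (the mis-modeled, cycle-specific factor) plus independent state-level noise; the magnitudes below are invented, not estimated from real polling data:

```python
# Toy model: state polling error = shared national error + state-level noise.
# For comparison, also simulate purely independent errors of the same total size.
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_states = 10_000, 50

shared = rng.normal(0, 1.5, size=(n_sims, 1))            # national error, in points
state_noise = rng.normal(0, 1.5, size=(n_sims, n_states))
correlated = shared + state_noise

independent = rng.normal(0, np.hypot(1.5, 1.5), size=(n_sims, n_states))

def frac_lopsided(errors, k=40):
    """Fraction of simulated cycles where >= k of the states miss the same way."""
    same_sign = np.maximum((errors > 0).sum(axis=1), (errors < 0).sum(axis=1))
    return (same_sign >= k).mean()

print(f"shared component:  {frac_lopsided(correlated):.0%} of cycles lopsided")
print(f"independent noise: {frac_lopsided(independent):.0%} of cycles lopsided")
```

With a shared component as large as the state-level noise, cycles in which 40 or more of 50 states miss in the same direction become routine; with independent errors of the same total size, they essentially never happen.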
Case study: Systematic error in recent elections
To illustrate how the above has played out in the past, below are the RealClearPolitics final averages (and actual results) for the last four presidential elections.
Notice that in 2008 and 2012, the final polling average underestimated Obama in all but a few states; in 2012 it got worse, with Obama underestimated everywhere except Arizona. In the Trump era, though 2016 is often referenced as a major polling failure, the misses were more evenly distributed between overestimating and underestimating Clinton’s support (and the average absolute size of the miss was comparable to the three other cycles). 2016 stands apart mainly because the important misses (PA, MI, WI) were all in Trump’s favor and large enough to flip the expected winner of those states, and therefore the presidency. In 2020, the polls underestimated Trump even more consistently (though with fewer state-level upsets).
When this pattern unfolds (as it likely will again), it does not necessarily indicate systematic methodological mistakes by pollsters, nor partisan or other bias among them. Polling “error” stems not just from sampling error (the only category measured by the much-cited “margin of error”), but also from a failure to recognize and model in real time potentially large asymmetric, cycle-specific motivating factors.
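(For reference, that much-cited figure is the textbook 95% sampling margin for a proportion near 50%, and by construction it says nothing about turnout-model error; the sample sizes below are illustrative.)

```python
# The "margin of error" quantifies sampling error only: z * sqrt(p(1-p)/n).
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """95% sampling margin of error for a simple random sample of size n."""
    return z * sqrt(p * (1 - p) / n)

for n in (500, 1000, 2000):
    print(f"n={n:>4}: ±{margin_of_error(n):.1%}")
# n= 500: ±4.4%    n=1000: ±3.1%    n=2000: ±2.2%
# None of this accounts for error in the turnout model itself.
```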
And because this source of error is so highly correlated across states and even regions (much more so than sampling error or other methodological factors), it is likely that the majority of polls will continue to miss in a single direction, overestimating or underestimating the same candidate.
Don’t be surprised when the same thing happens again in 2024 (and 2028 and 2032…).