The Wages of Election Modeling: Silver vs. the 13 Keys

Putting forecasters to the test by simulating the gains (or losses) you'd make trading their recommendations on Polymarket every day

Nov 18, 2024

There’s been a bit of image rehab and goalpost nudging among the legacy forecasting set since election day, from ex post pivoting from models to modals to blaming distortive (but apparently unforeseeable) “misogyny, racism, xenophobia, [and] antisemitism.”

Whatever tapestry of unmodelable forces conspired to cause a few of the most celebrated centralized soothsayers to misfire this cycle, it’s worth taking a moment to dispassionately assess which were better (or less bad) than others.

After all, the one thing black box forecasters can agree on is that they’re sharper oddsmakers than the dumb-money-soaked, innumerate echo chambers known as prediction markets.

So we’ve set aside a cool $10,000 in pretend, retroactive cash for each major forecaster to use trading its own predictions against Polymarket’s presidential winner market (the most liquid real-money prediction market with the narrowest spreads) to see who can rack up the biggest gains.

Scroll to the bottom for detailed notes on our simulation methodology, but here’s the short version…

The Rules

Forecasters start with $10,000.
Once a day, the forecasters get to buy or sell/short Trump shares, depending on whether their own odds are higher or lower than the Polymarket price.
The size of each trade is driven by the size of their bankroll and how certain they are of the mispricing: the bigger the mismatch, the bigger the trade.
We mark the holdings to market each day, so unrealized gains are reflected in daily account value.
A total of 48 trades are made, from September 19 through November 5. Any realized gains booked along the way are theirs to keep. After the election, all Trump shares pay $1 and Harris shares become worthless.
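To make the accounting in these rules concrete, here is a minimal sketch of one forecaster’s account. `Portfolio` and its methods are hypothetical names, shorting Trump is modeled as holding Harris shares valued at 1 minus the Trump price (an assumption), and selling shares to realize gains is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Portfolio:
    """Hypothetical account for one forecaster in the simulation."""
    cash: float = 10_000.0
    trump_shares: float = 0.0   # each pays $1 if Trump wins
    harris_shares: float = 0.0  # each pays $1 if Harris wins (a short-Trump position)

    def buy(self, side: str, dollars: float, price: float) -> None:
        """Spend `dollars` on shares of `side` at `price` per share."""
        if dollars > self.cash:
            raise ValueError("not enough cash for this trade")
        shares = dollars / price
        self.cash -= dollars
        if side == "TRUMP":
            self.trump_shares += shares
        elif side == "HARRIS":
            self.harris_shares += shares
        else:
            raise ValueError(f"unknown side: {side!r}")

    def mark_to_market(self, p_trump: float) -> float:
        """Daily account value; Harris shares are valued at 1 - p_trump."""
        return (self.cash
                + self.trump_shares * p_trump
                + self.harris_shares * (1.0 - p_trump))

    def settle(self, trump_won: bool) -> float:
        """Election payout: winning shares pay $1 each, losing shares pay $0."""
        return self.cash + (self.trump_shares if trump_won else self.harris_shares)


pf = Portfolio()
pf.buy("HARRIS", 500.0, 0.40)   # 1,250 Harris shares
print(pf.mark_to_market(0.62))  # unrealized loss shows up in the daily value
print(pf.settle(trump_won=True))
```

Buying $500 of Harris at $0.40 leaves $9,500 in cash plus 1,250 Harris shares: marked at a Trump price of $0.62, the account is worth $9,975, and if Trump wins it settles at $9,500.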

We’ll do a handful of these one-on-one comparisons, based on thematic or simply amusing matchups.

First up…

Nate Silver’s Silver Bulletin vs. Allan Lichtman’s 13 Keys To the White House.

TL;DR They both get wrecked, but Lichtman fares far worse: he goes for broke, and quickly gets there.

Below, we dive into the daily performance metrics - how the respective forecasters’ odds differed from market prices each day, how big a daily bet they place on Trump or Harris as a result, and how their portfolio value grew/shrank over time.

Next match-up…

Fine print re simulation methodology: We’re using a modified version of fractional (5%) Kelly betting. (Any more aggressive than 5%, and the forecaster burns through the cash too quickly and overweights his early predictions.) The Kelly criterion is a formula for sizing a series of binary bets so as to maximize the long-run growth rate of the bankroll. With apologies to Dr. Kelly, it’s not entirely apt here, since these bets are not (even close to) independent events; in fact, they’re all the same event, just with 48 different entry points. But it still serves to prescribe a daily bet direction and size that corresponds to the degree of certainty the trader/forecaster has about the market’s mispricing.
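For a contract that costs p and pays $1 with win probability q, the textbook Kelly fraction of bankroll to spend is (q - p)/(1 - p). The sketch below assumes the post’s “modified” version starts from this textbook formula; the exact modifications aren’t described, and this ignores how a new stake is netted against existing holdings. `kelly_stake` is a hypothetical helper.

```python
def kelly_stake(q_trump: float, p_trump: float,
                fraction: float = 0.05) -> tuple[str, float]:
    """Return (side, share of bankroll to spend) under fractional Kelly.

    q_trump: forecaster's probability that Trump wins.
    p_trump: market price of a Trump share (pays $1 if Trump wins).
    fraction: fractional-Kelly multiplier (5% in this simulation).
    """
    if not 0.0 < p_trump < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if q_trump > p_trump:
        # Buy Trump at price p: full Kelly spends (q - p) / (1 - p).
        return "TRUMP", fraction * (q_trump - p_trump) / (1.0 - p_trump)
    if q_trump < p_trump:
        # Buy Harris at price 1 - p with win probability 1 - q:
        # ((1 - q) - (1 - p)) / (1 - (1 - p)) simplifies to (p - q) / p.
        return "HARRIS", fraction * (p_trump - q_trump) / p_trump
    return "NONE", 0.0  # no perceived mispricing, no trade


print(kelly_stake(0.0, 0.6))  # a 100% Harris call
print(kelly_stake(0.7, 0.6))  # a forecaster more bullish on Trump than the market
```

A forecaster giving Trump a 0% chance gets a full-Kelly fraction of 1 on Harris at any price, so at 5% fractional Kelly this sketch stakes 5% of the bankroll every single day, regardless of how far prices move.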

For Nate Silver, we have time stamps for most of his model updates, so whenever possible we place the trade immediately after his daily update, such that any impact his model has on the market is (hopefully) not yet fully priced in. This also simulates the pricing a relatively fleet-fingered user might get if trading off Silver’s insights. Since Lichtman never adjusted his forecast, a 100% call, there was less risk of any interim pronouncements moving prices, so we made all of his trades simultaneously with Silver’s.
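The “trade immediately after the update” rule amounts to picking the first market quote after the model’s time stamp. This is a sketch assuming time-sorted lists of quote times and midpoints; the post doesn’t say how its price history is stored, and `first_quote_after` and the numbers below are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def first_quote_after(update_time, quote_times, quote_mids):
    """Return (time, midpoint) of the first quote strictly after update_time.

    quote_times must be sorted ascending and aligned with quote_mids.
    """
    i = bisect_right(quote_times, update_time)  # index of first time > update_time
    if i == len(quote_times):
        raise LookupError("no market quote after this model update")
    return quote_times[i], quote_mids[i]


# Hypothetical quotes for one day (illustrative numbers, not real data).
times = [datetime(2024, 10, 1, 9, 0),
         datetime(2024, 10, 1, 12, 0),
         datetime(2024, 10, 1, 15, 0)]
mids = [0.50, 0.51, 0.52]
print(first_quote_after(datetime(2024, 10, 1, 11, 30), times, mids))
```

Under this scheme, Lichtman’s trade for the day would simply reuse whatever quote was selected for Silver.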

For the bankroll variable in the Kelly equation, we use the total account value (cash + mark-to-market value of the net Trump/Harris position). Using only the daily cash balance could produce situations where dramatically shifting odds or prices should prompt a trader to take a large offsetting position (one they can afford, since it wouldn’t require additional cash), but the formula prescribes a much smaller one because their nominal cash balance has shrunk so much.
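A small numeric sketch of why the bankroll choice matters. The account state and the 2% Kelly fraction are hypothetical, and Harris shares are valued at 1 minus the Trump midpoint by assumption.

```python
def kelly_bankrolls(cash, trump_shares, harris_shares, p_trump):
    """Return (cash-only bankroll, total account value).

    Positions are marked at the Trump midpoint p_trump, with Harris shares
    valued at 1 - p_trump (an assumption of this sketch).
    """
    marked_positions = trump_shares * p_trump + harris_shares * (1.0 - p_trump)
    return cash, cash + marked_positions


# Hypothetical state: nearly all of the account sits in Harris shares.
cash_only, total = kelly_bankrolls(cash=200.0, trump_shares=0.0,
                                   harris_shares=15_000.0, p_trump=0.5)
fraction = 0.02  # hypothetical Kelly fraction prescribed for a Trump buy
print(f"stake on cash: ${fraction * cash_only:,.2f}")           # $4.00
print(f"stake on account value: ${fraction * total:,.2f}")      # $154.00
```

With most of the value tied up in Harris shares, a cash-only bankroll prescribes a $4 offsetting trade, while the marked-to-market bankroll of $7,700 prescribes $154.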

We assume bid/ask spreads of $0.001, centered on the observed historical midpoints, and enough liquidity at the best price to fill whatever trade size the forecaster desires.
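Under these assumptions, every fill is half a spread away from the midpoint, no matter the size. A minimal sketch; `fill_price` and `shares_bought` are hypothetical helpers.

```python
SPREAD = 0.001  # total bid/ask width in dollars, centered on the midpoint


def fill_price(mid: float, side: str) -> float:
    """Execution price for a trade of any size at the assumed spread."""
    half = SPREAD / 2
    if side == "buy":
        return mid + half  # pay the ask
    if side == "sell":
        return mid - half  # hit the bid
    raise ValueError(f"side must be 'buy' or 'sell', got {side!r}")


def shares_bought(dollars: float, mid: float) -> float:
    """Unlimited depth at the best ask: no slippage, whatever the trade size."""
    return dollars / fill_price(mid, "buy")


print(fill_price(0.60, "buy"), fill_price(0.60, "sell"))
print(shares_bought(600.5, 0.60))
```

A round trip (buy, then sell at an unchanged midpoint) therefore costs exactly $0.001 per share, so transaction costs barely dent either forecaster’s results.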
