One of the most important decisions you face as a forecaster is simply when to publish a statistical model for public consumption. If you’re just running a model for your personal edification — or to make bets with — the threshold may actually be lower. If you’re evaluating the impact of a player injury on an NFL or NBA game that you’re considering betting on, for instance, then you might only get a couple of minutes before some reasonably rational assessment of the impact has already been priced into prevailing betting lines. Under these circumstances, a good first-pass estimate can go a long way. By the time you dot all the ‘i’s and cross all the ‘t’s to incorporate the impact of the injury into a formal model, it may be too late.
When you issue a statistical forecast publicly, though, I think the responsibility is slightly greater. In some cases, probabilistic forecasts can be confusing to people. And in other circumstances, people can take statistical models too seriously and treat them as oracular when in fact all models rely on the researcher’s assumptions. Let’s not get too carried away with this — some assumptions are better than others, which is why some models are better than others. (And putting a model behind a paywall is a pretty useful trick for self-selecting a more knowledgeable reader base.) But there are times when a subjective estimate may be better, especially in unforeseen circumstances that your model wasn’t really designed to handle.
For instance, when Joe Biden dropped out of the presidential race last Sunday, I suppose we could have just done a hot swap and immediately replaced him with Kamala Harris — pollsters have periodically tested the Harris vs. Trump matchup, especially since Biden’s disastrous debate on June 27. But I think this would have misinformed even our smart, self-selected group of Silver Bulletin readers more than it informed them. The polls were already in flux, given Biden’s mounting crisis on top of the assassination attempt against Trump on top of the Republican convention, which is typically a period when polls can produce short-lived bounces. And Harris’s candidacy was still hypothetical, although she was clearly prepared, working behind the scenes to become the Democrats’ presumptive nominee within 24-48 hours.
It’s still only been eight days since Biden quit — a long eight days, but just eight days. However, there’s now been a fair amount of polling since Biden’s decision and Harris’s formal entry into the race. It indeed shows Harris doing better than in the immediate post-RNC/assassination/Biden-crisis period — quite a bit better, actually — although since I put a slightly clickbait-y headline on this newsletter, I should remind our Democratic readers that she still faces a lot of challenges, most notably Democrats’ persistent Electoral College disadvantage.
Still, there’s enough data to get a reasonable baseline for the current state of the race, and enough that it’s worth turning the model back on, although the model is more likely than usual to show further shifts over the next week or so.
So we’ll do that tomorrow. In fact, we’ll be publishing three stories tomorrow:
The first Harris-Trump version of the model, which we’ll publish at the same URL where the Biden-Trump forecast previously appeared. The Biden-Trump version has been archived here — we’ve totally unlocked the Biden-Trump numbers, in fact, so you get a sense for all the cool charts and data that are available in the paywalled version.
A fresh, model-driven narrative overview of the state of the race.
And third, a short-ish methodological update, outlining what kamala_mode actually entails. Short answer: there are actually very few changes, as we’re trusting the pre-existing model logic as much as possible, although we need to make some adjustments to account for the fact that there wasn’t much Harris-Trump polling until the June 27 debate. We also took the downtime to examine a few other minor things, like the home-state adjustment for presidential and VP candidates. But literally about 99.5 percent of the code is the same.
As before, polling averages will be free for all readers, while probabilities and forward-looking components will be paywalled. We’ve been updating the model 6-7 times a week (i.e. nearly every day) and we’ll plan to continue that. (Eli and I were lying to ourselves when we said we were only going to update the numbers once per week.) Paid subscribers also get a weekly Model Talk column (twice weekly after Labor Day) and a monthly subscriber Q&A — there’s still time to submit questions for this month’s edition, by the way, although I plan to focus this time on questions that aren’t about the election. To sign up, you can use the link below.
And don’t worry — there will still be plenty of free content too, both about the election and other things. Surely the political news cycle has to slow down, right? Although at this rate, I’m halfway expecting to type a sentence like “the model doesn’t yet fully account for the alien invasion on Aug. 11” at some point in the near future.
As long as you aren't having to type a sentence like “the model doesn’t yet fully account for the Chinese attack on Taiwan,” I'll be good.
The temptation for our rivals to pull something big on us at this point must be huge.
I'm triggered. howmanysims should be how_many_sims to be consistent with the naming convention of the other variables. STOP EVERYTHING AND FIX IT NOW