How the election model handles a candidate swap

What’s different about kamala_mode? Actually, not much.

Jul 31, 2024

The Silver Bulletin presidential election forecast relaunched yesterday, but with a new candidate: Vice President Kamala Harris, not Joe Biden, is now the presumptive Democratic nominee. And fortunately for Democrats, she gives her party better odds: a 43 percent chance of winning the Electoral College. That’s already up from 38 percent yesterday as a result of some strong swing state polling for Harris. (Biden was at 27 percent in our last model run, and that was probably generous.) Now, Silver Bulletin readers don’t need any reminder that 43 percent is below 50 percent: Trump is still the modest favorite. But this looks like another close race. I’m not even sure I’d say anymore that it’s Trump’s election to lose.

Since the rest of today’s newsletter is mostly a fairly dry methodology post, let’s start with a couple of programming notes.

First, just as the “June” paid subscriber Q&A post actually ran on July 1, the “July” version will run tomorrow or Friday — which is to say, in August. In a month when one of the presidential candidates was nearly assassinated and the other one dropped out of the race, I think we can reasonably claim there were some extenuating circumstances in the schedule!

I’m going to focus this month on questions other than those involving the election model since we’ve talked about it so much. Sports questions? Poker or gambling questions? Book questions? “Meta” questions about the newsletter business? All of that’s fair game. So are other politics questions — I just want to avoid stuff that’s explicitly about the polls, the model or the horse race. There’s still time for paid subscribers to submit a question here.

And second, I wanted to drop a link to the Rick Rubin podcast, Tetragrammaton, which published today. I’m ramping up the amount of media I’m doing for the book, but I’m not going to link to everything because it would just be too much. (I’m already getting tired of hearing myself talk.) This, however, warrants an exception. Rick is a great interviewer, and we covered a lot of ground that I doubt we’ll hit in other interviews. I’d classify it as stuff that’s very much on the “art” side of risk-taking, such as the physical act of playing poker. It’s on the longer side, but I think it’s a good listen if you have a long drive or run ahead.

Now back to your regularly-scheduled newsletter.

This post will cover more about how the model has changed with Harris replacing Biden. I’m phrasing that carefully — how the model has changed, not how I’ve changed the model — because most of the changes just occur organically as a result of new polls and other new inputs. (For instance, the home state of the Democratic candidate is now California, not Delaware.)

Saying the model has entered kamala_mode is mostly a joke, in other words. There is really one thing that requires special handling — a one-off adjustment for Harris’s late entry into the race that won’t be permanently encoded into the model.

Namely, it has to do with the cutoff date for using polls. Usually, the model uses polling going back a year before the election — meaning, dating back to November 5, 2023 in this case. Why go back so far, you might ask? Well, it’s not like a poll from last December will have any impact on the model’s impression of the current state of the race. The way the model works is basically by plotting a series of several trendline curves — some more conservative, some more aggressive. (The procedure is similar to, but not exactly the same as, loess regression.) Curves can bend, sometimes sharply, which is why the model can sometimes decide that there’s been a nonlinear inflection point in the race — there was one toward Trump after the debate, for instance, and there are the makings of one toward Harris now.
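To make the trendline idea concrete, here’s a minimal sketch of a loess-style local linear fit in pure Python. It is not the model’s actual procedure (which, as noted, is only similar to loess). The tricube kernel, the function names, the sample data and the bandwidth values are all assumptions for illustration. A narrow bandwidth stands in for an aggressive curve and a wide one for a conservative curve.

```python
def tricube(u):
    """Tricube kernel: weight 1 at u = 0, falling to 0 once |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_linear_estimate(days, margins, target_day, bandwidth):
    """Fit a kernel-weighted least-squares line and evaluate it at target_day."""
    weights = [tricube((d - target_day) / bandwidth) for d in days]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("no polls fall inside the bandwidth")

    # Weighted means of the day numbers and the poll margins.
    mean_day = sum(w * d for w, d in zip(weights, days)) / total_weight
    mean_margin = sum(w * m for w, m in zip(weights, margins)) / total_weight

    # Weighted variance of days and covariance of days with margins.
    sxx = sum(w * (d - mean_day) ** 2 for w, d in zip(weights, days))
    if sxx == 0:
        return mean_margin  # all weight sits on one day, so no slope to fit
    sxy = sum(
        w * (d - mean_day) * (m - mean_margin)
        for w, d, m in zip(weights, days, margins)
    )
    slope = sxy / sxx
    return mean_margin + slope * (target_day - mean_day)


# Hypothetical data: a race that sits at -4 and then jumps to +1 on day 20.
days = list(range(30))
margins = [-4.0] * 20 + [1.0] * 10

aggressive = local_linear_estimate(days, margins, target_day=29, bandwidth=5)
conservative = local_linear_estimate(days, margins, target_day=29, bandwidth=40)
```

The aggressive curve only sees polls from the last five days, so it fully adopts the new level. The conservative curve still gives some weight to the older polls, which is why a blend of several such curves can register an inflection point without overreacting to one.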

However, the older data is still useful for calibrating the various adjustments the model makes, such as the house effects associated with certain polling firms (Rasmussen Reports nearly always has strong numbers for Trump, for instance) or which candidate tends to gain or lose ground in polls of likely voters. It can also be helpful for states that don’t get polled much.[1]
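The logic for rarely polled states can be sketched as a simple uniform-swing adjustment, using the New Jersey numbers from the footnote. This is an illustrative simplification, not the model’s actual imputation code; the function name and the sign convention (Republican margin minus Democratic margin) are assumptions.

```python
def impute_state_margin(old_state_margin, national_then, national_now):
    """Shift an old state poll by however much the national race has moved.

    All margins are Republican minus Democrat, in percentage points.
    """
    national_shift = national_now - national_then
    return old_state_margin + national_shift


# Old New Jersey poll: Trump down 13 when the national race was tied (R +0).
# The national race has since moved to R +4.
new_jersey_now = impute_state_margin(-13, national_then=0, national_now=4)
print(new_jersey_now)  # -9, i.e. Trump down 9
```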

Until the presidential debate on June 27, pollsters occasionally asked voters about a hypothetical Harris-Trump matchup, but these polls were few and far between. After the debate, however, Harris-Trump polling immediately became much more common, because pollsters, like prediction markets, correctly deduced there was a real chance that Biden would have to drop out of the race.

So that’s what I’m doing: going by the “wisdom of the crowds” of pollsters. In kamala_mode, only polls that began their interviews on June 28 or later are used. You could argue for retaining the few polls that were conducted before then, or for setting the cutoff date later, like when Biden dropped out on July 21. But there’s now enough very recent polling that I doubt it would make much difference.[2]
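In code, the kamala_mode cutoff amounts to a date filter on when each poll began interviewing. Here’s a minimal sketch, assuming a hypothetical Poll record; the field names and sample polls are made up for illustration and aren’t taken from the model.

```python
from datetime import date
from typing import NamedTuple


class Poll(NamedTuple):
    pollster: str          # hypothetical pollster label
    start_date: date       # first day of interviews
    harris_margin: float   # Harris minus Trump, in points


# From the post: only polls that began interviews on June 28 or later count.
KAMALA_MODE_CUTOFF = date(2024, 6, 28)


def polls_in_window(polls, cutoff=KAMALA_MODE_CUTOFF):
    """Keep polls whose interviews began on or after the cutoff date."""
    return [p for p in polls if p.start_date >= cutoff]


# Invented sample data, one poll on each side of the cutoff plus one on it.
sample_polls = [
    Poll("Pollster A", date(2024, 6, 20), -2.0),
    Poll("Pollster B", date(2024, 6, 28), -1.0),
    Poll("Pollster C", date(2024, 7, 23), 1.0),
]

kept = polls_in_window(sample_polls)
```

Passing a different cutoff, such as date(2024, 7, 21), is an easy way to check how much the choice matters.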

There’s one other pre-existing feature of the model that makes all of this go down easier. The model treats certain landmark political events (specifically, debates, conventions, vice presidential nominations, candidates clinching their party nominations, or candidates entering or exiting the race[3]) as representing discontinuous leaps ahead in “political time”. A day with a debate is treated as equivalent to the passage of a week, for instance, because there’s typically about as much movement in the polls following a debate as there is in a full week of regular campaigning. There were four such landmarks recently: JD Vance being named as Trump’s VP, the Republican National Convention, Biden dropping out, and Harris becoming the presumptive Democratic nominee. And there will be a fifth when Harris names her running mate. Thus, the model is more aggressive than usual about using only the most recent data, which is why you’ve already seen a fairly sharp swing toward Harris. Following these landmarks, the model treats polling changes as more likely to be signal rather than noise.
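One way to picture “political time” is as a calendar where landmark days count for more than one day. The sketch below is a rough illustration, not the model’s implementation. The post only gives the one-week equivalence for debates, so applying the same weight of 7 to every landmark type is an assumption, as are the function and constant names.

```python
from datetime import date, timedelta

# Assumption: every landmark day counts as 7 ordinary days.
# The post states this equivalence for debates only.
LANDMARK_WEIGHT = 7


def political_days_between(start, end, landmarks):
    """Count political days in the interval (start, end].

    Ordinary days count as 1; days listed in `landmarks` count as LANDMARK_WEIGHT.
    """
    total = 0
    day = start + timedelta(days=1)
    while day <= end:
        total += LANDMARK_WEIGHT if day in landmarks else 1
        day += timedelta(days=1)
    return total


# Two landmark dates mentioned in the post: the debate and Biden's exit.
landmarks = {date(2024, 6, 27), date(2024, 7, 21)}

elapsed = political_days_between(date(2024, 6, 26), date(2024, 7, 22), landmarks)
print(elapsed)  # 38 political days across 26 calendar days
```

A poll’s age measured this way grows faster across landmark events, which is one way a trendline could end up discounting pre-landmark polls more heavily.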

How the uncertainty index handles the Trump-Harris race

One last very minor change in kamala_mode: one of the nine factors used in the model’s uncertainty index, which governs how much the model expects the polls to change before Election Day[4], is simply how much polling there is. More data means less sampling error and less uncertainty, other things being equal. However, this calculation is based only on how much recent polling there is, not how much polling there has been throughout the year. Usually, this doesn’t matter, because the amount of recent polling is highly correlated with the overall amount of polling. But in the case of the Harris-Trump matchup, it does: there’s plenty of recent polling but very little from earlier in the year. So I’ve built in a fudge factor, where this component of the uncertainty index is temporarily higher. This adjustment will gradually phase out by September 12, three weeks after the Democratic National Convention, at which point the race will finally enter some sort of a steady state following all of these landmark events.
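A phase-out like this could be sketched as a bump that decays to zero by September 12. The post doesn’t say what shape the decay takes, how large the bump is, or exactly when it starts, so the linear schedule, the starting date (the July 30 relaunch) and the initial size of 1.0 below are all assumptions.

```python
from datetime import date

PHASE_OUT_END = date(2024, 9, 12)     # from the post
ADJUSTMENT_START = date(2024, 7, 30)  # assumption: the model relaunch date
INITIAL_BUMP = 1.0                    # assumption: arbitrary units


def polling_volume_bump(today, start=ADJUSTMENT_START, end=PHASE_OUT_END,
                        initial=INITIAL_BUMP):
    """Extra uncertainty for the polling-volume factor, decaying linearly to 0."""
    if today >= end:
        return 0.0
    if today <= start:
        return initial
    remaining_fraction = (end - today).days / (end - start).days
    return initial * remaining_fraction


halfway = polling_volume_bump(date(2024, 8, 21))  # 22 of 44 days elapsed
```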

Several other factors in the uncertainty index are also affected by the candidate swap — but this occurs automatically, not because of any changes to the model’s programming. Let’s go through them one by one:

Factor 1 is the number of undecided voters. This is about the same as before.
Factor 2 is the number of undecided plus third party voters. This is now slightly lower — meaning less uncertainty — since RFK Jr. has declined in the polls since Harris’s entry.
Factor 3 is political polarization — more polarization means a more stable race, and therefore less uncertainty. This factor is set at the start of the year based on Congressional voting patterns and isn’t affected by the identity of the candidates.
Factor 4 is the amount of variation in national polling — that is, how much the polls have swung around so far. The Trump-Biden race had been extremely steady, at least up until the debate, so this factor had taken on a very low value before. Now it’s higher: the Harris-Trump polls have already bounced around a fair bit.
Factor 5 is economic uncertainty, which is basically based on how much the model expects the economy to change between now and the election based on a Federal Reserve survey of economists. This isn’t affected by the identity of the candidates and is unchanged.
Factor 6 is uncertainty based on news events, as determined from the number of New York Times full-width banner headlines. This is now higher, because the crazy political news lately has produced several such headlines.
Factor 7 is the one I mentioned above, the overall volume of polling. It is temporarily higher because of the kamala_mode adjustment.
Factor 8 is the gap between polls and fundamentals. This is now quite low — indeed, lower than before — because both the polls and the fundamentals project a roughly tied national popular vote. There was more of a gap before, with Biden underperforming the fundamentals.
Finally, Factor 9, newly added this year, is the number of repeat candidates. Races with incumbents or challengers who also ran in the previous election tend to be less volatile because the candidates are closer to being known commodities. Uncertainty from this factor is now higher, because there’s one repeat candidate (Trump) instead of two; indeed, Harris’s image has already undergone a big shift.

Thus, four factors point toward more uncertainty than before, two factors point toward less, and the other three are essentially unchanged. So the error bars in the forecast are somewhat wider than before.
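As a quick check on that tally, the directions described for each factor can be counted up. This is just bookkeeping on the list above. The real index presumably combines the factors by size rather than by headcount, and the labels here are my own shorthand.

```python
from collections import Counter

# Direction of each factor's contribution to uncertainty versus Biden-Trump,
# as described in the list of factors.
FACTOR_DIRECTION = {
    1: "unchanged",  # undecided voters
    2: "less",       # undecided plus third-party voters
    3: "unchanged",  # polarization
    4: "more",       # variation in national polling
    5: "unchanged",  # economic uncertainty
    6: "more",       # news events
    7: "more",       # volume of polling (kamala_mode adjustment)
    8: "less",       # gap between polls and fundamentals
    9: "more",       # number of repeat candidates
}

tally = Counter(FACTOR_DIRECTION.values())
print(tally["more"], tally["less"], tally["unchanged"])  # 4 2 3
```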

This is especially true in states that haven’t yet received much polling of the Trump-Harris matchup. The uncertainty index governs how much the model expects the national race to swing. But there’s an additional calculation for how accurately the model thinks it can pinpoint the state-by-state numbers. And that state-level uncertainty is now higher too in most states.

Just to be clear, we’re still in a political environment with high polarization and relatively stable, predictable polling. The uncertainty index is higher than before, but still somewhat below-average by historical standards. One of the candidates winning by a double-digit margin is highly unlikely, for instance. But the race is now less predictable than under Biden-Trump.

Other small changes that aren’t a direct result of kamala_mode — and one thing we didn’t change

For the sake of meticulousness, let me document a few other things we looked at while the model was off:

As discussed here, we’ve now reduced the home-state bonus given to presidential and vice presidential candidates. Unlike the other changes, this one is permanent, not a result of kamala_mode. It’s something that we probably should have changed sooner, but we hadn’t bothered to look at it since none of the presidential or VP candidates in 2020 or 2024 were from swing states. However, now it’s likely that Harris will pick Josh Shapiro or another candidate from a swing state as her running mate.
Although I’ve criticized 538’s new election model, they maintain a terrific polling database that’s a tremendous public service. And since I was involved in the design of that database, obviously I’m still using it as a starting point; there’s no point in duplicating the work when I’d just wind up doing the same thing. However, there are some polls that meet our standards that don’t meet theirs, and vice versa. Thus, we maintain a supplementary polling database of polls that we use that they don’t[5], as well as a strike list of polls we delete from their data. We’re now removing one pollster that 538 uses[6] because it violates a longstanding rule from the Nate days of FiveThirtyEight of not using amateur/DIY polling. We have nothing against the guy running these polls, but we’re not looking to re-examine this procedure and we should have caught this earlier.
Lastly, there’s one thing we thought about changing, but didn’t. We have placeholder dates for one more presidential debate and one more VP debate — the presence of future debates slightly increases the amount of uncertainty in the model because debates are landmark events that can cause polling swings. Trump has waffled on whether he’ll debate Harris. However, Polymarket traders are confident the debate will eventually occur, and it’s plausible there will even be more than one debate. So we aren’t changing this yet, but will continue to monitor the situation.

1. For instance, an old poll showing Trump down 13 points in New Jersey at a time when national polls were tied could still be useful in imputing the current state of the race: if the national race has shifted from R +0 to R +4 (as it had before Biden dropped out), the model could infer that Trump was now down by 9 points instead of 13 in New Jersey, for example.

2. Although it would be slightly weird for the model’s trendline curves to encounter a situation where there was very little data (Harris-Trump polling before June 27) and then suddenly there was a lot (after June 27). In the training data, the “flow” of polling is much more linear, with the number of polls gradually increasing over the course of the year.

3. In particular, Ross Perot entering, exiting and reentering the 1992 race, which is included in the training data.

4. And to a lesser extent, how accurate it expects the polls to be on Election Day.

5. Or polls that Eli and I are faster to catch than 538; being a small, nimble operation, we’re often pretty quick on the draw.

6. SoCal Research.
