SBSQ #12: Will the polls lowball Trump again?
Plus, the mistakes I made in the “data journalism” era.
Cheers from London, where I’m wildly jealous of how much better the Indian food is than in the States but still terrified I'll look the wrong way and get run over every time I cross the street. And welcome to the August edition of Silver Bulletin Subscriber Questions. Now that we’re back on track after some schedule weirdness, you can leave questions for the September edition in the comments below.
Starting this upcoming week and continuing through Election Day on Nov. 5, we’ll publish two paid Model Talk columns per week instead of one. We still hope to have plenty of free content, too, averaging 4-5 pieces per week total, as well as model updates 6 to 7 times per week.
Originally, we'd planned to turn off monthly subscriptions at this point, since they have higher churn and make longer-term planning more difficult. Instead, however, there will be a price increase on newly initiated monthly subs beginning tomorrow (Sept. 1). This will not affect people who are already subscribed on either the monthly or annual plans: our policy at Silver Bulletin is that you get to keep the price you signed up for indefinitely so long as you remain continuously subscribed. It will also not affect annual pricing, which will remain unchanged at $95/year.
There is zero subtlety here: we're trying to incentivize annual subscriptions by offering them at a considerable discount. With that said, we understand there are people here for the election forecast and not much else. We sincerely appreciate your business whether you’re in that category, are around for the long haul, or even just enjoy the frequent free posts. Even for paid posts like this one, we work hard to provide some meaningful content before the paywall line so that we’re not clogging your inbox. We have email open rates that average around 60 percent, which is high for this business, and that’s a metric we look at carefully to ensure we aren’t wasting your time.
In this edition of SBSQ, let’s keep it to three questions since the final two answers are quite long. (Perhaps we can return to a more lightning-round format for September.) I’ll probably also steal a couple of questions that didn’t quite make the cut for future Model Talk columns.
What keeps you up at night about the model — and what lessons might there be for 2028?
Will polls be biased against Trump again?
Was “data journalism” a failure? What went wrong at FiveThirtyEight @ Disney?
What keeps you up at night about the model — and what lessons might there be for 2028?
Let’s start with a question from Jack Mumma in the subscriber Chat:
What does this 'unusual' election give you a chance to learn about the model?
In some ways, the unusual events of this election — the most important of which, of course, was Joe Biden dropping out midstream — make it hard to learn long-term lessons since the circumstances may not be replicated. However, every election presents new challenges. Four years ago, we inserted various special provisions for COVID, such as reducing the convention bounce adjustment (figuring that mostly virtual conventions wouldn’t really “pop”) and widening the error bars in the forecast, given that turnout would potentially be harder to predict in a pandemic. I suppose I feel OK about those changes — the convention bounces were indeed small, and the polls were quite far off the mark. But it’s hard to know: we’ll never get an adequate sample size for what was (hopefully) a once-in-a-century pandemic.
This year, by contrast, we’ve tried to stick to the model’s preexisting logic as much as possible, even though the way events have piled on top of one another — such as RFK Jr. dropping out the day after the DNC — is not ideal. One reason the model has moved so much on new Pennsylvania polling, for instance, is that the earlier polling occurred before what the model considers to be two “landmark events” — the convention and a candidate exiting the race — so the model treats that older polling as badly out of date. Might it be overdoing things a bit? Sure. But I don’t think there’s anything inherently wrong with how it’s handling these circumstances, either, and it’s also naturally self-correcting (e.g. if/when Kamala Harris gets some better Pennsylvania polls, her numbers will improve). The model has closely followed prediction markets throughout these periods, which seems like a good sign.
The convention bounce adjustment is one of those things that keeps me up at night, though. It can make the model's behavior counterintuitive, e.g., Harris gaining in the polling average while losing ground in the forecast. Given that convention bounces are smaller than they used to be, maybe we could treat them as a rounding error in future years: noting in the narrative explanations we provide that Harris’s numbers were likely to be a little inflated for a few weeks, but not making any specific adjustment for the convention in the model itself.
But our goal here is not to flatter our (probably majority Harris-voting) audience but to give you the best forecast we can. Trump and Harris both did get convention bounces, according to our polling averages, although there were complicating circumstances in each case. (The assassination attempt immediately preceding the RNC, perhaps enhancing Trump’s convention bounce, and RFK dropping out, perhaps mitigating Harris’s). And I don’t see anything inherently wrong with the model’s hypothesis that this ought to be a strong period for Harris in the polling, so if the Electoral College is a dogfight even now, that’s not a great sign for her. The theme is “trust the process” — but of course, we’ll want to see how the rest of the process will play out.
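If it helps to see the mechanics, here is a toy sketch, in Python, of how a convention bounce adjustment of this general type could work. To be clear, this is not the actual Silver Bulletin code, and the bounce size and decay rate below are placeholder numbers made up for illustration.

```python
# Toy illustration (not the actual Silver Bulletin code) of a convention
# bounce adjustment: the bouncing candidate's polled margin is docked by a
# penalty that starts at an assumed size and decays toward zero over weeks.

def convention_adjustment(margin, days_since_convention,
                          initial_bounce=2.0, half_life_days=10.0):
    """Return the adjusted margin for the candidate who just held a convention.

    `initial_bounce` and `half_life_days` are made-up placeholder values,
    not parameters from the real model.
    """
    penalty = initial_bounce * 0.5 ** (days_since_convention / half_life_days)
    return margin - penalty

# A poll taken 5 days after the convention showing the candidate up 3 points:
print(convention_adjustment(3.0, 5))   # ~ +1.59 after the adjustment
print(convention_adjustment(3.0, 30))  # +2.75; the penalty has mostly decayed
```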
Will polls be biased against Trump again?
Jeremy asks:
Question for SBSQ12: Which direction do you think polling errors will go? I've heard the simple arguments: Trump is a unique candidate, and he outperformed in 2016 and 2020, and therefore is more likely to do so again. And the counterarguments: polling errors are hard to predict, equally likely to affect either party; pollsters updated methods after 2020, and had a great year in 2022 (and 2018).
I know this is a tough question, and it's hard to go deeper, but assuming the model stays in this 50/50 range (a very likely scenario), this is what will decide the election. Could you do a Bayesian analysis, given what we saw with polls in 2016/2020, and in 2022?
There were many versions of this question, so I picked Jeremy’s out of a hat. (Congrats, Jeremy.) Let me start by articulating precisely what assumption is embedded in the model: The model assumes that, by Election Day, polls won't be predictably biased.
Those details are important. It means that we think the polls, over the long run, will aim toward the center of the target. But it does not imply that they will have an especially precise shot. Rather, we think the situation looks roughly like the bottom-left corner of this chart. In any given election, the polls could be off by several points in any direction. A polling error of a few points is normal, in fact — it’s years like 2008 when polls nail the outcome with incredible precision that are unusual.
And there's some chance the polls could be off by more than a few points. Our estimate of Election Day polling error is based on an analysis of all election cycles since 1936. Landline telephone penetration only reached 75 percent of the US population by the mid-1950s, and before then (think “Dewey defeats Truman”), polling was often a very rough enterprise. So the model is trained on plenty of examples where the polls were far off the mark.
Moreover, the model assumes that to the extent there’s a polling error, it probably will be systematic, at least in part. If Trump overperforms our final forecast in Wisconsin, for instance, there’s a strong chance he’ll also do so in Michigan. But the same is true for Harris.
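To make “systematic” concrete, here’s a minimal simulation, with entirely made-up error sizes, in which each state’s polling miss is the sum of a shared national component and a smaller state-specific one. Because the national component is shared, the misses come out correlated across states.

```python
# Minimal sketch (made-up numbers, not the real model) of systematic polling
# error: every state's miss shares a national error component, so if Trump
# beats his polls in Wisconsin, he probably does so in Michigan too.
import random

random.seed(1)
states = ["Wisconsin", "Michigan", "Pennsylvania"]

for trial in range(3):
    national_error = random.gauss(0, 2.5)  # shared across all states
    misses = {s: round(national_error + random.gauss(0, 1.5), 1) for s in states}
    print(misses)
```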
The clause “by Election Day” is also important. Until one day before the election, the Silver Bulletin forecast is not a pure polls-based model. Instead, it combines the polls with a “fundamentals” prior based on incumbency and our economic index. Currently, this prior estimates that the national popular vote “should” be roughly a tie — not Harris ahead by 3 or 4 points, as in recent national polls. Although the fundamentals are gradually being phased out, they still get about 20 percent of the weight for now, which shaves a net of about four-tenths of a point off Harris’s projected Election Day margin.
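As a stylized example of that blend: the +2 polls-based number below is invented just to make the arithmetic line up; only the rough 20/80 weighting comes from the model.

```python
# Stylized arithmetic only; the actual weights shift daily as the
# fundamentals get phased out.
polls_margin = 2.0          # suppose the polls-based projection is Harris +2
fundamentals_margin = 0.0   # the fundamentals prior says roughly a tie
w_fundamentals = 0.20       # ~20 percent weight on the fundamentals for now

blended = (1 - w_fundamentals) * polls_margin + w_fundamentals * fundamentals_margin
print(blended)  # 1.6, i.e. about 0.4 points shaved off the polls-only number
```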
Furthermore, the model’s convention bounce adjustment explicitly assumes that the numbers for the candidate who just held their convention are likely to be inflated for a few weeks. That’s having a reasonably big impact on Harris’s forecast right now.
The upshot is that for the time being, the model thinks Trump is more likely than not to do better than his current polling. That does not mean it assumes the polls would be predictably biased if you held an election today — but the election isn’t today. It thinks Harris is more likely to face headwinds than tailwinds from here forward.
The model's handling of the interaction between state and national polls is also complicated. As I wrote, the model defaults toward a “polls-only” view of the race by Election Day. Or at least, that’s mostly true: by then, economic data and the fundamentals no longer have any influence, and the convention bounce adjustment will long since have worked its way out of the system. But the model interprets this “polls-only” mandate somewhat liberally: it uses state polls to inform its estimates of the national popular vote and national polls to inform its state-by-state polling averages. And it uses information based on demographic data and past voting patterns to smooth out the state-by-state estimates and come up with plausible maps.
In fact, our forecast of the national popular vote is not directly based on national polls. Rather, we project the results in each state and then sum up those estimates, weighted by the projected turnout in each state, to come up with a national number. National polls still have a substantial impact on the forecast — but it’s indirect, operating through the various adjustments the model makes, like the trendline adjustment that brings older polling data up to date.
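Conceptually, that aggregation step looks something like the sketch below. The vote shares and turnout numbers are invented; the point is just the turnout-weighted averaging.

```python
# Toy version of the aggregation: project each state's two-party vote share,
# then take a turnout-weighted average to get a national popular vote number.
# All figures below are invented for illustration.
state_projections = {
    # state: (projected Harris share of the two-party vote, projected turnout)
    "Pennsylvania": (0.505, 7_000_000),
    "Texas":        (0.470, 11_500_000),
    "California":   (0.650, 15_500_000),
}

total_votes = sum(turnout for _, turnout in state_projections.values())
harris_share = sum(share * turnout
                   for share, turnout in state_projections.values()) / total_votes
print(round(harris_share, 3))  # the national share implied by the state forecasts
```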
Furthermore, our state-by-state forecasts aren’t purely based on state polling, but instead a combination of state polls and inferences the model makes from polling in other states and national polls.
Let’s say, for example, that the only poll we have of Hawaii shows Harris ahead by just 5 points. The model will blend that with an estimate based on a regression analysis using polling from other states and national polls, informed by demographic data and past voting patterns. If everything else looks normal, for instance, the model will be skeptical that Harris is really going to win Hawaii by just 5; maybe its regression estimate will say she should win by 25 points or something instead. So it will compromise, especially if the lone Hawaii poll is out of date or comes from a pollster with a low rating. In fact, it might still forecast Harris to win Hawaii by 15 or 20 points. Put differently, the model will assume that this particular Hawaii poll is biased — very biased — against Harris. That is to say, statistically biased: we assume nothing about the intent or integrity of the pollster.
In more robustly polled states, the state polling averages get the majority of the weight, and the regression estimate gets less. Indeed, it’s often the overwhelming majority: by Election Day, anywhere from 80 percent of the estimate in fringy swing states like New Hampshire to the high 90s in others like Pennsylvania comes directly from our state polling averages. But the model does hedge if the polls in one state seem out of line with others and there isn’t much polling. This is actually one subtle reason the model has gotten a little bit bearish on Harris lately. Since there hasn’t been a ton of great state polling since the DNC and RFK’s exit, it defaults somewhat toward its regression-based estimates, which reflect the relatively large gap between the Electoral College and the popular vote that hurt Democrats in 2016 and 2020.
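Boiled way down, the compromise works something like this sketch. The blending weights and margins are invented; in the real model, the weight on the state polling average depends on poll volume, recency, and pollster quality rather than a single hand-set number.

```python
# Simplified sketch of blending a state's polling average with a regression
# estimate inferred from national polls, other states, and demographics.

def blended_state_margin(poll_average, regression_estimate, poll_weight):
    """poll_weight is the share of the estimate coming from state polls (0 to 1)."""
    return poll_weight * poll_average + (1 - poll_weight) * regression_estimate

# The lone, low-rated Hawaii poll says Harris +5; the regression says +25.
print(round(blended_state_margin(5, 25, poll_weight=0.3), 2))      # +19: the model hedges hard
# A heavily polled swing state leans almost entirely on its own polls.
print(round(blended_state_margin(1.0, 2.0, poll_weight=0.95), 2))  # +1.05
```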
Phew. I won’t pretend the process is simple; it’s several thousand lines of code. I still haven’t really answered your question, though, Jeremy. Let’s say that by Election Day, the model’s estimate of the national popular vote is Harris +3. Should you assume that estimate is biased — that the “real number” should be Harris +2 or something — given that Trump overperformed in 2016 and 2020?
My short answer is “no”. Indeed, if I thought polling was predictably biased, I’d have built that into the model somehow.
When it comes to polling, I’m mostly a “macro” guy. I believe in looking at the topline numbers and making reasonable adjustments for things like house effects, but not driving myself crazy by delving into the crosstabs. And I think the macro case for assuming that you can't predict polling bias is strong.
First, there’s historically been no relationship in polling error from one election to the next. Polls were biased against Democrats in 2012 but then toward them in 2014 and 2016, for instance:
Second, pollsters have strong incentives to self-correct and get the answer right. They’re aware of all the criticism they took in 2016 and 2020. You might liken this to something like the efficient markets hypothesis: sure, the market might be wrong, but market participants are smart, and it’s hard to outguess them.
Third, people are probably making too much out of a sample size of n=2. The polls were biased against Trump in 2016 and 2020. But in the presidential election just before that, 2012, they were biased toward Mitt Romney and other Republicans. And the polls were basically unbiased in the midterm years of 2018 and 2022.
Fourth, conventional wisdom often guesses wrong about the direction the bias will run. In 2022, there was lots of talk about a “red wave” that was never really justified by the polling. It’s a myth that there was a systemic polling error in 2022 — which actually was among the most accurate years for polling on record — but Democrats did outperform the conventional wisdom and slightly overperformed the polling averages in a series of key Senate races.
If I were more of a micro, in-the-weeds guy, I might feel differently. Pollsters offer different excuses for their problems in 2016 and 2020. In 2016, the issue in some cases was drawing overly college-educated samples since college-educated voters are both more likely to respond to polls and increasingly more likely to vote for Democrats. In 2020, COVID may have been an issue. Democrats were more likely to “socially distance” at home and more likely to respond to polls given that there wasn’t a whole hell of a lot else to do while Republicans were out at the local Applebee’s and ignoring their phones like nothing had really changed.
Are the excuses pollsters made for 2016 and 2020 valid ones? I don’t know. Some groups of voters are far more likely to respond to polls than others, and this is undoubtedly correlated in various ways with their voting intention. You can partially solve issues like these with demographic weighting — making sure that the fraction of college-educated respondents in your poll matches that from Census Bureau data, for instance, or that you have the right number of respondents from different regions within a state. However, there could still be various implicit biases. Even among non-college-educated respondents, for instance, you might have a group with higher social trust, and people with higher social trust tend both to respond to polls more often and to vote for Democrats.
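Here’s a bare-bones illustration of weighting on a single variable, education, with made-up benchmark and support numbers. Real pollsters weight on many variables at once, but the basic idea is the same: respondents from overrepresented groups count for less than one person, and those from underrepresented groups count for more.

```python
# Bare-bones weighting on one variable (education). All numbers are made up.
# Suppose 60% of a poll's respondents are college graduates, but the
# benchmark says the electorate is only 40% college-educated: each college
# grad gets a weight below 1 and each non-grad a weight above 1.
sample_share = {"college": 0.60, "non_college": 0.40}   # in the raw poll
target_share = {"college": 0.40, "non_college": 0.60}   # benchmark (invented)

weights = {g: target_share[g] / sample_share[g] for g in sample_share}
print({g: round(w, 2) for g, w in weights.items()})  # {'college': 0.67, 'non_college': 1.5}

# Candidate support by group (again invented), raw vs. weighted:
support = {"college": 0.58, "non_college": 0.44}
raw = sum(sample_share[g] * support[g] for g in support)
weighted = sum(target_share[g] * support[g] for g in support)
print(round(raw, 3), round(weighted, 3))  # weighting knocks a couple of points off
```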
Some pollsters go further by weighting results based on party identification, party registration, or how people say they voted in the past. These techniques are probably pretty effective at avoiding partisan non-response bias, though they may run the risk of missing real movement in the polls. Party identification can be fluid: a voter who becomes disillusioned with Trump may also begin telling pollsters that he's now an independent rather than a Republican, even if he hasn’t yet changed his party registration.
There are some signs that seem favorable this year. There isn’t a big split among pollsters based on their methodology, for instance — if phone pollsters said one thing and online pollsters said another, that would be more cause for concern. The amount of ticket-splitting in polls is also interesting. When Joe Biden was running, Democrats fared much better in polls of Senate races than in the presidential race in the same polls, and that’s carried over to some degree for Harris. That would seem to cut against the Biden White House’s theory that polls were skewed against Biden — the polls were finding plenty of Democrats, just not many Biden voters. Perhaps it implies that Democratic Congressional candidates will underperform their current numbers if those numbers converge toward the presidential race. But that’s less of a worry for Harris. I don’t really believe in “reverse coattails”, but maybe you could even argue that the Senate numbers imply she has some room to grow.
Finally, while I’ve never really bought the “shy Trump voter” theory — the problem with polls in 2016 and 2020 was more that Trump voters weren’t getting captured by polls in the first place and not that they were concealing their support when pollsters did reach them — it’s worth mentioning that declaring one’s support for Trump has become more socially acceptable among certain subgroups, like Silicon Valley or crypto types or younger voters of color. And with MAGA hats and yard signs, Trump fans are often quite demonstrative in support of their candidate.
So there’s a lot to think about here — but to a strong first approximation, I think assuming unbiased polls is the right call. They could miss, and if they do, the miss will probably be systematic, affecting many states. But they’re about equally likely to miss in either direction.
But to put this in more Bayesian terms: What if you gave me a $100 free bet on Trump either underachieving or overachieving his final polls? I suppose I’d bet on “overachieving”. I have just the slightest lean in that direction. But if DraftKings offered such a bet at -110, meaning I had to pay a small tax on a winning bet, I wouldn’t wager either way.
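For readers who don’t speak sportsbook: -110 means risking $110 to win $100, which works out to a break-even probability of about 52.4 percent. In other words, I’d need to be more than 52.4 percent confident in “overachieving” to take that bet, and I’m not. A quick check of the arithmetic:

```python
# American odds of -110: risk $110 to win $100.
risk, payout = 110, 100
break_even = risk / (risk + payout)
print(round(break_even, 3))  # 0.524: you need >52.4% confidence to bet at -110
```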
Was “data journalism” a failure? What went wrong at FiveThirtyEight @ Disney?
Phil B asks:
I would like to hear more about how Nate’s ambitions for data journalism have evolved since he started 538. What did “conquering the world” mean back then? What changed? Also, with the rise of US sports betting and mainstream sports media coming to terms with that fact, what does Nate think about the prospects for a more data-oriented approach in sports journalism?
Thanks for the question, Phil. This is already a long newsletter, and I was tempted to break this response out into a separate post rather than burying it here. But having multiple threads for SBSQ wound up being confusing last time. I’m going to warn subscribers that I may adapt this response for a standalone post in the future, though — it will go on the Rainy Day List.
The early days of FiveThirtyEight @ Disney, circa 2014-2016 and originally under the auspices of ESPN, were a period I consider unsuccessful, despite the very generous opportunity I’d been given. I think I made a lot of mistakes, and I frequently think about what went wrong. FiveThirtyEight, in my biased opinion, developed into an excellent site by ~2018 (until Disney basically stopped letting us re-fill open positions by ~2021, a sign of trouble to come). But those early years were rough, and I was unhappy, so here’s an inventory of Mistakes That Were Made — or really Mistakes That I Made — should any of you find yourself in a similar position:
Signing on without any pretense of a business model. I’ve written about this before, but this was the Original Sin. It might sound great when some big business throws a bunch of money at you and says, “Just worry about the creative side, and we’ll take care of the rest”. But in a tough business like media, that’s setting you up for failure. You’ll be vulnerable to glacial forces in the broader enterprise (e.g. a bunch of theme parks closing during a global pandemic). You’ll also be dependent on remaining in the political favor of your bosses and avoiding the sort of regime turnover that typically occurs every 3-5 years in publicly traded businesses. Because it’s hard to justify running a division of a company as a loss leader when the parent is going through a downcycle. I think FiveThirtyEight @ Disney could actually have been a great cashflow-positive business (with a primary revenue stream of subscriptions). But I sold it at a time when ESPN was printing money and we looked like a rounding error the suits in Burbank and Bristol could easily afford. It was never really anyone’s job to make money for FiveThirtyEight, and if you don’t build that muscle memory right away, it’s hard to develop it later.
Wanting to be an editor-in-chief for some reason. I don’t know why. It was never really an aspiration I’d had before. Maybe it’s because I visited the Grantland offices while ESPN was recruiting me and what Bill Simmons was doing seemed kind of cool. And surely it was my inflated ego after being on a winning streak. But I had this romantic vision of presiding over a newsroom like in the 1970s-1990s peak of Magazine Era journalism. I should have focused more on where I added value — writing, building models, and perhaps occasionally playing a mentorship role — and given someone else the management/editing keys. Gradually, the share of my time spent on management and editing dwindled from ~60 percent to ~5 percent, and it made for a better newsroom. But it was awkward at many points in between, as I’d dip in and out of leadership tasks and it wasn’t always clear who was in charge.
Hiring too much, too fast. This was a direct consequence of not having a business model. I realized right away — correctly, I guess — that we had fairly perverse incentives. We’d never have more political capital within Disney than when we launched. And Disney is a slow-moving, inertia-driven place, and it’s actually pretty hard to have your headcounts cut once you’ve grabbed them. So I wanted to establish a large footprint. But this made it very hard to iterate, develop talent, and grow organically. And there was a general sense of chaos since it was everyone’s first rodeo.
Not thinking enough about differentiation. After the OG version of FiveThirtyEight.com became something of a viral success, and then after my tenure at the New York Times (2010-13), there were some things I had an empirical basis to think would work really well. Election models. Sports models. And I suppose my blog/explainer-style writing. So we should have focused more on growing those product lines out, especially the sports models, given the relationship with ESPN. We did eventually find other “hits” such as the podcast, liveblogs, and the ability to build informative and beautiful data visualizations. But there were a lot of misses, especially in our attempts to cover general-interest news. Say, for instance, that there was a school shooting somewhere — how were we supposed to cover that? Our attempts were often clumsy — a perfunctory article with perhaps a nice chart or two. This shouldn’t have been in our purview to begin with, and this is the sense in which the term “data journalism” now makes me cringe. In contrast, I think some of our non-news verticals, such as science, were a more natural fit for the site, and we could probably also have covered some more Riverian beats like tech and finance.
Focusing on quantity over quality at launch. The clearest manifestation of this problem was spreading ourselves very thin at launch after I’d done a lot of bragging and puffed my chest out in the media. Just a huge and foreseeable mistake on my part. Because when you open up your doors and invite all the critics in, every dish on the menu will be scrutinized. We should have launched (in spring 2014) with our NCAA tournament model, and maybe some other models like baseball and an Obama approval rating dashboard, but had a higher threshold for written content, dressing to impress with a great freelance story or two that we’d commissioned well in advance. And then we should have launched the midterm model sooner instead of waiting until mid-September (?!?). I was too focused on “proof of concept” for data journalism instead of the core business.
Not adjusting to being perceived as the powerful incumbent and not the scrappy underdog. The original version of FiveThirtyEight, when it began as an anonymous blog, was sort of an underdog story. And to some extent, that even carried over into the New York Times days, in part because I was a little crosswise to the rest of the Times’s mostly vibes-based election coverage. But the minute you sign a multi-year deal with one of the World’s Largest Media Conglomerates and do a bunch of bragging about how you’re going to revolutionize journalism, you’re putting a huge and deserved target on your back. Plus, I was due for some mean reversion anyway.1
Not anticipating the vibe shift. After Obama’s win in the 2012 election, a neo-liberal, technocratic, data-driven era of journalism and politics felt ascendant — one that suited both my politics and my skill set. Instead, 2014-16 was the beginning of a decline phase for technocratic liberalism, marked by the increasing influence of social justice politics (“wokeness”) on the left and Trumpian populism on the right. Of these, Trump was a bigger problem editorially — I was far too dismissive of Trump’s chances in the 2015/16 GOP primary, for instance. But the former was a bigger issue from a business standpoint since both the staff and the audience were left-leaning — frankly, more left-leaning than I was. The “stick to sports” controversy at ESPN only added to the fun. Political “vibe shifts” are often hard to predict, but I would have done many things differently if I had foreseen this one.
I hope you enjoyed this very long newsletter, and don’t forget to leave questions for next time in the comments below.
E.g. after going 50 for 50 in “calling” states in the 2012 election.