There’s more herding in swing state polls than at a sheep farm in the Scottish Highlands
Some pollsters aren’t telling you what their data really says.
It’s obviously a really close race. But for some pollsters, it’s a little too close.
Take, for example, this afternoon’s polling release from the British firm Redfield & Wilton. They polled all seven of the core battleground states. And in all seven, Kamala Harris and Donald Trump each received between 47 and 48 percent of the vote:
Isn’t this a little convenient? Whatever happens, Redfield & Wilton — not a firm with a well-established reputation in the US — will be able to throw up their hands and say, “Well, we projected a tie, so don’t blame us!” And since all of these states are also close in the polling averages, they ensure they won’t rank at the bottom of the table of the most and least accurate pollsters — although unless the race really is that close, and it probably won’t be, they won’t rank toward the top either.
Now granted, our forecast is close too. But it’s based on polling averages: dozens of polls have been released in each of these states over the past month. That greatly increases the sample size. Collectively, they’ve surveyed about 230,000 voters.
By contrast, the median sample size of the individual polls in these states is 800 voters. In a 49-49 race in a poll of 800 people — assuming 2 percent goes to third parties — the theoretical margin of error for the difference between Trump and Harris is about ±7 points. If that sounds higher than you’re expecting, that’s because the margin of error that’s usually reported in polls covers only one candidate’s vote share. For instance, in a poll of 800 people, Trump’s margin of error is about ±3.5 points, as is Harris’s. However, basically every vote that isn’t a vote for Trump is a vote for Harris. If Trump gets 52.5 percent of the vote instead of 49, that implies Harris will receive 45.5 percent.1 So the margin of error on the difference separating Trump and Harris is about ±7, twice the reported figure.
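If you want to check that arithmetic yourself, here’s a minimal sketch using the standard normal-approximation margin of error at a 95 percent confidence level; the exact figures depend a little on rounding and on the confidence level assumed:

```python
import math

def moe_share(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a single candidate's vote share."""
    return z * math.sqrt(p * (1 - p) / n)

def moe_margin(p: float, n: int, z: float = 1.96) -> float:
    """Margin of error for the Trump-Harris gap. With third parties held
    fixed at 2 percent, every vote one candidate gains comes straight off
    the other, so the gap moves twice as fast as either share."""
    return 2 * moe_share(p, n, z)

n, p = 800, 0.49  # median swing state poll; 49-49 race, 2% third party
print(f"one candidate: +/- {100 * moe_share(p, n):.1f} pts")   # ~3.5
print(f"margin:        +/- {100 * moe_margin(p, n):.1f} pts")  # ~6.9
```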
What this means is that if pollsters are doing honest work, we should see a lot more “outliers” than we do — even if people love to complain about them on Twitter.
In our database as of this afternoon’s model run, there were 249 polls in the seven battleground states that met Silver Bulletin standards and did at least some of their fieldwork in October.2 How many of them showed the race in either direction within 2.5 percentage points3, close enough that you could basically call it a tie?
Well, 193 of them did, or 78 percent. That’s way more than you should get in theory — even if the candidates are actually exactly tied in all seven states, which they almost certainly aren’t.
There’s more detail on this in the table below. Using the margin-of-error formula above, I calculated the likelihood that a poll should show the race within ±2.5 points. This depends greatly on the sample size. For a poll of 400 people, the smallest sample size in our October swing state database, the chances that it will hit this close to the mark are only about 40 percent. For the largest sample, 5,686 voters, it’s almost 95 percent instead. But most state polls are toward the lower end of this range, surveying between 600 and 1,200 voters. All told, we’d expect 55 percent of the polls to show a result within 2.5 points in a tied race. Instead, almost 80 percent of them did. How unlikely is that?
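That likelihood calculation goes something like the sketch below, assuming (as above) an exactly tied 49-49 race with 2 percent going to third parties:

```python
import math
from statistics import NormalDist

def p_within(n: int, threshold: float = 0.025, p: float = 0.49) -> float:
    """Probability that a poll of n voters shows a Trump-Harris margin
    within +/- threshold, if the race is exactly tied."""
    sd_margin = 2 * math.sqrt(p * (1 - p) / n)  # sd of the margin, per above
    return 2 * NormalDist().cdf(threshold / sd_margin) - 1

for n in (400, 800, 1200, 5686):
    print(n, f"{p_within(n):.0%}")  # ~38%, ~52%, ~61%, ~94%
```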
Based on a binomial distribution — which assumes that all polls are independent of one another, which theoretically they should be — it’s realllllllllllllly unlikely. Specifically, the odds are 1 in 9.5 trillion against at least this many polls showing such a close margin.
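Concretely, that’s the upper tail of a binomial distribution with 249 trials and a 55 percent per-trial chance of a “close” result. A sketch of that computation, using the pooled 55 percent figure (strictly speaking, each poll has its own sample-size-dependent probability, so a Poisson-binomial over the individual polls would be more precise):

```python
from scipy.stats import binom

# P(at least 193 of 249 polls land within 2.5 points), if each poll
# independently has a 55% chance of doing so under a tied race
tail = binom.sf(192, 249, 0.55)  # sf(k) = P(X > k) = P(X >= 193)
print(f"{tail:.1e}")  # ~1e-13, on the order of 1 in 10 trillion
```

The per-state and per-pollster odds below come from the same kind of tail calculation applied to each subset of polls, so plugging the pooled 55 percent into state-level counts won’t reproduce them exactly.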
The problems are most acute in Wisconsin, where there have been major polling errors in the past and pollsters seem terrified of going out on a limb. There, 33 of 36 polls — more than 90 percent — have had the race within 2.5 points. In theory, there’s just a 1 in 2.8 million chance that so many polls would show the Badger State so close.
In Pennsylvania, which is the most likely tipping-point state — so weighing in there is tantamount to weighing in on the Electoral College — the problems are nearly as bad. There, 42 of 47 polls show the Trump-Harris margin within 2.5 points — about a 300,000 to 1 “coincidence”. Arizona, Georgia and Michigan are less bad — though that’s partly because polls there have usually been showing leads for Trump, Trump and Harris respectively, so if pollsters are trying to match the consensus, a stray Trump +3 in Arizona or Harris +3 in Michigan won’t stand out so much.
This is a clear-as-day example of what we call herding: the tendency of some polling firms to move with the flock by file-drawering (not publishing) results that don’t match the consensus, or by torturing their turnout models until they do. Some pollsters, like the New York Times/Siena College poll, don’t do this, and are proud to own their work even when it differs from the polling averages. But others, like Redfield & Wilton, do. Here are the numbers for the polling firms that released at least eight swing state polls in October:
Redfield & Wilton are by far the worst herders: 42 of their 44 polls (95 percent) show the margin within 2.5 points. There’s only a 1 in 175 million probability of that occurring randomly, according to the binomial distribution. Emerson College and InsiderAdvantage are also on watch for having had, respectively, all 12 and all 10 of their October swing state polls within that 2.5-point threshold.
By contrast, the most highly rated polling firms, like the Washington Post, show much less evidence of herding. YouGov has actually had fewer close polls than you’d expect, although that’s partly because they’ve tended to be one of Harris’s best pollsters, so their surveys often gravitate toward numbers like Harris +3 rather than showing a tie.
Our pollster ratings actually include a penalty for herding: polls that consistently match numbers from other recently published polls in a way that is highly statistically improbable see their ratings downgraded. That serves as at least a little bit of a counter to the perverse incentives that polling aggregators like RCP, 538 and Silver Bulletin admittedly create by publishing polling averages, which both give pollsters a target to herd toward and make supposed outliers really stand out.
The irony, as I wrote in 2014, is that although herding may make individual polls more accurate, it actually makes polling averages less accurate. Polling averages are supposed to aggregate independent opinions — that’s literally one of the preconditions for the wisdom of crowds, as laid out in James Surowiecki’s classic book by that name. And from pollsters like Redfield & Wilton, we aren’t getting an independent opinion — in fact, we aren’t really getting an opinion at all.
In this election, the incentives are doubly bad, because the polling averages in the swing states are close to zero — so by publishing a steady stream of Harris +1s, Trump +1s and ties, a pollster can both herd toward the consensus and avoid taking a stand that there’s a roughly 50/50 chance they’ll later be criticized for. Lately, a lot of national polls have also shown near-ties after usually showing Harris leads earlier in the race. We wonder if there’s been an increasing amount of herding there too, perhaps involving the use and abuse of likely voter models4 — over the past month, national polls have tightened and moved toward Trump considerably more than state polls have become Trumpier, except in Nevada and Florida:
By contrast, New Hampshire — a state that has received less attention and where Harris is expected to win — has shown none of this herding. Just in the past week, there’s been everything from a Harris +21 to a nominal Trump lead:
All of this herding — and hedging — increases my concern about another systematic polling error. It might be an error in Trump’s favor again, but it won’t necessarily be: pollsters may be terrified of showing Harris leads after two cycles of missing low on Trump, and they probably won’t be criticized too much for a Harris +1 or even a Trump +1 if she wins in Michigan by, say, 3 or 4 points.
Or there could be errors that run in both directions. Crosstabs show sharp moves away from Democrats among Black and Hispanic voters, and to some extent corresponding gains among white ones. If those crosstabs are real, you’d expect to see some bigger shifts on the map — Georgia being a really rough state for Harris, for instance.
But with notable exceptions, you don’t see that. And the exceptions come from some of the best pollsters in the business, like NYT/Siena, which has given Harris some of her best numbers in Pennsylvania but some of her worst in Arizona and Georgia, consistent with a scrambled map. Or from Ann Selzer, who has consistently published seeming “outlier” polls only later to be proven right — she had Harris down only 4 points in Iowa. Polls in Kansas and the 2nd Congressional District of Nebraska — where herding is less likely because these races aren’t expected to be close and they don’t get much attention — have also shown conspicuously strong Harris data. If Harris approaches the numbers the polls show in these places, she’ll probably win demographically similar states like Michigan and Wisconsin comfortably.
Whatever happens on Tuesday, it would be a surprise if there were no surprises. In only 1011 out of our 40,000 simulations on Friday afternoon, or about 2.5 percent, did all seven battleground states actually finish within 2.5 points. The Redfield & Wiltons of the world may think they’re playing it safe — but they’ll probably be wrong.
1. Retaining the 2 percent for third parties.
2. As it’s the first of the month, there are no polls yet with November field dates.
3. I’m using 2.5 points rather than an integer because some polls release results both with and without third-party candidates, in which case we just average the different versions together. So it’s fairly common to see a poll listed as showing, say, a 1.5-point margin if the version with third-party candidates shows Harris up 2, and the version without them has her up 1.
4. All the data in this article is based on likely voter polls, unless a pollster only published a registered voter version.