There are excluded polls, though I think that is mostly due to either bad methodology or secretive methodology. If they aren't public about how they do their polls, I don't think Nate trusts them.
A bunch of your links, Eli, are going to X, and it seems like trawling through X might be a bit demoralizing.
I would like to propose a new hire that may help you build a basic understanding of what's going on over there without getting too bogged down.
It's called... a cat!
Hear me out, hear me out.
The strategy is: log in to X while acting like you don't want the cat on your keyboard. Most likely, your new feline co-worker will take this as an invitation to clamber right up there.
Whatever link you end up being able to see on the screen over its furry body once it's settled down is the one you take as representative of the overall discourse on X.
"You’ve got to feel sorry for the pollsters – they really are the new weather forecasters. They are charged with anticipating important phenomena over which they have no control, their audiences habitually misinterpret or misapply their data, and they suffer much blame and public abuse for their perceived failings.” - Herman's Toteboard (www.thetoteboard.org)
On the forecast page, under "Polls included in our model", I would find it useful to have an additional column for "Margin with House Effect".
For example, on today's page it lists a Quinnipiac poll that is R+0.5, an NYT poll that is even, an Echelon poll that is D+6, and a Rasmussen Poll that is R+2. It would be useful to see at a glance that these results are treated as Quinnipiac R+0.5, NYT D+0.1, and Rasmussen D+0.6.
I also don't see the house effect for Echelon on the table. Did I miss that or was that omitted?
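For what it's worth, a minimal sketch of how such a column could be computed, assuming "margin with house effect" just means the raw margin minus the pollster's estimated house effect. The house-effect numbers below are hypothetical, chosen only to reproduce the adjusted figures in this comment, not taken from the model:

```python
# Hypothetical sketch: subtract an estimated house effect from a reported margin.
# Margins are in points; positive = Harris lead, negative = Trump lead.
# The house_effect values are made up for illustration.
polls = [
    {"pollster": "Quinnipiac", "margin": -0.5, "house_effect": 0.0},
    {"pollster": "NYT/Siena",  "margin":  0.0, "house_effect": -0.1},  # slight pro-Trump lean assumed
    {"pollster": "Rasmussen",  "margin": -2.0, "house_effect": -2.6},  # strong pro-Trump lean assumed
]

for p in polls:
    adjusted = p["margin"] - p["house_effect"]  # remove the pollster's typical lean
    print(f'{p["pollster"]:>10}: raw {p["margin"]:+.1f}, adjusted {adjusted:+.1f}')
```

Running this gives Quinnipiac R+0.5, NYT D+0.1, and Rasmussen D+0.6, matching the numbers above.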
It seems it’d make more sense to measure polling bias based on pollsters' errors in past elections, rather than their standing relative to the average (since the average could itself be biased).
This shows up a lot in A/B test interpretation (the bread and butter of my work). It's tempting to correct for past errors, but those corrections require similar conditions. If you ship something like Amazon's quick-buy feature while your A/B tests of purchasing behavior predate it, then even if the two *shouldn't* overlap, you might still be drawing a different distribution of users.
Since pollsters correct for past behavior, they're likely modifying the distribution of poll responses they're drawing from. Maybe some are, maybe some aren't, but you want to make dead sure that past scenarios are equivalent to the current scenario before making bias adjustments.
Sometimes it's better to be a little wrong all the time than to play whack-a-mole with every error and overfit. Overfitting risks your predictive capability far more than a stable, biased offset does.
But the purpose isn't to make the polls more accurate; it's to make the polling average more stable. It means the average isn't moving around just because there happened to be lots of polls with a pro-Rep house effect last week and lots of polls with a pro-Dem house effect this week.
The way he makes it more accurate is to give a greater weight to pollsters that have historically been more accurate. That is based on the actual results.
I'm somewhat sceptical of the methodology explained here. It essentially seems to do the herding on behalf of the pollsters.
I think it makes a lot of sense to weight and adjust pollsters based on historical statistical bias. It makes far less sense to weight pollsters based on their bias relative to the current average: if the current average is wrong, then the adjustments are wrong, and we have no way of knowing whether the current average is right or wrong. The way it's worded, it sounds like the model essentially takes the higher-rated polls as gospel and then adjusts everything else towards them - herding on behalf of the pollsters. Which means garbage in, garbage out.
Isn't the entire point of a polling average supposed to be negating bias and statistical noise? But this average seems to weight everything towards the bias of the higher-weighted polls, which may be unbiased on average but can be wildly off cycle to cycle. Take Siena in 2020, whose final polls had Biden up 11 in Wisconsin and up 3 in Florida. Maybe they're unbiased in the long term, but weighting polls toward such a massive miss in 2020 wouldn't have helped at all.
I have no issue with weighting and adjusting polls based on historical precedent, but I really struggle to understand the logic behind weighting and adjusting polls based on other polls in the same cycle. That seems to significantly undermine the entire purpose of taking weighted averages in the first place.
But what about scenarios where the higher-rated pollsters differ from the lower-weighted ones? We saw this in 2020 and, to a lesser extent, 2016. As this post says, it isn't an issue this cycle, with high-rated pollsters and right-leaning pollsters telling a similar story, but it has been an issue in the recent past.
Eli & Nate - I don’t have a statistics background, but it seems like the iterative bias process is primarily reinforcing that the highly-rated pollsters have little bias (so circularly confirming the input).
Is there an assumption in all this that bias among pollsters has some kind of normal distribution? How would the average and house effects be different if Trafalgar simply didn’t exist (or was banned) and all other polls were the same? Or what if a new pollster with results similar to Trafalgar or Rasmussen started to be included in the average? I know you can only work with the data you have, but if the answer to those questions is that it moves the model, then I wonder how much the model can be said to truly forecast the election. I know, of course, that polling failures obviously impact the ability of the model to do what is intended, but are we really doing any better than GIGO with the hope that there is an appropriate mix of pollsters?
Polling aggregators put a pretty large amount of faith in the business model of pollsters to discourage them from synthesizing results.
As long as the new "similar to Trafalgar" pollster actually contacted different people and met the polling quality checks, it would make sense to throw them into the average.
That is the whole point of aggregation. It effectively increases the sample population, and therefore the accuracy of the total predicted result.
So I think the "are we really doing any better than GIGO" question is answered by the historical analysis and refinement. There is a limit to what models of this type can do if their input is bad, but as I pointed out elsewhere, the bias correction cannot correct for a systemic bias across many pollsters. I think it is more aimed at preventing biases in individual polls from unduly affecting the projection. So if most of what goes in is garbage, then yes, that's what you will get on output. Maybe it's more of an outlier-management tool, especially if outlier pollsters attempt to flood the data stream with frequent polls.
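As a rough back-of-envelope on the "effectively increases the sample population" point, under the idealized assumption of independent simple random samples (which real polls aren't):

```python
import math

def moe(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

single = moe(800)        # one poll of 800 respondents
pooled = moe(8 * 800)    # eight such polls pooled, if they were truly independent
print(f"single poll:  +/- {single:.1%}")   # ~3.5%
print(f"pooled polls: +/- {pooled:.1%}")   # ~1.2%, i.e. shrinks by sqrt(8)
```

Shared method errors don't shrink this way, of course, which is exactly the systemic-bias caveat above.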
There are pollsters that seem to exist to push polling averages towards Trump — see Quantus saying “you’re welcome” to their polls moving the NC average back towards Trump, and subsequently being caught failing to disclose poll sponsors. Can you really “adjust your way out” of purposefully bad data? Shouldn’t firms with demonstrated transparency issues be considered fundamentally suspect?
I understand that the partisan adjustment means that these firms don’t impact the toplines much. But I’d argue they don’t teach us anything, either.
As long as they 1) use a valid scientific polling method, and 2) honestly report their results, then yes, they do add some information. And if Nate thinks they aren't doing those things, I think they get banned.
"In calculating house effects, we basically look at how a firm’s polls compare to the trendline of other polls from that state (or compared to other national polls). This involves an iterative process: we calculate the trendlines, then calculate the house effects based on the trendlines, then recalculate the trendlines with adjustments for house effects, then calculate a more refined version of house effects from the recalculated trendlines, and so on. In this process, the more highly rated, nonpartisan firms serve as essentially the center of gravity: the “true” values against which other firms are compared."
This seems to say that the house effect corrections depend on the broad polling averages being correct. If there is a systemic bias, then the trendlines of other polls will be wrong too. So house effects won't - and aren't designed to - correct for any systemic bias. I suppose doing that is just not possible anyway. So house effect corrections are designed to adjust biased individual pollsters, not systemic biases. Do I have that right?
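For concreteness, a toy version of the iterative loop the quoted passage describes - not Nate's actual code, just a plain average standing in for the real trendline, with made-up polls:

```python
from statistics import mean

# polls: (pollster, margin) observations; margins in points, D+ positive (made-up data).
polls = [
    ("A", 3.0), ("A", 2.5), ("B", -1.0), ("B", -0.5), ("C", 1.0), ("C", 1.5),
]

house = {name: 0.0 for name, _ in polls}  # initial guess: no house effects

for _ in range(50):  # iterate until the estimates settle
    # 1) trendline (here: a plain average) computed from house-effect-adjusted polls
    trend = mean(margin - house[name] for name, margin in polls)
    # 2) re-estimate each firm's house effect as its average deviation from that trendline
    for name in set(house):
        own = [margin for n, margin in polls if n == name]
        house[name] = mean(own) - trend

print({k: round(v, 2) for k, v in sorted(house.items())})
```

With evenly spaced polls this settles almost immediately; with uneven release schedules the trendline and the house effects keep nudging each other for a few passes, which is why the real process iterates.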
That is basically my understanding. But the model has a feature that guesses the probability that this whole system is shifted by some percentage from the truth in one way or another.
I think the main purpose of the house-effect adjustment is to account for the fact that polls come out at inconsistent frequencies from one pollster to the next. How do you find the “platonic ideal” of the aggregate average of the polls at an instantaneous moment, when at times most of the recent polls may lean in a certain direction? The house-effect adjustment therefore makes the polling average more stable and less subject to day-to-day variance based simply on which pollsters happen to have released the most recent polls. I suspect you wouldn’t need to account for house effects if, say, every pollster in the model released a poll once every Friday with the same survey date ranges. You could simply calculate a weighted average based on pollster grade, methodology, and sample size without any other adjustments.
That “platonic ideal average” of the polls may itself have systemic error in favor of one candidate, but historically that’s been basically impossible to predict ahead of time cycle-to-cycle because pollsters adjust their methodology to account for why they missed in a prior cycle.
Historical error is accounted for in the weighting for pollster grades, and I believe more recent cycles are weighted more heavily in that calculation, but pollster grade is ultimately designed to capture which pollsters have a track record of maintaining accuracy over the long run.
While Nate's sorta-forced write-up on "house effects" in polling is instructive, we should also look at the "house effects" on THIS site, the SB. Many of you are liberal, and God bless you for that - it's all good, but we disagree. However, I respect your right to think and vote the way you want. But here at the SB there clearly is a "house effect" INDEED, one that permeates the whole site, and it's the editorial voice of Nate himself and how it communicates OVERT Harris-cheering throughout. To be sure, if one were to synthesize MOST of the content of the SB, the tone and theme is... "Harris focused" - "She's doing well this morning," "You got this, KaMALA," "No need to panic about A, B, and C, Harris issues can ALL be explained," "Here's what Harris has to do to pull it out"... BUT THERE IS NO TRUMP FOCUS ON THIS SITE, at all! Is that because everyone who knows Nate to be a very public liberal pollster, and followed him over here from 538, did so understanding that it would be a blissful watering hole for agreeable, friendly, like-minded progressives - Woodstock for Harris voters - yeah? Sorta seems that way... OTHERWISE you would find whole areas here where Trump supporters congregate to discuss their very strongly held views as well, and maybe even debate them. But that is far from the experience here, and it lends itself to the idea that the SB is a site that reports liberal polling, to liberals, so that liberals can feel great about the Harris campaign's prospects - unless those prospects suck, and they sorta look that way outside of this "bubble." While this might seem jarring, controversial, and even foreign to hear (I mean, an opposing view), I suggest you open the site in the morning and pretend you are a disgruntled Trump voter looking for a new home, or a down-the-middle moderate checking it out, and unfortunately you'll find that you really don't have a place here, due to its very intentional, I believe, progressive experience design (PED). Read through all of Nate's daily notes to people... It's almost like he is tucking his lil' liberal toddler into bed with hopeful wishes and dreams each day. Here, there is one audience in mind and one subject - Kamala is our person, here is our slant on HER race, and here is why you should not worry, because the model says it's ALL GONNA BE OK? Good luck with that... I paid my money too, and I need more balance and less of a Kamala cheering section. #Let'sGo47
It seems pretty clear to me that Nate, like many of us, feels that defeating the former president is of existential importance for the future of our country. He has always been transparent about his personal preferences, but the model is agnostic and has not changed from cycle to cycle. And I think it's safe to say that most of us are here because we want to get closer to the truth about what's happening in the country, and not just validate our confirmation bias. His editorializing is not disturbing to me because I don't believe he lets it interfere with his fact-based analysis, and I've observed over the years that many self-professed right-leaning commenters agree.
“I paid my money too, and I need more balance and less of a Kamala cheering section.”
I get it; you don't like liberals and/or progressives, and you don't like Nate talking like one - or, at least, you want "balance". In fact, you paid for "balance" and got "a Kamala cheering section" – is that it? Well, you're definitely in the wrong place. Which is not to say that Nate's lecture on "house effects" is either "liberal" or "progressive" (and it hardly cheers Kamala). It was about selective poll aggregation, about how the model weights a pollster's results. And if Nate were a pollster himself, there might be a case to be made for "house effects" on whatever poll he conducted. I don't know. Because he's not a pollster. He's a poll aggregator, who weights collected poll results, quite transparently, and produces what he hopes is an accurate projection. Well, that's what I paid for, not an opinion forum, which seems more like what you're looking for.
One more thing...it's not about LIKE or DISLIKE...that's pretty simplistic. But I will say this...In general, Liberals think Republicans are "Mean and Dumb"...Republicans, however, just think Liberals are "Wrong"...It's a huge difference. The former is largely emotional, the latter is largely logical...whatever side a person falls on is their business...but that's my POV, held by many conservative observers...
Re-reading your post - saying "you're definitely in the wrong place" definitely proves my point... Is that something that comes from someone who is looking to "reach across the aisle"? Ah... no, not really. I ventured into foreign lands knowing there would be resistance to things I posted, but that I would benefit and learn from the experience. You choose to stay among your liberal gaggle, which is effectively a SAFE SPACE... Who's the braver, more adventurous, and more open-minded person? The one who exposes himself to alternative views? Or the guy who turns off the lights and locks himself in his room with his friends, shades drawn, when "strangers" appear in the neighborhood? The answer is obvious... It's ME!
I think you should try re-reading some of your posts here, but from the perspective of a reader who is actually from across the aisle. They're often pretty aggressive and combative from the jump, and do not come across as open-minded, willing to have a reasonable discussion, or "reaching across the aisle" the way you think they do. When you come in hot all the time, people are going to respond in kind. Cool it down and you'll get the exchanges and exposure to alternative views you're looking for.
There are plenty of conservatives on here who post their opinions and debate liberal posters in good faith. Some of the flak they get is uncalled for, but they do frequently contribute a lot of interesting perspectives that challenge my way of thinking, which is great! I appreciate their presence here and definitely do not want another liberal safe space. Meanwhile, many of your posts seem like they're written just to rile people up and start an internet shouting match. They read just like the angry partisan discourse you'd find on twitter or truth or whatever, which is what many of us here are trying to get away from, and it sounds like you are too.
I'm not trying to antagonize you or get into an argument about "KAM-BAM" or whatever, I'm just trying to honestly and bluntly tell you what I think, and hopefully it's helpful to you.
Nate's weighting/discounting/factoring/regressing of aggregated polls makes him an UBER POLLSTER... And I'm looking for a polling forum that does not "lead the witness," as they say... got it?
It seems that you’re concerned with Nate’s commentary, not his model and its results. He has been clear that he would not vote for Trump, but he hasn’t exactly been a Harris/Democrat/Progressive/Liberal cheerleader either. He has criticized a lot of things about how Harris/Democrats ran their campaign and Progressive/Liberal messaging. So, he has opinions (including favoring Harris) and shares them openly. He took a lot of flak in the past when his model gave Trump higher odds than the conventional wisdom (2016, especially). I think he’s pretty transparent about his subjective opinion of the data vs. the objective data itself vs. his personal preferences. Even when Harris supporters were on his case because they thought he should have made a mid-course adjustment to boost her standing in the model, he was firm that doing so would have been putting his thumb on the scale rather than the unbiased method of letting the model play out as designed. The model is what it is and he reports what it spits out. His analytics have fairly earned, over multiple election cycles, the reputation that has caused us to spend our money to read his updates here.
With that said, Nate is obviously a smart guy and I’m sure he knows that most people who are interested in paying to follow his data-centric approach to politics prefer Harris over Trump. Framing his articles more about Harris than about Trump seems likely to be more engaging to more readers, so I wouldn’t be surprised if he wrote that way more often on purpose. Regardless, the data is the data. Nate writing about Harris instead of Trump doesn’t change anything.
If you just want to read more positive things about Trump, there are probably other venues that would suit you better than this data-centric site. It would be silly of me to go to Truth Social and wonder why more people don’t want to talk about how we can solve global warming, right?
We shouldn't know his politics, or even need to speculate... and if we do know his politics, then he should go out of his way to keep his commentary and prose neutral.
Folks who spend time doing deep dives on and critiquing polls and models online for purported bias against their candidate might consider putting that time much more usefully toward volunteering for their candidate instead … just a thought
Or they should be doing deep dives and critiquing models of the roundness of the earth, or the effectiveness of vaccines. All this stuff is equally fun, and you can get pilled on lots of things!
And folks who spend time doing critiques of people doing deep dives of polls might consider putting their time elsewhere other than on sites dedicated to doing just that.
Some of us are here because we are interested in the material, and we might also do volunteer political work, as one does not preclude the other. I also go for walks in the park, as I don't do volunteer campaigning all my waking hours.
Eh, the shape of the critique matters. If someone is here postulating that we're gonna find out in about 5 weeks that "this" or "that" element was probably wrong, that seems like a good fit for this material.
If someone is here arguing that the model should change *now*, to match their own idea of accuracy - because they think they already know this or that thing is wrong right now, rather than guessing that we might find out it is wrong - then that sounds like someone who isn't really equipped to understand what we're looking at here.
I have a feeling that OP is talking about the latter, as they seem to far outnumber the former. Perhaps that's an intuitive misapprehension - it may be that those people just get more engagement on their comments than the others do - but it genuinely does feel like there are more people in here trying to re-model the model than genuinely critique it.
If we are talking about polls, then you need to consider bias, and since this is a discussion group why not discuss bias.
I work for Harris, but the polls deeply underestimated Trump in 2016 and 2020; failing to try to understand that might result in another underestimation, since the demographic/political dynamics haven't changed a lot, whereas bringing in 2012 is almost an ancient alignment.
I think there are studies showing that Trump supporters are less likely to answer polls, which could explain some of that discrepancy. Don't forget that 538's poll averages got 8 states wrong across the 2016 and 2020 presidential races; granted, the error was not always large (except maybe WI), but all 8 states were called blue and went red, whereas, e.g., RCP had 5 errors, 4 of which were blue calls going red and one the reverse.
There are plenty of people who deep-dive and are perfectly capable of criticizing a poll that is good for "their" candidate, especially those who come from a professional background. At the end of the day, if that's your career, you're graded on how good you are, not on whether you made your candidate feel good.
Polling, like economics, is a profession where you can be consistently wrong but survive so long as you aren't way off from the industry consensus.
I mean, you can grade polls based on how well they predict outcomes, so this framing strikes me as overly simplistic.
Which is where I am at. I support Harris and knock on doors, but I am wary that whatever dynamics caused the 2016 and 2020 Trump underestimations will be in play, as the voter dynamics haven't shifted, and I don't see that the polling has shifted greatly.
That would indeed be the best equalizer - unless the candidate has a biased perception of reality.
Is it correct to assume that Trafalgar and Rasmussen (two of the most biased per this article) earn their relatively high scores (B+ and B respectively) because they are *consistently* and predictably biased, and therefore reasonably accurate once the "windage" is taken into account?
No. The ratings aren't based on adjusted results. The reason they are rated highly is because for the last two elections they've been right. The actual result was better for Republicans in both 2016 and 2020 than the average of the polls. A pro-Republican house effect is, therefore, what we would expect to see from an accurate polling company.
PS I should clarify, that's true for their historic house effect, not necessarily their current house effect. Pollsters that were more Republican in 2016 and 2020 will have been more accurate. That doesn't mean being more Republican this year will also make you more accurate - Nate's model assumes the average polling error is equally likely to be in either direction, since pollsters try to correct for past errors.
I asked three friends at lunch who they’re voting for, and we’re 3:1 for Harris. Please include this data point in the model. It’s a cinch to count to 4, so there’s zero uncertainty in my numbers.
Margin of error (sometimes called “uncertainty”) doesn’t mean that we are uncertain of what the people think - it’s our uncertainty about how well the average of our sample approximates the average of the population at large. Sampling from one person’s friends causes very non-representative results, and sampling from people who answer the phone the first time you call causes moderately non-representative results. Even if you are really doing totally random samples, smaller samples are more often unrepresentative than large ones. Reported margin of error is usually just based on sample size and assumes true randomness of the sample; the pollster weightings and house effects here try to make up for the rest.
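To make the "smaller samples are more often unrepresentative" point concrete, a quick simulation under the best-case assumption of a truly random sample from a hypothetical 52/48 population (real polls don't get to assume this):

```python
import random

def simulate(sample_size: int, true_share: float = 0.52, trials: int = 2_000) -> float:
    """Fraction of simulated random polls whose estimate misses the truth by more than 3 points."""
    misses = 0
    for _ in range(trials):
        est = sum(random.random() < true_share for _ in range(sample_size)) / sample_size
        if abs(est - true_share) > 0.03:
            misses += 1
    return misses / trials

for n in (100, 400, 1000):
    print(f"n={n}: about {simulate(n):.0%} of polls miss by more than 3 points")
```

The miss rate shrinks steadily as the sample grows; none of this accounts for who actually answers the phone, which is what the weightings are for.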
[Pssst: It’s a joke! I’m a statistician.]
Margins of error in polling only capture the statistical error, i.e., how close the poll numbers are to the actual distribution given the sampling method. They say nothing about how well the sampling method captures the real distribution in the population, and especially the real distribution among actual voters. That's the whole reason polls disagree.
No way to know who the “actual voters” are until ballots are cast.
Margin of error as quoted is valid for samples taken randomly from a production line of supposedly identical manufactured widgets. It's less meaningful with humans, most of whom are defective anyway.
It doesn’t matter if the widgets are identical. In fact, if they’re identical, the survey should yield a result of either 100% or 0%. Statistical sampling, and margin of error, really is great for sampling a diverse and heterogeneous population full of messy individuality - but only if you can really draw a random sample from it with uniform probability of sampling each individual.
Wait, I thought 2+2=4…surely that’s the correct answer and not 3+1? Uncertainty abounds!
It's a legitimate point: more polling companies = more garbage data. To be honest, I don't know how to solve this. My feeling is that companies with scant history against ground truth, completely new companies, or companies that seem to have adjusted their sampling in significant ways should be removed entirely. Perhaps they can be useful for seeing relative change, but their bias should essentially be set to unknown.
If they don't have a good track record, they'll be included in the average with a low weighting. Nate only excludes them entirely if there is reason to think they aren't even trying to produce reliable polling and are engaged in actual fraud.
Here is a basic problem. Too many pundits and partisans view polls as sympathetic magic. They believe that reporting polling in the desired direction means their desired candidate is going to win: that polling is not analysis but something that causes voting.
It's a little more complicated than that. Enthusiasm is a real thing in elections: high enthusiasm drives turnout, low enthusiasm suppresses it, and published polling *is assumed* to have an effect on enthusiasm - and I imagine, under some circumstances, it does.
The size of that effect, and the circumstances that might modulate the size of that effect, and even the *direction* of the effect and circumstances that might affect that direction, are at best poorly understood, so I fully agree with you that it's a problem, and perhaps indistinguishable from "sympathetic magic". There's at least a decent reason for people to think it, but no good reason for people to assume it's actually impactful, as they do.
In particular, it would seem like a *close* polling result would drive up enthusiasm across the board, regardless of who is winning, and a not close result would suppress it, possibly asymmetrically. But, as said, poorly understood, that's just a guess.
BRAVO!!! Been saying this and getting RAILED by the comrades...
What’s your evidence for this? In my experience, consumers are hungry for all sorts of data. Reporting a poll of someone’s preferred candidate being down draws eyeballs too. People consume hope and tragedy. Arguably, more tragedy.
When you see people complaining the polls are "rigged" and post comments to this effect with "#deepstate" that indicates that they believe that polling influences voting outcomes.
I *think* their working theory is that polling showing a candidate leading by a reasonable margin leads to lower turnout for those who would otherwise vote for the other candidate.
I think this is a load of crap personally, but that's what they'd have you believe.
I have assumed that claims of poll rigging are potentially precursors to claims that the election itself was stolen. If polls showed Harris leading significantly close to the election, and people generally believed those polls were fair, Trump’s claims of the election being stolen from him would be harder to spread outside of his true believers. If his campaign has already convinced people that pre-election polls showing him losing were flawed, denial of the election result would be easier to sell.
Perhaps that's it.
I’ve heard that theory before. I haven’t seen evidence to support or refute it. But, assuming it’s true, I would think that poll watchers who favor Trump would be happy to let pollsters they see as left-leaning lull Harris-inclined voters into a false sense of security. But I’m seeing the opposite. As Harris gets positive poll results, I see anti-Harris poll watchers posting that the polls are impossibly flawed and shouldn’t be included in Nate’s forecasting.
Which leads me to believe that another, opposite, theory is more likely to be true—people complaining about Harris’s polling seem to believe that voters can be persuaded to jump on the bandwagon of the more popular candidate. Kind of like fair-weather fans rooting for the number one ranked team in a given season. (I haven’t seen any evidence for or against this idea either.)
I don’t understand how you can determine whether a pollster is biased if you don’t have “ground truth.” I would think you could compare how a pollster polled just before, say, the 2022 election against the actual results. Comparing to the mean seems meaningless if the mean itself is off (as in “garbage in, garbage out”).
Start with an assumption that no polls are biased, and then generate an average. From that average, assign a bias score to every poll based on how far off the average it is. Then, over time, look at newly released polls from every firm, adjust them based on the bias, and compare that to the average. Adjust the bias based on how far off THAT is from the average. By iterating this over multiple cycles of polls and averaging, you end up with stable bias scores. Polling firms that are just noisy (missing the average in both directions) settle out with a low bias, while polling firms that always miss the average in the same direction get a correspondingly large bias score assigned.
That gives you a reliable table of adjustments against the industry average. The other way you can score the reliability of this kind of model is by examining past results. It's as simple as: "If we look at every election where a candidate had a 70% chance to win, did 70% of them win and 30% lose?" If the answer is generally yes across all percentages, then you can be confident that you have a well-built model. If it's no, then you have to figure out what information you aren't accounting for.
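A minimal sketch of that calibration check, assuming you have a history of (forecast probability, outcome) pairs to test against; the data here is hypothetical:

```python
from collections import defaultdict

# Hypothetical history of forecasts: (predicted win probability, 1 if that candidate won else 0)
history = [(0.72, 1), (0.68, 1), (0.71, 0), (0.90, 1), (0.55, 0), (0.74, 1), (0.69, 1)]

buckets = defaultdict(list)
for prob, won in history:
    buckets[round(prob, 1)].append(won)  # group forecasts into ~10% bins

for p in sorted(buckets):
    wins = buckets[p]
    print(f"forecast ~{p:.0%}: actual win rate {sum(wins) / len(wins):.0%} over {len(wins)} races")
```

In practice you need many cycles of races before the bins mean anything; with only a handful of elections, the bins are mostly noise.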
“Start with an assumption that no polls are biased”
But maybe they are biased. For instance, I don’t see where polls correct for the fact that women vote ~10% more than men. I believe polls always sample equal numbers of men and women. Since women favor Harris by ~20 points, won’t this create a built-in bias? This seems too obvious to have been overlooked by all pollsters, but maybe there are other sources of systematic (unintentional) bias.
Not true. Look at the crosstabs.
Women are almost always weighted more.
E.g., the most recent NYT poll has women at 52% vs. men at 47%.
Some polls might not get this right - but it is standard practice for high quality pollsters.
That wasn’t meant to be a true assumption; in fact, it would be impossible for it to be true, since that would mean all polls are identical. The point of starting there is that you don’t know which polls are biased.
You back-test against past elections (not just presidential elections, all polls) to identify which pollsters are reliable and weight those higher. If there were a systemic bias in polling, it would show up in this kind of analysis; if it were industry-wide, it would show up in the models as well. People can and have tested for it.
Some people think there is a specific bias just against Trump, but that’s based on a sample size of 2, which isn’t really something to reliably take action over.
Finally, pollsters and model builders have a strong incentive to find and fix mistakes/bias, because their reputation depends on reliability. Anyone who deliberately ignored it would fall off the radar as unreliable.
“You back test against past elections (not just presidential elections, all polls) to identify which polls are reliable and weight those higher.”
I guess I haven’t seen this. But I’d like to see whether they are historically biased (i.e., not accurate) and, if so, by how much. And do those numbers line up with the “bias” values used in the Silver model?
https://fivethirtyeight.com/features/how-fivethirtyeights-2020-forecasts-did-and-what-well-be-thinking-about-for-2022/
Here’s Nate’s report on the 2020 election, where he compares expectations to results. A good model will look at these regularly as evidence of whether the model needs changes or not.
Thanks. This is useful. I notice that in the 2020 congressional races, the model erred towards the Democrats. I’m wondering if this year the model has been over-tweaked in the opposite direction…
By "start with an assumption that..." Often shows up in Bayesian modeling. Basically, you start somewhere, run an estimation, update that assumption, run another, and so on. That's a bit over simplified, but it's called a prior distribution and a posterior distribution. Another example is that if you have a potentially biased coin, you can start from assuming it's fair, but through iterations you can modify that assumption to close in on actual values (with a degree of error).
The long-term trend is that previous elections also have pretty large prediction errors.
Especially comparing mid-terms to presidential elections.
So if one is going to assume that the blended average represents the real state of the electorate, it is pretty reasonable to use this to calculate a present day statistical bias factor.
If someone doesn't believe the blended average is useful, then perhaps polling aggregation methodology just isn't their cup of tea.
That would assume that pollsters don't try to fix their own accuracy, which is definitely not true. There might be factors beyond quality, but I don't think pollsters just try to relive 2016 over and over again, for example. There is very little ground truth available, and that's a real problem. Pollsters adjusting their sampling in unpredictable ways, perhaps contrary to their bias history, or simply more pollsters entering the arena, can easily overwhelm a model and cause huge persistent errors.
No - it is that pollsters adjust their models and the models are still "wrong", as in the error term is near the limit for the 95% confidence number usually quoted.
https://fivethirtyeight.com/features/the-death-of-polling-is-greatly-exaggerated/
Look at the data in the first table in that article. A "good" year is a 3.2% error.
The 95% confidence has nothing to do with it. The 95% figure is about how well our gathered numbers reflect the actual distribution given our sampling method. E.g., if we do a phone poll and our definition of a likely voter is XYZ, then we're 95% confident our numbers are within the margin of what we'd get if we could put the same questions to the entire population our sampling method draws from. BUT it says nothing about how well that definition of a likely voter actually captures voting behavior, whether people lie, whether there is some inherent bias in the data-gathering process that isn't corrected for, etc. That's the whole reason actual results often fall outside the range.
Umm, what?
The confidence interval indicates how well the sample represents the full dataset.
It necessarily represents the data filtering decisions the pollster makes in addition to more classic sampling errors.
It is a reasonable metric to evaluate how well pollsters do their job of finding and filtering responses.
They are off by more than a standard deviation in a sample population near 1000 people.
You can see this is not true by filling in the numbers in an online calculator that has nothing to do with presidential polling, e.g. here https://www.surveymonkey.com/mp/margin-of-error-calculator/
For a large population, a sample of 1,000 at 95% confidence means roughly a 3% margin of error. That's just the mathematical error emerging from randomness: how far a random sample of 1,000 can drift from what we'd get if we sent the questionnaire to everyone and everyone answered, assuming no bias in our sampling method. It has nothing to do with whether your chosen mode of conducting a survey is representative of the actual distribution among the people who actually vote in a given year. That's why it's so difficult to, for example, predict turnout, which obviously has a huge effect on the actual results, because turnout changes won't be equal across different groups.
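The arithmetic behind that "3%" figure, assuming a 50/50 split and a simple random sample (neither of which real polls satisfy):

```python
import math

n, p, z = 1000, 0.5, 1.96             # sample size, assumed proportion, 95% z-score
moe = z * math.sqrt(p * (1 - p) / n)  # classic margin-of-error formula
print(f"+/- {moe:.1%}")               # ~3.1%, the number usually rounded to "3%"
```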
I guess a simple question is: has the blended average been accurate in predicting past elections? It seems at some point you have to compare predictions to actual results to determine how good predictions are. My recollection is that these blended results have not been that accurate. But I am happy to be proven wrong.
https://fivethirtyeight.com/features/the-death-of-polling-is-greatly-exaggerated/
Of course the model isn't accurate - it can't be.
The question you should be asking is whether model is accurate enough.
And that leads to asking "accurate enough for what?"
A sense of how the campaigns are progressing? Absolutely.
A reliable indicator of the eventual margin of victory? Probably not if you want to arbitrage the betting market.
A reliable indicator of the likely winner? Perhaps, as long as you don't define reliable as certain.
What’s “accurate”? If you are looking for 100% accuracy, it’s not going to happen. The whole point of probabilistic forecasting is that you can’t know for sure, but the chances of predicting the correct outcome are better than a coin flip. If you ever read Nate’s book The Signal and the Noise, you might recall a chapter on how an airline that makes a safe flight (without crashing) 99% of the time actually has horrible odds, and smart people would not fly that airline. My point being that if a candidate has even a 20% chance of winning the presidency, it’s not as unlikely as people feel it should be. So even useful polling will, over the long run, miss the mark a good number of times. If it didn’t, it wouldn’t be forecasting; it would be telling us the future.
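To put a number on that intuition, here's a throwaway simulation (purely illustrative): if the forecast is well calibrated, a 20% underdog still wins about one cycle in five.

```python
import random

random.seed(0)  # reproducible toy run

# An outcome given a 20% chance by a well-calibrated forecast should
# still happen roughly one time in five over many hypothetical cycles.
trials = 10_000
underdog_wins = sum(random.random() < 0.20 for _ in range(trials))
print(f"underdog won {underdog_wins / trials:.1%} of {trials} simulated cycles")
```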
Nate isn't trying to determine their bias compared to the truth but rather their bias compared to the average. "Bias" is the wrong word - he usually calls it a "house effect" but went for the clickbait headline!
The reason for the adjustment isn't to make the polling better reflect reality, but to make the polling average more stable and less sensitive to which polls have come out recently. Consider we have two pollsters, A and B. A has a pro-Dem house effect and B has a pro-Rep house effect. If we are averaging the polls each week (which isn't exactly what Nate does, but let's pretend for simplicity) and this week there are two polls from A and one from B, while last week there were two polls from B and one from A, then this week is probably going to look more pro-Dem than last week just because of which polls are included in the average. That's not helpful, so Nate adjusts the polls to remove that source of variation. Last week will have been adjusted towards Dem and this week will be adjusted towards Rep. That means you can compare last week to this week and see the actual change in the race without it being skewed by who released polls when.
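A toy version of that A/B example in code (all numbers invented) shows how subtracting the house effects keeps the weekly average from jumping around just because of who happened to poll:

```python
# Toy illustration: pollster A leans D+2, pollster B leans R+2 relative
# to the overall trend. The "true" margin is D+1 in both weeks.

house_effect = {"A": +2.0, "B": -2.0}  # + = pro-Dem lean (made-up values)

weeks = {
    "last week": [("B", -1.0), ("B", -1.0), ("A", +3.0)],  # two B polls, one A poll
    "this week": [("A", +3.0), ("A", +3.0), ("B", -1.0)],  # two A polls, one B poll
}

for week, polls in weeks.items():
    raw = sum(m for _, m in polls) / len(polls)
    adjusted = sum(m - house_effect[firm] for firm, m in polls) / len(polls)
    print(f"{week}: raw average D{raw:+.1f}, house-effect-adjusted D{adjusted:+.1f}")

# The raw average swings from D+0.3 to D+1.7 purely because of who polled;
# the adjusted average sits at D+1.0 in both weeks.
```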
This is super helpful. Do you exclude any pollsters? Is there a point at which you drop a poll because it's just BS (one of my colleagues puts Morning Consult, Echelon, and Big Village in the BS category)? What's your experience with exclusion vs. averaging everything in the bucket?
There are various pollsters on our permanently banned list.
https://www.natesilver.net/p/pollster-ratings-silver-bulletin
In addition, we don't use polls from ActiVote, SoCal Polling or Quantus. In the former case, we don't think ActiVote meets the definition of a scientific poll. For the latter two, they violate a longstanding policy against DIY/nonprofessional pollsters using cheap online survey platforms like PollFish. So these polls aren't banned per se, and we wish them well, but they just don't meet our standards.
There are excluded polls, though I think that is mostly due to either bad methodology or secretive methodology. If they aren't public about how they do their polls, I don't think Nate trusts them.
Thanks Dean.
A bunch of your links, Eli, are going to X, and it seems like trawling through X might be a bit demoralizing.
I would like to propose a new hire that may help you build a basic understanding of what's going on over there without getting too bogged down.
It's called.. a cat!
Hear me out, hear me out.
The strategy is: log in to X while acting like you don't want the cat on your keyboard. Most likely, your new feline co-worker will take this as an invitation to clamber right up there.
Whatever link you end up being able to see on the screen over its furry body once it's settled down is the one you take as representative of the overall discourse on X.
Do this multiple times if you need more links.
"You’ve got to feel sorry for the pollsters – they really are the new weather forecasters. They are charged with anticipating important phenomena over which they have no control, their audiences habitually misinterpret or misapply their data, and they suffer much blame and public abuse for their perceived failings.” - Herman's Toteboard (www.thetoteboard.org)
On the forecast page, under "Polls included in our model", I would find it useful to have an additional column for "Margin with House Effect".
For example, on today's page it lists a Quinnipiac poll that is R+0.5, an NYT poll that is even, an Echelon poll that is D+6, and a Rasmussen Poll that is R+2. It would be useful to see at a glance that these results are treated as Quinnipiac R+0.5, NYT D+0.1, and Rasmussen D+0.6.
I also don't see the house effect for Echelon on the table. Did I miss that or was that omitted?
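For what it's worth, the column I'm imagining is just the raw margin minus the firm's house effect. A rough sketch using the numbers above, where the house-effect values are only what those adjusted figures imply, not anything published by the model:

```python
# Hypothetical sketch: "margin with house effect" = raw margin minus the
# firm's house effect (positive = pro-Dem). The house effects here are just
# back-solved from the numbers in the comment above, not real model output.

polls = {               # raw margin, house effect (both in Dem-minus-Rep points)
    "Quinnipiac": (-0.5,  0.0),
    "NYT/Siena":  ( 0.0, -0.1),
    "Rasmussen":  (-2.0, -2.6),
}

for firm, (raw, effect) in polls.items():
    print(f"{firm}: raw {raw:+.1f}, with house effect {raw - effect:+.1f}")
```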
It seems it’d make more sense to measure polling bias based on pollsters’ errors in past elections, rather than their standing relative to the average (since the average itself could be biased).
This shows up a lot in A/B test interpretation (the bread and butter of my work). It's tempting to correct for past errors, but those corrections require similar conditions. Say you ship something like Amazon's quick-buy feature, and you have A/B tests of purchasing behavior from before that change: even if the change *shouldn't* overlap with what you're measuring, you might still be drawing from a different distribution of users.
Since pollsters correct for past behavior, they're likely modifying the distribution of poll responses they're drawing from. Maybe some are, maybe some aren't, but you want to make dead sure that past scenarios are equivalent to the current scenario when making bias adjustments.
Sometimes it's better to be a little wrong all the time than to play whack-a-mole with every error and overfit. Overfitting puts your predictive capability at far greater risk than a consistent offset does.
But the purpose isn't to make the polls more accurate, but to make the polling average more stable. It means you don't have the polling average moving around just because there happen to have been lots of polls with a pro-Rep house effect last week and lots of polls with a pro-Dem house effect this week.
The way he makes it more accurate is to give a greater weight to pollsters that have historically been more accurate. That is based on the actual results.
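As a toy illustration (weights invented, not the model's real numbers), quality weighting is conceptually just this:

```python
# Toy quality-weighted polling average. Weights are invented for
# illustration; the real model derives them from historical accuracy,
# methodology, sample size, recency, and so on.

polls = [
    # (pollster, margin in Dem-minus-Rep points, weight from track record)
    ("HighQualityU", +1.5, 1.0),
    ("MidTierCo",    +3.0, 0.6),
    ("ShakyPolling", -2.0, 0.3),
]

weighted_avg = sum(m * w for _, m, w in polls) / sum(w for _, _, w in polls)
print(f"quality-weighted average: D{weighted_avg:+.2f}")
```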
I'm somewhat sceptical of the methodology explained here. It essentially seems to do the herding on behalf of the pollsters.
I think it makes a lot of sense to weight and adjust pollsters based on historical statistical bias. I think it makes far less sense to weight pollsters based on their bias relative to the current average. If the current average is wrong, then the adjustments are wrong, and we have no way of knowing whether the current averages are right or wrong. The way it's worded, it sounds like the model essentially takes the higher-rated polls as gospel and then adjusts everything else towards them, essentially herding on behalf of the pollsters. Which means garbage in, garbage out.
Isn't the entire point of polling averages supposed to be to negate bias and statistical noise? But this average seems to weight everything towards the bias of the higher-weighted polls, which on average may be unbiased, but cycle to cycle may be wildly off. Take Siena in 2020, whose final polls had Biden up 11 in Wisconsin and up 3 in Florida. Maybe they're unbiased in the long term, but adjusting toward such a massive miss in 2020 wouldn't have helped at all.
I have no issue with weighting and adjusting polls based on historical precedent, but I really struggle to understand the logic behind weighting and adjusting polls based on other polls in the same cycle. That seems to significantly undermine the entire purpose of taking weighted averages in the first place.
But what about scenarios where the higher-rated pollsters differ from the lower-rated ones? We saw this in 2020 and, to a lesser extent, 2016. As this post says, this isn't an issue this cycle, with high-rated pollsters and right-leaning pollsters telling a similar story, but it has been an issue in the recent past.
Eli & Nate - I don’t have a statistics background, but it seems like the iterative bias process is primarily reinforcing that the highly-rated pollsters have little bias (so circularly confirming the input).
Is there an assumption in all this that bias among pollsters has some kind of normal distribution? How would the average and house effects be different if Trafalgar simply didn’t exist (or was banned) and all other polls were the same? Or what if a new pollster with results similar to Trafalgar or Rasmussen started to be included in the average? I know you can only work with the data you have, but if the answer to those questions is that it moves the model, then I wonder how much the model can be said to truly forecast the election. I know, of course, that polling failures obviously impact the ability of the model to do what is intended, but are we really doing any better than GIGO with the hope that there is an appropriate mix of pollsters?
Polling aggregators put a pretty large amount of faith in the business model of pollsters to discourage them from synthesizing results.
As long as the new "similar to Trafalgar" pollster actually contacted different people and met the polling quality checks, it would make sense to throw them into the average.
That is the whole point of aggregation. It effectively increases the sample size, and therefore the accuracy of the overall predicted result.
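Back-of-the-envelope (and ignoring any shared, systemic error, which pooling cannot fix): ten independent polls of 1,000 people behave roughly like one poll of 10,000, so the pure sampling error shrinks by about a factor of three.

```python
from math import sqrt

def moe(n, p=0.5, z=1.96):
    # worst-case sampling margin of error at 95% confidence
    return z * sqrt(p * (1 - p) / n)

single = moe(1000)          # one poll of 1,000 respondents
pooled = moe(10 * 1000)     # ten such polls treated as one big sample

print(f"single poll: ±{single:.1%}, ten pooled polls: ±{pooled:.1%}")
# Sampling error drops from about ±3.1% to about ±1.0%, but any bias
# shared across pollsters is untouched by pooling.
```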
LOVELY ANALYSIS...Nate needs to be pushed...after all...He's the GURU!
So I think the "are we really doing any better than GIGO" is answered by the historical analysis and refinement. There is a limit to what models of this type can do if there input is bad, but as I pointed out elsewhere, the bias correction cannot correct for a systemic bias across many pollsters. I think it is more aimed at preventing biases in individual polls from unduly affecting the projection. So if most of what is going in is garbage, then yes, that's what you will get on output. Maybe it's more of an outlier management tool, especially if outlier pollsters attempt to flood the data stream with frequent polls.
There are pollsters that seem to exist to push polling averages towards Trump: see Quantus saying “you’re welcome” to their polls moving the NC average back towards Trump, and subsequently being discovered failing to disclose poll sponsors. Can you really “adjust your way out” of purposefully bad data? Shouldn’t firms with demonstrated transparency issues be considered fundamentally suspect?
I understand that the partisan adjustment means that these firms don’t impact the toplines much. But I’d argue they don’t teach us anything, either.
In a reply to a similar question, Nate said that Quantus is banned.
As long as they 1) use a valid scientific polling method, and 2) honestly report their results, then yes, they do add some information. And if Nate thinks they aren't doing those things, I think they get banned.
"In calculating house effects, we basically look at how a firm’s polls compare to the trendline of other polls from that state (or compared to other national polls). This involves an iterative process: we calculate the trendlines, then calculate the house effects based on the trendlines, then recalculate the trendlines with adjustments for house effects, then calculate a more refined version of house effects from the recalculated trendlines, and so on. In this process, the more highly rated, nonpartisan firms serve as essentially the center of gravity: the “true” values against which other firms are compared."
This seems to say that the house effect corrections depend on the broad polling averages being correct. If there is a systemic bias, then the trendlines of other polls will be wrong too. So house effects won't - and aren't designed to - correct for any systemic bias. I suppose doing that is just not possible anyway. So house effect corrections are designed to adjust biased individual pollsters, not systemic biases. Do I have that right?
That is basically my understanding. But the model has a feature that guesses the probability that this whole system is shifted by some percentage from the truth in one way or another.
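For the curious, here's a bare-bones sketch of the kind of iterative loop the quoted passage describes. This is my own toy version, not Silver Bulletin's actual code: the "trendline" is just a plain mean with no time component, and the highly rated nonpartisan firms are anchored at a zero house effect so they act as the center of gravity.

```python
# Toy version of the iterative trendline / house-effect estimation.
# All pollster names and numbers are invented.

polls = [
    # (pollster, reported Dem-minus-Rep margin)
    ("GoldStandard", +1.0),
    ("GoldStandard", +2.0),
    ("SolidSurveys", +1.5),
    ("LeansRedInc",  -3.0),
    ("LeansRedInc",  -2.0),
    ("LeansBlueLLC", +5.0),
]
anchors = {"GoldStandard", "SolidSurveys"}   # highly rated, nonpartisan firms

house = {firm: 0.0 for firm, _ in polls}     # start assuming no house effects
trend = 0.0

for _ in range(20):  # iterate until the estimates stop moving
    adjusted = [m - house[firm] for firm, m in polls]
    trend = sum(adjusted) / len(adjusted)            # stand-in for the trendline
    for firm in house:
        if firm in anchors:
            continue                                 # anchors define the "true" zero
        residuals = [m - trend for f, m in polls if f == firm]
        house[firm] = sum(residuals) / len(residuals)

print(f"converged trend: D{trend:+.2f}")             # ends up near the anchors' average
print({firm: round(e, 2) for firm, e in house.items()})
```

The takeaway from the toy: the adjusted average drifts toward the anchor firms rather than toward the raw mean of everything, which is exactly the "center of gravity" behavior the quote describes, and also exactly why a systemic miss shared by the anchors would pass straight through.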
I think the main purpose of the house-effect adjustment is to account for the fact that polls come out at inconsistent frequencies from one pollster to the next. How do you find the “platonic ideal” of the aggregate average of the polls at an instantaneous moment, when at times most of the recent polls may lean in a certain direction? The house-effect adjustment therefore makes the polling average more stable and less subject to day-to-day variance based simply on which pollsters happen to have released the most recent polls. I suspect you wouldn’t need to account for house effects if, say, every pollster in the model released a poll once every Friday with the same survey date ranges. You could then simply calculate a weighted average based on pollster grade, methodology, and sample size without any other adjustments.
That “platonic ideal average” of the polls may itself have systemic error in favor of one candidate, but historically that’s been basically impossible to predict ahead of time cycle-to-cycle because pollsters adjust their methodology to account for why they missed in a prior cycle.
Historical error is accounted for in the weighting for pollster grades, and I believe more recent cycles are weighted more heavily in that calculation, but the pollster grade is ultimately designed to identify which pollsters have a track record of maintaining accuracy over the long run.
While Nate's sorta-forced write-up on "house effects" in polling is instructive, we should also look at the "house effects" on THIS site, the SB. Many of you are liberal, and God bless you for that - it's all good, but we disagree. However, I respect your right to think and vote the way you want. But here at the SB there clearly is a "house effect" INDEED, one that permeates the whole site, and it's the editorial voice of Nate himself and how it communicates OVERT Harris-cheering throughout. To be sure, if one were to synthesize MOST of the content of the SB, the tone and theme is..."Harris focused" - "She's doing well this morning", "You got this KaMALA"..."No need to panic about A, B and C, HARRIS issues, they can ALL be explained", "Here's what Harris has to do to pull it out"...BUT THERE IS NO TRUMP FOCUS ON THIS SITE, at all! Is that because everyone who knows Nate to be a very public liberal pollster, and followed him over here from 538, did so understanding that it would be a blissful waterhole for agreeable, friendly, like-minded progressives - Woodstock for Harris voters - yeah!? Sorta seems that way...OTHERWISE you would find whole areas here where Trump supporters congregate to discuss their very strongly held views as well, and maybe even debate them. But that is far from the experience here, and it lends itself to the idea that SB is a site that reports liberal polling, to liberals, so that liberals can feel great about the Harris campaign's prospects - unless they suck, and they sorta look that way outside of this "Bubble"...While this might seem jarring, controversial and even foreign to hear (I mean, an opposing view), I suggest you open the site in the morning and pretend you are a disgruntled Trump voter looking for a new home, or a down-the-middle moderate checking it out, and unfortunately you'll find that you really don't have a place here, due to its very intentional, I believe, progressive experience design (PED). Read through all of Nate's daily notes to people...It's almost like he is tucking his lil' liberal toddler into bed with hopeful wishes and dreams each day. Here, there is one audience in mind and one subject - Kamala is our person, here is our slant on HER race, and here is why we think you should not worry, because the model says it's ALL GONNA BE OK. Good luck with that...I paid my money too, and I need more balance and less of a Kamala cheering section. #Let'sGo47
It seems pretty clear to me that Nate, like many of us, feels that defeating the former president is of existential importance for the future of our country. He has always been transparent about his personal preferences, but the model is agnostic and has not changed from cycle to cycle. And I think it's safe to say that most of us are here because we want to get closer to the truth about what's happening in the country, and not just validate our confirmation bias. His editorial is not disturbing to me because I don't believe he lets it interfere with his fact-based analysis, and I've observed over the years that many send-professed right-leaning commenters agree.
*self-professed
I believe you believe that, Sugar Daddy! "Most of US here"...What?!? Point proven!!! What else you wanna discuss?
"Keep movin' folks...Nothing to se here... No group think going on at SB...just keep movin'"....
“I paid my money too, and I need more balance and less of a Kamala cheering section.”
I get it; you don’t like liberals and/or progressives, and you don’t like Nate talking like one or, at least, you want “balance”. In fact, you paid for “balance” and got “a Kamala cheering section” – is that it? Well, you’re definitely in the wrong place. Which is not to say that Nate’s lecture on “house effects” is either “liberal” or “progressive (and hardly cheering Kamala). It was about selective poll aggregation, about how the model weights a pollster’s results. And, if Nate was a pollster himself, there might be a case to be made for “house effects” on whatever poll he conducted. I don’t know. Because he’s not a pollster. He’s a poll aggregator, who weights collected poll results, quite transparently, and produces what he hopes is an accurate projection. Well, that’s what I paid for, not an opinion forum which seems more like what you’re looking for.
One more thing...it's not about LIKE or DISLIKE...that's pretty simplistic. But I will say this...In general, Liberals think Republicans are "Mean and Dumb"...Republicans, however, just think Liberals are "Wrong"...It's a huge difference. The former is largely emotional, the latter is largely logical...whatever side a person falls on is their business...but that's my POV, held by many conservative observers...
Re-reading your post - saying "you're definitely in the wrong place" definitely proves my point...Is that something that comes from someone who is looking to "reach across the aisle"? Ah...no, not really. I ventured into foreign lands knowing there would be resistance to things I posted, but that I would benefit and learn from the experience. You choose to stay among your liberal gaggle, which is effectively a SAFE SPACE...Who's the more brave, adventurous, and open-minded person? The person who exposes themselves to alternative views? Or the guy who turns off the lights and locks himself in his room with his friends, shades drawn, when "strangers" appear in his neighborhood? The answer is obvious...It's ME!
I think you should try re-reading some of your posts here, but from the perspective of a reader who is actually from across the aisle. They're often pretty aggressive and combative from the jump, and do not come across as open-minded, willing to have a reasonable discussion, or "reaching across the aisle" the way you think they do. When you come in hot all the time, people are going to respond in kind. Cool it down and you'll get the exchanges and exposure to alternative views you're looking for.
There are plenty of conservatives on here who post their opinions and debate liberal posters in good faith. Some of the flak they get is uncalled for, but they do frequently contribute a lot of interesting perspectives that challenge my way of thinking, which is great! I appreciate their presence here and definitely do not want another liberal safe space. Meanwhile, many of your posts seem like they're written just to rile people up and start an internet shouting match. They read just like the angry partisan discourse you'd find on twitter or truth or whatever, which is what many of us here are trying to get away from, and it sounds like you are too.
I'm not trying to antagonize you or get into an argument about "KAM-BAM" or whatever, I'm just trying to honestly and bluntly tell you what I think, and hopefully it's helpful to you.
Nate's weighting/discounting/factoring/regressing of aggregated polls makes him an UBER POLLSTER...And I'm looking for a polling forum that does not "lead the witness", as they say...got it?
It seems that you’re concerned with Nate’s commentary, not his model and its results. He has been clear that he would not vote for Trump, but he hasn’t exactly been a Harris/Democrat/Progressive/Liberal cheerleader either. He has criticized a lot of things about how Harris/Democrats ran their campaign and Progressive/Liberal messaging. So, he has opinions (including favoring Harris) and shares them openly. He took a lot of flak in the past when his model gave Trump higher odds than the conventional wisdom (2016, especially). I think he’s pretty transparent about his subjective opinion of the data vs. the objective data itself vs. his personal preferences. Even when Harris supporters were on his case because they thought he should have made a mid-course adjustment to boost her standing in the model, he was firm that doing so would have been putting his thumb on the scale rather than the unbiased method of letting the model play it out as designed. The model is what it is and he reports what it spits out. His analytics have fairly earned, over multiple election cycles, the reputation that has caused us to spend our money to read his updates here.
With that said, Nate is obviously a smart guy and I’m sure he knows that most people who are interested in paying to follow his data-centric approach to politics prefer Harris over Trump. Framing his articles more about Harris than about Trump seems likely to be more engaging to more readers, so I wouldn’t be surprised if he wrote that way more often on purpose. Regardless, the data is the data. Nate writing about Harris instead of Trump doesn’t change anything.
If you just want to read more positive things about Trump, there are probably other venues that would suit you better than this data-centric site. It would be silly of me to go to Truth Social and wonder why more people don’t want to talk about how we can solve global warming, right?
We shouldn't know his politics or even have to need to speculate...and if we do know his politics then he should go out of his way to make his commentary and prose
RIGHT DOWN THE MIDDLE...EVEN STEVEN...
I wonder if all the people who comment on the how wrong the forecast is will bother to read and, more importantly, comprehend this post.