92 Comments

As long as you aren't having to type a sentence like “the model doesn’t yet fully account for the Chinese attack on Taiwan” I'll be good.

The temptation for our rivals to pull something big on us at this point must be huge.


I'm triggered. howmanysims should be how_many_sims to be consistent with the naming convention of the other variables. STOP EVERYTHING AND FIX IT NOW


I'm 50/50 on whether it's a pseudocode screenshot for giggles. It's hard to imagine that kamala_mode isn't 0/1-valued.


Stata doesn’t have Boolean data types, so you’d use a scalar with allowed values of 0|1.

My bet is that this is a Stata front end for easily adjusting model params, doing exploratory data analysis, and running some straightforward stats models. Then you can feed the parameterized data and stats into the actual ML pipeline, kind of like a config file.


Alas, I am but a simple Fortran monkey bashing zeros and ones together.


As a chemist who saw many a friend drown in Fortran-land, respect.


you'd be shocked and thrilled how easy it is to work with strings in other languages.

for example, it is easy to read a UTF-8 text file into a string, process that string using regular expressions, and then write it to a text file. this usually requires at most one or two lines of code.


“Process that string using regex”

1-2 lines, 2-4 hours haha (at least for me)


Fortran 77 was a huge improvement over Fortran IV. Pretty good with strings.


I am but a young blossom and my standard is f90. Lords be praised.


Wrote Fortran for years (retired now).


I highly doubt it. This isn’t a good application for ML. It makes more sense to have more control over your model.


could be, but those would be some pretty elaborate comments for pseudocode


Have to admit I dwelled on that for several seconds when I scanned the code snippet.


Personally I'd go with sim_count or total_sims if I were doing underscores.


"The model does not yet take into account the impact of the regional war spawned by the firebombing of Beirut shortly after Turkey's inexplicable amphibious assault on Israel, nor the annexation of Taiwan or the ground invasion by Kim into South Korea, but we think these numbers are pretty good all things considered" - Tuesday, probably.

Let's turn those coconuts into piña coladas because we're gonna need it. . .


🛸👽Maybe they’ll finally decide to take Trump back.


Does the model differentiate between polls testing Harris v Trump prior to Joe Biden leaving the race and those conducted after 7/21? If not, would the model's results look significantly different if those earlier polls were excluded? The logic being that voters may respond differently when imagining a theoretical candidate, as opposed to when that person is a "real" candidate.


I give Harris a 25% chance, and here’s why. Harris is down by 1.7 in the national polling average, and we don’t really have great swing state data, so there’s no need to do a spreadsheet. Harris needs to win the popular vote by 1.8 to be even money in the electoral college. The standard error (a combination of polling error and the potential for changes in support) is about 5 points right now, so she needs to over-perform the mean by about 0.7 standard deviations to win the electoral college. Using a t distribution with 13 degrees of freedom (a sample of 14 elections with modern polling) gives her a 25% chance of winning the election.

Nate’s models are generally less confident than my back-of-the-envelope models, so revert that figure a third of the way toward 50%. Nate will predict Harris has a 33% chance.

Note that her polling is about as bad as Biden’s was before the debate. Nate’s model should land on a number similar to Biden’s pre-debate number.
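For anyone who wants to check the arithmetic above, here is a minimal Python sketch. It assumes the inputs exactly as stated in the comment (Harris down 1.7, 1.8 needed, a 5-point standard error, a t distribution with 13 degrees of freedom, then a one-third reversion toward 50%). The helper names are made up for illustration, and the t tail is computed with Simpson's rule so only the standard library is needed.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(x: float, df: int, steps: int = 2000) -> float:
    """P(T > x): half the mass minus Simpson's rule over [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def win_probability(margin: float, needed: float, std_err: float, df: int) -> float:
    """Chance the final margin ends up above `needed`, starting from `margin`."""
    deficit_in_sds = (needed - margin) / std_err
    return t_upper_tail(deficit_in_sds, df)


def revert_toward_even(p: float, fraction: float) -> float:
    """Move probability p the given fraction of the way toward 50%."""
    return p + (0.5 - p) * fraction


# (1.8 - (-1.7)) / 5 = 0.7 standard deviations short
p = win_probability(margin=-1.7, needed=1.8, std_err=5.0, df=13)
print(f"back-of-envelope: {p:.0%}")
print(f"reverted a third toward even: {revert_toward_even(p, 1 / 3):.0%}")
```

It should print roughly 25% and 33%, matching the figures in the comment.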


I like your methodology, but I believe that your inputs are flawed--specifically Harris' national polling average. So I disagree with your conclusion. We shall see tomorrow though.


That 1.7 is RCP's average as of today I believe.


Sure, but nearly 50% of the polls that factor into that average were conducted wholly or partly before Harris was an actual candidate. Also, some rough back-of-the-napkin math will tell you that RCP does not take Rasmussen’s polling bias into consideration, so that +7 poll counts for more than it ought to. The average should favor Trump, but I believe it ought to be closer to even than Trump +1.7. Crap in, crap out.


If you look only at the polls included by RCP that started sampling on 7/22 or after, Trump leads by 1.6. If you throw out the best poll for each candidate, Trump leads by 0.8, which gives Harris a 31% chance, about where I think Nate will land.

However, I’m not really comfortable adjusting the polling average in favor of Harris when Trump outperformed the polling average two elections in a row.
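The same back-of-the-envelope conversion can be rerun on the post-7/22 numbers. This sketch assumes the inputs from the earlier comment still hold (1.8 needed, a 5-point standard error, 13 degrees of freedom) and only swaps in Trump +1.6 and Trump +0.8; the function and constant names are hypothetical.

```python
import math


def t_upper_tail(x: float, df: int, steps: int = 2000) -> float:
    """P(T > x) for Student's t, via Simpson's rule on the density over [0, x]."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(t: float) -> float:
        return const * (1 + t * t / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


NEEDED, STD_ERR, DF = 1.8, 5.0, 13  # carried over from the earlier comment

# Harris's margin is negative when Trump leads.
p_all = t_upper_tail((NEEDED - (-1.6)) / STD_ERR, DF)      # all post-7/22 RCP polls
p_trimmed = t_upper_tail((NEEDED - (-0.8)) / STD_ERR, DF)  # best poll per candidate dropped
print(f"post-7/22 average: {p_all:.0%}, trimmed: {p_trimmed:.0%}")
```

The trimmed average comes out near 31%, while the untrimmed Trump +1.6 lands near 25%.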


I’m not suggesting you throw any of the polls out. I’m suggesting that you approach polls with caution, like Nate will do. That includes considering that nearly 50% of the polls in the RCP average were from before Harris was a candidate, and it includes considering the bias of each poll. The Rasmussen +7 poll was just an example; the rest of the polls also have some bias one way or the other. It is my strong belief that tomorrow Nate’s average will be closer to even than Trump +1.7. I am prepared to accept that I could be completely wrong, but I’m glad I won’t have to wait long to find out. I would take the other side of your bet every day of the week and twice on Sundays: Harris will have more than a 31% chance when the model opens, because there is more uncertainty now, and because of how far the polling average had to be in Trump’s favor in Trump v Biden for Biden’s chances to fall to only 31%.


I would like to memorialize that the initial model shows a probability of 61/38.


I just took it from RCP


I think there's a 90% chance that Nate gives Harris a better than 33% chance. I predict 62/38 for Trump.


You were spot on. Nice job!


Biden was around 32% before the debate, when he was down about a point in the polling average. Why would Harris be doing better? She has four fewer weeks to close the gap, and her polling isn’t any better.


I thought some state polls were showing her in a dead heat in the Midwest and national polls showing her within a point or so on average as of now. Maybe seeing them all aggregated will make it more clear but I thought she was in a little better spot today than Biden pre-debate.


Looking just at post-dropout polls, Harris is down 0.5 in MI and WI and down 1 in PA. She is down more than a standard deviation in NV and AZ. Her bright spot is GA, where she is down 1.5, which is 2-3 points better than Biden. Winning Georgia would let her lose MI or WI but not PA. That gives her basically a 46% chance in MI and WI, a 42% chance in PA, and a 37% chance in Georgia. If those states were perfectly correlated, her odds of winning the electoral college would be 42%. If they were completely independent, her odds would be 17%. Given that three of the states are midwestern with similar demographics, and losing PA basically knocks Harris out, I would be comfortable weighting the perfectly correlated figure at 0.6 and the independent figure at 0.4, which gives her odds of 32%, right around where Nate is likely to land.
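The correlated-versus-independent calculation can be reproduced by brute force. This is a sketch that uses only the four state probabilities and the win rule stated in the comment (PA is required; GA can stand in for one of MI or WI). Modeling "perfectly correlated" as a single shared uniform draw is an assumption about what the comment means, and the names are illustrative.

```python
from itertools import product

# Harris's per-state win probabilities, as quoted in the comment.
PROBS = {"MI": 0.46, "WI": 0.46, "PA": 0.42, "GA": 0.37}


def harris_wins(won: set[str]) -> bool:
    """Comment's rule: PA is required, and GA can stand in for MI or WI (not both)."""
    if "PA" not in won:
        return False
    midwest = len(won & {"MI", "WI"})
    return midwest == 2 or (midwest == 1 and "GA" in won)


def independent_odds() -> float:
    """Sum the probability of every win/loss combination that gets Harris over the line."""
    total = 0.0
    for outcome in product([True, False], repeat=len(PROBS)):
        won = {state for state, w in zip(PROBS, outcome) if w}
        weight = 1.0
        for state, w in zip(PROBS, outcome):
            weight *= PROBS[state] if w else 1 - PROBS[state]
        if harris_wins(won):
            total += weight
    return total


def correlated_odds() -> float:
    """Perfect correlation: one shared draw u ~ Uniform(0, 1); Harris wins each state whose p > u."""
    cuts = sorted({0.0, 1.0, *PROBS.values()})
    total = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        u = (lo + hi) / 2  # the set of states won is constant between cut points
        if harris_wins({s for s, p in PROBS.items() if p > u}):
            total += hi - lo
    return total


corr, ind = correlated_odds(), independent_odds()
blend = 0.6 * corr + 0.4 * ind  # the 0.6 / 0.4 weighting from the comment
print(f"correlated {corr:.0%}, independent {ind:.0%}, blended {blend:.0%}")
```

This gives about 42%, 17%, and 32% respectively, in line with the comment.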


This also shows how Nate’s fancy correlation matrices are a useful refinement. If you split the difference between complete independence and complete correlation, her odds are about 29%. If you go 2/3 of the way from complete independence to complete correlation, they are 33%. Weighting complete correlation at 0.6 won’t be off by more than a few points; Georgia is different enough from the midwestern states that the correct coefficient is pretty clearly in that range.
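Splitting the difference between the two cases is linear interpolation between the endpoints. A minimal sketch, assuming the rounded 17% (independent) and 42% (perfectly correlated) figures from the previous comment as those endpoints:

```python
# Endpoints taken from the previous comment (rounded).
INDEPENDENT = 0.17
CORRELATED = 0.42


def blended_odds(weight_on_correlation: float) -> float:
    """Linear interpolation between the independent and perfectly correlated cases."""
    return (1 - weight_on_correlation) * INDEPENDENT + weight_on_correlation * CORRELATED


for weight in (0.5, 0.6, 2 / 3):
    print(f"weight on correlation {weight:.2f}: {blended_odds(weight):.1%}")
```

On these endpoints an even split lands near 29.5%, a 0.6 weight at 32%, and a 2/3 weight near 33.7%.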


Yes, there are weird patterns where she wins AZ and doesn’t need WI, but there are also weird patterns where she loses VA or MN. The weirdness will roughly cancel out and really isn’t worth doing until there are more polls and you can do a formal model.


There was never a day where Biden was at 32%. He was at 31% both on June 30 and July 17, and he was down 2.3% and 2.0%, respectively. Your premise that her polling isn’t any better is flawed, and the data will bear that out. You have failed to consider that while Trump v Biden was quite stable, Trump v Harris is relatively unstable, and the model will likely revert closer to even odds at whatever her polling average ends up being than where Biden was. Emphasis on “closer to”.


ps— after looking at the swing state polls, I would be more comfortable giving Harris a 29% chance.


ps— one assumption I made is that the number of times Harris loses the EC while winning the popular vote by more than 1.8 points equals the number of times she wins the EC while winning it by less than 1.8. That rule of thumb begins to break down if she’s ahead or behind by more than a standard deviation, but her t score is between 0.5 and 0.8, so my rule of thumb will work pretty well, and there’s no need to do a spreadsheet until we have better swing state numbers.


Nothing is predetermined: the outcome of the election is down to the will and motivation of the individual voters and volunteers.


You offer a vacuous tautology. I offer a prediction.


So you’re saying voter enthusiasm and volunteer motivation aren’t factors in elections?


No, I’m saying a good model captures and explains them.


kys


Lol


In physics there is considerable doubt as to the existence of free will. Just saying.


I'll write this as just a prediction for myself: I think the model premieres at something like Trump +0.5 as the polling average for Election Day, with a corresponding 55/45 win rate in favor of Trump. That's definitely a toss-up in absolute terms, but it leans slightly towards Trump.

I'm also predicting we'll see roughly a 0.5-2 point swing between when the model premieres and when the race "settles down" after the Democratic Convention and after the convention bounce fades. My guess is that it'll be a swing towards Trump, but it could also be a swing against him. My BEST guess, barring any more black swan events, would be that something like Trump +1 or Trump +1.5 solidifies towards the start of September.

I think the modeling will show that Kamala's refreshed strength among young voters and African Americans will mitigate a portion of Trump's electoral college advantage. I think it will show a much higher chance of winning GA specifically than Biden had, and that GA and PA's tipping point power will go way up.


My prediction is the model will say 62/38 Trump, predicting around a dead heat in the Election Day polling average.


Pretty close to my thinking. Polling is pretty much where it was pre-debate, when the model was at around 65% or so for Trump. I expect the model to be a smidge better for Dems now because a) there's more uncertainty, which probabilistically boosts the underdog and b) Harris does better than Biden in the Sun Belt, opening up some new paths for a Dem victory. My guess would be 60-40 in favor of Trump or thereabouts, but 55-45 certainly wouldn't shock me.


I think Harris is down by about 1.5 in the average of the national polls. Assuming Trump picks up a point as the race settles down that's a 2 to 2.5 point Trump advantage.


I meant picking up a point in the model's prediction of what election day would/could look like - and eventually the average and the model will converge as election day draws closer, sure, but I'm not sure at what rate that happens.


This seems reasonable to me. I think the Veep choice will be fascinating- do the Dems solidify the Blue Wall, or try to expand the map in the Sun Belt and pray that the midwest vote holds up one last time? I'd probably argue PA is too important in any Dem map for them to pass on Shapiro if he wants it- he's probably the best natural pol in the bunch as well.


GA and MI, not GA and PA. PA's tipping point power couldn't go UP any further and is liable to go down. Darn typos and inability to edit.


I don't think you're accounting for Trump's electoral college advantage.


Why would MI’s tipping point power increase? It’s the bluest of the swing states.


The short version is, if you start with the 2020 map but flip Arizona red (which seems likely), Trump has to win 2 out of the 4 of GA, PA, MI, and WI. As losing GA becomes more likely, the tipping point power of MI increases, if that makes sense, because MI being winnable enables the PA/MI or WI/MI potential path to winning. Trump's polling has also been better in MI this cycle than in WI, so it's likely to be higher in tipping power if that continues.
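The "2 out of the 4" claim is easy to check by enumeration. This sketch assumes the 2020 map under the 2024 apportionment (Biden 303, Trump 235), flips Arizona's 11 electoral votes to Trump, and holds every other state fixed, NV included; those baseline figures are an assumption, not something stated in the comment.

```python
from itertools import combinations

# Assumed baseline: Trump's 2020 states (235 under the 2024 apportionment) plus Arizona (11).
TRUMP_BASE = 235 + 11
# The four battlegrounds from the comment, with their 2024 electoral votes.
BATTLEGROUNDS = {"GA": 16, "PA": 19, "MI": 15, "WI": 10}


def trump_wins(states: set[str]) -> bool:
    """True if Trump reaches 270 after carrying the given battlegrounds."""
    return TRUMP_BASE + sum(BATTLEGROUNDS[s] for s in states) >= 270


# No single battleground is enough for Trump, but any two of the four are.
one_is_enough = any(trump_wins({s}) for s in BATTLEGROUNDS)
any_two_enough = all(trump_wins(set(pair)) for pair in combinations(BATTLEGROUNDS, 2))
print(one_is_enough, any_two_enough)

# If Trump carries GA, every remaining state becomes must-win for Harris.
must_sweep = [s for s in BATTLEGROUNDS if s != "GA" and trump_wins({"GA", s})]
print(must_sweep)
```

It prints `False True` and then `['PA', 'MI', 'WI']`: once Georgia is gone, Harris has to sweep all three, MI included, which is the comment's point about MI's rising tipping-point power.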


MI is michigan, not minnesota. MN is minnesota. Not sure how they decided that one, lol.


Michigan, Minnesota, Mississippi, and Missouri were all fighting for that coveted MI. They probably should have given Michigan MC so that no one got that MI (though it's weird they let Missouri displace Montana from its own first two letters).


I know that. Of the Trump/Biden states, Michigan is the bluest. MN hasn’t gone Republican since, I believe, before Reagan.


Is there no Boolean value to make "kamala_mode = true"?


It’s Stata. It doesn’t have a Boolean data type.


I assumed he wanted to leave space for non-Boolean values such as "only in California", "maybe next week", etc.


Kamala Harris right now: ”This isn’t even my final form”.

If only we could have multiple VPs and sweep PA and AZ


Only if we Pokémon go to the polls.


It is clear to me that you have missed the fact that I was saying it sardonically and not in earnest. My joke is referencing how bad the joke is. I was not making a metaphor for anything. Your condescension is misplaced.


Minor point, but the 2024 cycle will further weaken the value of election-analysis precedent, especially regarding incumbency. 2020 was a weird Covid year, so the most recent normal incumbent year was really 2012.

Which means the 2012 election would be the most comparable incumbent election cycle for an election that doesn't occur until 2028 or 2032. I just don't see how "fundamentals" and other more incumbency-based analysis can remain credible going forward.


Momentum seems to be on Harris's side. She seems to be moving up fast, and she seems to be pretty good at trash-talking Trump. That has always been Trump's strength, and now he is running against someone who doesn't mind descending to his level.


Does the model account for Harris selecting Jen O’Malley Dillon to help run her campaign?


If it shows up in polling. If not, you need something like futures contracts to factor in qualitative information. The model helps us interpret how polls and fundamentals will affect outcomes.


There is already an alien invasion of the West by the millions every year.


If Harris is a distinct underdog maybe it isn’t too late for her to step aside.


Won’t happen. She has a very devoted following. The KHive. They’re going all out.


Weird


Wanting a candidate who can win, and thinking a biracial female liberal from California might not be the optimal choice, is weird? I think rallying behind Harris before testing alternatives was stupid.


Is it more probable that Skynet becomes self aware on the 11th as opposed to the 29th?


Can’t imagine the mind-splitting headache that results from trying to forecast the House elections now. I would think D’s are favored, but by how much?
