As long as you aren't having to type a sentence like “the model doesn’t yet fully account for the Chinese attack on Taiwan” I'll be good.
The temptation for our rivals to pull something big on us at this point must be huge.
I'm triggered. howmanysims should be how_many_sims to be consistent with the naming convention of the other variables. STOP EVERYTHING AND FIX IT NOW
I'm 50/50 on whether it's a pseudocode screenshot for giggles. It's hard to imagine that kamala_mode isn't 0/1-valued.
Stata doesn’t have Boolean data types, so you’d use a scalar with allowed values of 0|1.
My bet is that this is a Stata front end for easily adjusting model params, doing exploratory data analysis, and running some straightforward stats models. Then you can feed the parameterized data and stats into the actual ML pipeline, kind of like a config file.
Alas, I am but a simple Fortran monkey bashing my zeros and ones together.
As a chemist who saw many a friend drown in Fortran-land, respect.
you'd be shocked and thrilled how easy it is to work with strings in other languages.
for example, it is easy to read a UTF-8 text file into a string, process that string using regular expressions, and then write it to a text file. this usually requires at most one or two lines of code.
“Process that string using regex”
1-2 lines, 2-4 hours haha (at least for me)
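For anyone curious what the read, regex, write workflow mentioned above looks like in practice, here is a minimal Python sketch. The function name `regex_rewrite` and its parameters are made up for illustration; only `re.subn` and the `pathlib` calls are standard library.

```python
import re
from pathlib import Path


def regex_rewrite(src: str, dst: str, pattern: str, replacement: str) -> int:
    """Read src as UTF-8, apply one regex substitution, write the result to dst.

    Returns the number of substitutions made.
    """
    text = Path(src).read_text(encoding="utf-8")
    new_text, count = re.subn(pattern, replacement, text)
    Path(dst).write_text(new_text, encoding="utf-8")
    return count
```

The file handling really is about two lines; getting the pattern right is where the hours tend to go.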
Fortran 77 was a huge improvement over Fortran IV. Pretty good with strings.
I am but a young blossom and my standard is f90. Lords be praised.
Wrote Fortran for years (retired now).
I highly doubt it. This isn’t a good application for ML. It makes more sense to have more control over your model.
could be, but those would be some pretty elaborate comments for pseudocode
Have to admit I dwelled on that for several seconds when I scanned the code snippet.
Personally I'd go with sim_count or total_sims if I were doing underscores.
"The model does not yet take into account the impact of the regional war spawned by the firebombing of Beirut shortly after Turkey's inexplicable amphibious assault on Israel, nor the annexation of Taiwan or the ground invasion by Kim into South Korea, but we think these numbers are pretty good all things considered" - Tuesday, probably.
Let's turn those coconuts into piña coladas because we're gonna need them...
🛸👽Maybe they’ll finally decide to take Trump back.
Does the model differentiate between polls testing Harris v Trump prior to Joe Biden leaving the race and those conducted after 7/21? If not, would the model's results look significantly different if those earlier polls were excluded? The logic being that voters may respond differently when imagining a theoretical candidate, as opposed to when that person is a "real" candidate.
I give Harris a 25% chance, here’s why. Harris is down by 1.7 in the national polling average, we don’t really have great swing state data so no need to do a spreadsheet. Harris needs to win by 1.8 to be even money in the electoral college. Standard error (combination of polling error and potential for changes in support) right now is about 5 points, so she needs to over-perform the mean by about 0.7 standard deviations to win the electoral college. Use a t distribution with 13 degrees of freedom (sample of 14 elections with modern polling) and that gives her a 25% chance of winning the election.
Nate’s models are generally less confident than my back-of-the-envelope models, so revert that figure toward 50% by 1/3. Nate will predict Harris has a 33% chance.
Note that her polling is about as bad as Biden’s was before the debate. Nate’s model should land on a number similar to Biden’s pre-debate number.
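The back-of-the-envelope calculation above can be reproduced with a few lines of Python. This is only a sketch of the commenter's stated recipe (poll margin, required margin, 5-point standard error, t distribution with 13 degrees of freedom, then a 1/3 reversion toward 50%). The function names are invented here, and the t tail is integrated numerically with Simpson's rule so nothing beyond the standard library is needed.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t), using Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def win_probability(poll_margin: float, needed_margin: float,
                    std_error: float, df: int) -> float:
    """Chance of beating the required margin, per the comment's recipe."""
    t = (needed_margin - poll_margin) / std_error
    return t_upper_tail(t, df)


def shrink_toward_even(p: float, fraction: float) -> float:
    """Move p the given fraction of the way toward 50%."""
    return p + fraction * (0.5 - p)


# Inputs taken from the comment: down 1.7, needs +1.8, SE of 5 points, 13 df.
p_raw = win_probability(-1.7, 1.8, 5.0, 13)
p_shrunk = shrink_toward_even(p_raw, 1 / 3)
print(f"raw: {p_raw:.1%}, after 1/3 reversion toward 50%: {p_shrunk:.1%}")
```

Swapping in a different poll margin (for example -0.8) shows how sensitive the estimate is to the input average, which is what the replies below argue about.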
I like your methodology, but I believe that your inputs are flawed--specifically Harris' national polling average. So I disagree with your conclusion. We shall see tomorrow though.
That 1.7 is RCP's average as of today I believe.
Sure, but nearly 50% of the polls that factor into that average were completely or partially performed before Harris was an actual candidate. Also some rough back of the napkin math will tell you that RCP does not take into consideration Rasmussen’s polling bias, so that +7 is factoring in more than it ought to be. The average should favor Trump, but I believe that it ought to be closer to even than +1.7 Trump. Crap in, crap out.
If you look only at the polls included by RCP that started sampling on 7/22 or after, Trump leads by 1.6. If you throw out the best poll for each candidate, Trump leads by 0.8. That gives Harris a 31% chance, which is about where I think Nate will land.
However, I’m not really comfortable adjusting the polling average in favor of Harris when Trump outperformed the polling average two elections in a row.
I’m not suggesting you throw any of the polls out. I’m suggesting that you approach polls with caution, like Nate will do. This includes considering that nearly 50% of the polls in the RCP average were from before Harris was a candidate, and it includes considering the bias of each poll. The Rasmussen +7 poll was just an example, but the rest of the polls also have some bias one way or the other. It is my strong belief that tomorrow Nate’s average will be closer to even than +1.7 Trump. I am prepared to accept that I could also be completely wrong, but I’m glad I won’t have to wait too long to find out. I would take the other side of your bet every day of the week and twice on Sundays: I think Harris will have more than a 31% chance upon model open, because there is more uncertainty now and because of how far the polling average had to be in Trump’s favor in Trump v Biden for Biden’s chances to fall to only 31%.
I would like to memorialize that the initial model shows probability of 61/38.
I just took it from RCP
I think there's a 90% chance that Nate gives Harris a better than 33% chance. I predict 62/38 for Trump.
You were spot on. Nice job!
Biden was around 32% before the debate when he was down about a point in the polling average. Why would Harris be doing better? She has four weeks less to close the gap and her polling isn’t any better.
I thought some state polls were showing her in a dead heat in the Midwest and national polls showing her within a point or so on average as of now. Maybe seeing them all aggregated will make it more clear but I thought she was in a little better spot today than Biden pre-debate.
Looking just at post-dropout polls, Harris is down 0.5 in MI and WI and down 1 in PA. She is down more than a standard deviation in NV and AZ. Her bright spot is GA, where she is down 1.5, which is 2-3 points better than Biden. Winning Georgia would let her lose MI or WI but not PA. That gives her basically a 46% chance in MI and WI, a 42% chance in PA and a 37% chance in Georgia. If those states were perfectly correlated, her odds of winning the electoral college would be 42%. If they are completely independent, her odds would be 17%. Given that three of the states are midwestern with similar demographics and losing PA basically knocks Harris out, I would be comfortable weighting the perfectly correlated figure at 0.6 and the independent figure at 0.4. That gives her odds of 32%, right around where Nate is likely to land.
This also shows how Nate’s fancy correlation matrices are a useful refinement. If you split the difference between complete independence and complete correlation, her odds are about 30%. If you go 2/3 of the way from complete independence to complete correlation, they are 33%. Weighting complete correlation at 0.6 won’t be off by more than a few points; Georgia is different enough from the midwestern states that the correct coefficient is pretty clearly in that range.
Yes, there are weird patterns where she wins AZ and doesn’t need WI, but there are also weird patterns where she loses VA or MN. The weirdness will roughly cancel out and really isn’t worth doing until there are more polls and you can do a formal model.
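The independent and perfectly correlated bounds in this comment can be checked by brute force. The sketch below is an illustration, not the commenter's actual worksheet: the win rule (PA is required, plus either both MI and WI or GA with one of them) is my reading of "winning Georgia would let her lose MI or WI but not PA", all other states are ignored, and the function names are made up.

```python
from itertools import product

# State win probabilities as given in the comment.
STATE_WIN_PROB = {"MI": 0.46, "WI": 0.46, "PA": 0.42, "GA": 0.37}


def harris_wins(won: dict) -> bool:
    """Assumed rule: PA plus (MI and WI), or PA plus GA and one of MI/WI."""
    return won["PA"] and ((won["MI"] and won["WI"])
                          or (won["GA"] and (won["MI"] or won["WI"])))


def p_independent(probs: dict) -> float:
    """Sum the probability of every winning outcome, states independent."""
    states = list(probs)
    total = 0.0
    for outcome in product([True, False], repeat=len(states)):
        won = dict(zip(states, outcome))
        if harris_wins(won):
            weight = 1.0
            for s in states:
                weight *= probs[s] if won[s] else 1 - probs[s]
            total += weight
    return total


def p_perfectly_correlated(probs: dict) -> float:
    """One shared uniform draw u decides every state: won when u < p."""
    cuts = sorted(set(probs.values())) + [1.0]
    total, lower = 0.0, 0.0
    for upper in cuts:
        mid = (lower + upper) / 2
        won = {s: mid < p for s, p in probs.items()}
        if harris_wins(won):
            total += upper - lower
        lower = upper
    return total


def blend(weight_on_correlated: float, probs: dict = STATE_WIN_PROB) -> float:
    """Weighted average of the two extreme cases."""
    return (weight_on_correlated * p_perfectly_correlated(probs)
            + (1 - weight_on_correlated) * p_independent(probs))


for w in (0.0, 0.5, 0.6, 2 / 3, 1.0):
    print(f"weight on perfect correlation {w:.2f}: {blend(w):.1%}")
```

A linear blend like this is a crude stand-in for a real correlation matrix, but it makes it easy to see how much the answer moves with the assumed weight.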
There was never a day where Biden was at 32%. He was at 31% both on June 30 and July 17, and he was down 2.3% and 2.0%, respectively. Your premise that her polling isn’t any better is flawed, and the data will bear that out. You have failed to consider that while Trump v Biden was quite stable, Trump v Harris is relatively unstable, and the model will likely revert closer to even odds at whatever her polling average ends up being than where Biden was. Emphasis on “closer to”.
ps— after looking at the swing state polls, I would be more comfortable giving Harris a 29% chance.
ps: one assumption I made is that the number of times Harris loses the EC while winning the popular vote by more than 1.8 points equals the number of times she wins the EC while winning by less than 1.8 points. That rule of thumb begins to break down if she’s ahead or behind by more than a standard deviation, but her t score is between 0.5 and 0.8, so my rule of thumb will work pretty well and there’s no need to do a spreadsheet until we have better swing state numbers.
Nothing is pre-determined; the outcome of the election is down to the will and motivation of the individual voters and volunteers.
You offer a vacuous tautology. I offer a prediction.
So you’re saying voter enthusiasm and volunteer motivation aren’t factors in elections?
No, I’m saying a good model captures and explains them.
kys
Lol
In physics there is considerable doubt as to the existence of free will. Just saying.
I'll write this as just a prediction for myself: I think the model premieres at something like Trump +0.5 as the polling average for Election Day, with a corresponding 55/45 win rate in favor of Trump. Something that's definitely a toss-up in absolute terms but is leaning slightly towards Trump.
I'm also predicting we'll see roughly a 0.5-2 point swing between when the model premieres and when the race "settles down" after the Democratic Convention and after the convention bounce fades. My guess is that it'll be a swing towards Trump, but it could also be a swing against him. My BEST guess, barring any more black swan events, would be we see something like Trump +1 or Trump +1.5 solidify towards the start of September.
I think the modeling will show that Kamala's refreshed strength among the young and among African Americans will mitigate a portion of Trump's electoral college advantage - I think it will show a much higher chance of winning GA specifically than Biden had, and that GA and PA's tipping point power will go way up.
My prediction is the model will say 62/38 Trump, predicting around a dead heat in the Election Day polling average.
Pretty close to my thinking. Polling is pretty much where it was pre-debate, when the model was at around 65% or so for Trump. I expect the model to be a smidge better for Dems now because a) there's more uncertainty, which probabilistically boosts the underdog and b) Harris does better than Biden in the Sun Belt, opening up some new paths for a Dem victory. My guess would be 60-40 in favor of Trump or thereabouts, but 55-45 certainly wouldn't shock me.
I think Harris is down by about 1.5 in the average of the national polls. Assuming Trump picks up a point as the race settles down that's a 2 to 2.5 point Trump advantage.
I meant picking up a point in the model's prediction of what election day would/could look like - and eventually the average and the model will converge as election day draws closer, sure, but I'm not sure at what rate that happens.
This seems reasonable to me. I think the Veep choice will be fascinating- do the Dems solidify the Blue Wall, or try to expand the map in the Sun Belt and pray that the midwest vote holds up one last time? I'd probably argue PA is too important in any Dem map for them to pass on Shapiro if he wants it- he's probably the best natural pol in the bunch as well.
GA and MI, not GA and PA. PA's tipping point power couldn't go UP any further and is liable to go down. Darn typos and inability to edit.
I don't think you're accounting for Trump's electoral college advantage.
Why would MI’s tipping point power increase? It’s the bluest of the swing states.
The short version is, if you start with the 2020 map but flip Arizona red (which seems likely), Trump has to win 2 out of the 4 of GA, PA, MI, and WI. As losing GA becomes more likely, the tipping point power of MI increases, if that makes sense, because MI being winnable enables the PA/MI or WI/MI potential path to winning. Trump's polling has also been better in MI this cycle than in WI, so it's likely to be higher in tipping power if that continues.
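The "2 out of the 4" claim above is easy to sanity-check by enumeration. The electoral-vote counts below are not from the comment; they are the 2024 apportionment figures as I understand them (Trump's 2020 states are worth 235, AZ 11, GA 16, PA 19, MI 15, WI 10), so treat them as assumptions and adjust if needed. The helper names are made up.

```python
from itertools import combinations

# Assumed 2024 electoral votes: Trump's 2020 states (235) plus Arizona (11).
TRUMP_BASE = 235 + 11
SWING = {"GA": 16, "PA": 19, "MI": 15, "WI": 10}
NEEDED = 270


def winning_combos(size: int) -> list:
    """All sets of `size` swing states that get Trump to 270 or more."""
    return [combo for combo in combinations(SWING, size)
            if TRUMP_BASE + sum(SWING[s] for s in combo) >= NEEDED]


def share_of_paths_with(state: str, excluded: tuple = ()) -> float:
    """Fraction of two-state winning paths that include `state`,
    ignoring any path that relies on an excluded state."""
    paths = [c for c in winning_combos(2)
             if not any(s in excluded for s in c)]
    return sum(state in c for c in paths) / len(paths)


print("one-state paths:", winning_combos(1))
print("two-state paths:", winning_combos(2))
print("MI share of all pairs:", share_of_paths_with("MI"))
print("MI share if GA is off the table:", share_of_paths_with("MI", ("GA",)))
```

Under these numbers no single state is enough and every pair is, which matches the comment. Taking GA off the table raises the share of remaining two-state paths that run through MI, which is one way to read the tipping-point argument.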
MI is Michigan, not Minnesota. MN is Minnesota. Not sure how they decided that one, lol.
Michigan, Minnesota, Mississippi, and Missouri were all fighting for that coveted MI. They probably should have given Michigan MC so that no one got that MI (though it's weird they let Missouri displace Montana from its own first two letters).
I know that. Of the Trump/Biden states, Michigan is the bluest. MN hasn’t gone Republican since, I believe, before Reagan.
is there no boolean value to make "kamala_mod = true" ?
It’s Stata. It doesn’t have a Boolean data type.
I assumed he wanted to leave space for non-Boolean values such as "only in California", "maybe next week", etc.
Kamala Harris right now: ”This isn’t even my final form”.
If only we could have multiple VPs and sweep PA and AZ
Only if we Pokémon go to the polls.
https://amp.knowyourmeme.com/memes/pokemon-go-to-the-polls
It is clear to me that you have missed the fact that I was saying it sardonically and not in earnest. My joke is referencing how bad the joke is. I was not making a metaphor for anything. Your condescension is misplaced.
Minor point, but the 2024 cycle will further weaken the value of precedent in election analysis, especially regarding incumbency. 2020 was a weird Covid year, so really the most recent normal incumbent year was 2012.
Which means the 2012 election would be the most comparable incumbent election cycle for an election that doesn't occur until 2028 or 2032. I just don't see how "fundamentals" and other more incumbency-based analysis can remain credible going forward.
Momentum seems to be on the side of Harris. She seems to be moving up fast. And she seems to be pretty good at trash talking Trump. That has always been Trump's strength, and now he is running against someone who doesn't mind descending to his level.
Does the model account for Harris selecting Jen O’Malley Dillon to help run her campaign?
If it shows up in polling. If not, you need something like futures contracts to factor in qualitative information. The model helps us interpret how polls and fundamentals will affect outcomes.
There is already an alien invasion of the West by the millions every year.
If Harris is a distinct underdog maybe it isn’t too late for her to step aside.
Won’t happen. She has a very devoted following. The KHive. They’re going all out.
Weird
wanting a candidate who can win and thinking a biracial female liberal from california might not be the optimal choice is weird? i think rallying behind harris before testing alternatives was stupid.
Is it more probable that Skynet becomes self aware on the 11th as opposed to the 29th?
Can’t imagine the mind-splitting headache that results from trying to forecast the House elections now. I would think D’s are favored, but by how much?