How much of this can be attributed to the addition of the First Four in 2011? UMBC wouldn’t have been a 16-seed in 2010, and every low seed essentially dropped half a seed between 2000 and 2010. This, plus the fact that two 11-ish seeds have to win games to get to round one, seems like it would produce stronger upset candidates.
Came here to say this. The 16 seeds are doing better against the 1 seeds because the worst 16 seeds don’t even get to play the 1 seeds.
I also think changes to the conference tournament structure to reduce bid thieves have improved the quality of the lower seed lines.
I dispute the 1-in-200 significance claim. I think you computed the probability of getting 77 or more heads in 334 flips of a coin with a 104/600 probability of heads. 77 is the number of upsets since 2010, 334 is the number of opportunities since 2010, and 104/600 is the frequency of upsets before 2010. That number comes to 0.46%, or one in 218.
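For anyone who wants to check that first figure, here is a minimal sketch of the exact binomial tail. It assumes the counts quoted above (77 upsets in 334 games since 2010, 104 in 600 before) and that the pre-2010 rate is treated as the known coin probability; `binom_tail` is just an illustrative helper name, not anything from the original column.

```python
from math import comb


def binom_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts as quoted in the comment (assumed, not independently verified).
upsets_since_2010 = 77
games_since_2010 = 334
rate_before_2010 = 104 / 600

p_tail = binom_tail(upsets_since_2010, games_since_2010, rate_before_2010)
print(f"P(77 or more upsets in 334 games) = {p_tail:.4%}")
print(f"roughly 1 in {1 / p_tail:.0f}")
```

The sum is exact rather than a normal approximation, which matters a little this far out in the tail.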
The right way to do it is to consider the probability of 77 or more heads if the flip probability is 181/934; in other words, if the flip probability is constant at the overall sample value. That's 5.38%, or 1 in 19, which falls just short of the standard 5% level often used to declare statistical significance.
Moreover, since you could have written a column about how upsets are declining if you got an equally unlikely result on the other side, you should double the p-value to 10.77%.
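A rough sketch of the pooled version, again assuming the counts quoted in this comment (181 upsets in 934 games overall, 77 in 334 since 2010). The two-tailed number here is simply the one-tailed value doubled, which is the convention the comment uses; other two-sided definitions exist and give slightly different answers. The helper name `binom_tail` is illustrative.

```python
from math import comb


def binom_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Pooled upset rate over the whole sample (counts as quoted, assumed).
pooled_rate = 181 / 934

one_tailed = binom_tail(77, 334, pooled_rate)
# Doubling convention for a two-tailed p-value, capped at 1.
two_tailed = min(1.0, 2 * one_tailed)

print(f"one-tailed: {one_tailed:.2%}")
print(f"two-tailed: {two_tailed:.2%}")
```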
A superior approach is to use a Chi-Square test with one degree of freedom. That yields a lower p-value: 3.40% for the one-tailed calculation, 6.80% for the two-tailed.
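Here is a sketch of how that test could be run on the 2x2 table of upsets versus non-upsets, before and since 2010. It assumes the counts quoted above and uses the plain Pearson statistic with no continuity correction; whether that matches the exact variant behind the figures in this comment is an assumption. For one degree of freedom, the upper-tail probability can be written with `math.erfc`, so no external library is needed.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]],
    without continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_upper_tail_df1(stat):
    """P(chi-square with 1 degree of freedom > stat)."""
    return erfc(sqrt(stat / 2))


# Rows: before 2010, since 2010. Columns: upsets, non-upsets (counts assumed).
stat = chi_square_2x2(104, 600 - 104, 77, 334 - 77)
p_value = chi_square_upper_tail_df1(stat)

print(f"chi-square statistic = {stat:.3f}")
print(f"upper-tail p-value   = {p_value:.2%}")
```

The printed p-value is the upper tail of the chi-square distribution, which can be compared against the figures quoted above.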
Even these results should be further adjusted. You could have picked a year other than 2010, or counted more or fewer seeds. Moreover, you know the assumptions of both the binomial and Chi-Square tests are not met: there is not a constant probability of upset in all games, and upsets are not distributed identically among all games.
Moving from theory to practice, it's easy to come up with hundreds of observations supported by this level of evidence that have no basis beyond random chance. Using a 5% significance threshold only makes sense when you test one unique hypothesis, determined independently of the data (ideally before you've seen the data), that is likely to be true.
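To put a number on that point, here is a back-of-the-envelope sketch of how quickly a 5% threshold stops meaning much once several hypotheses are in play. It assumes the tests are independent and that every null hypothesis is true, which is a simplification: different cutoff years or seed ranges on the same games would be correlated. The function name is hypothetical.

```python
def chance_of_false_positive(alpha, n_tests):
    """Probability that at least one of n_tests independent tests comes up
    'significant' at level alpha when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20, 100):
    rate = chance_of_false_positive(0.05, n_tests)
    print(f"{n_tests:>3} tests at 5%: {rate:.1%} chance of at least one false positive")
```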
If there is a real effect, my intuition would be to look for explanations in the seeding process rather than the nature of the game itself. It's become more systematic and evidence-based. I don't think that matters a lot for which teams are in the top six seeds (the top 24 in the country), but I think the committee has been getting much better at selecting the 41st to 64th best teams.
This is so nerdy and I want so much more of it.
Thank you.
As a fan of a 13 seed playing for the first time, I wish this were true, but it's not. Lower-seeded teams don't have players leaving for the NBA, but they are getting a lot of players leaving for NIL deals to be role players for power conference teams. The 2010s may have had more upsets, but last year didn't, and I expect that trend to sadly accelerate.
Do you think there's gonna be a regression on this with NIL? I picked all 4 #1 seeds for the first time this year because Florida, Houston, and Duke seem like legit superteams (Auburn seems weak, but they have an easy path). Hope I'm wrong and there's a ton of upsets, but it might be hard for mid-major schools to keep up.
I think part of it, maybe a big part of it, is the expansion to 68 teams in 2011. Teams that were 11-seeds are now 12s, teams that were 15s are now 16s. The top seeds aren't just worse, they're playing better teams in the first round.
Once you get to 12 vs 5, there are no upsets. Far too much parity. Still trying to understand how Auburn got the overall #1 seed after losing three of their last four games. Not exactly finishing strong.
I wonder if there is any correlation between shooting volume increases and the number of upsets?
For example, in the 2000s I don’t think a Purdue team with Zach Edey loses to St. Peter’s because the teams they’d face at the #16 seed level wouldn’t have had the shooters. (Slow pace of play was rewarded more in the past.)
I was also at that game in Knoxville in 1990. Luckily my dad let us stay for both games!
Thanks, Nate.
Sorry to be obtuse, but where in the web version can I find a point spread analysis? What is Nate referencing when he says his system likes a lot of dogs?
And this is happening despite the gradually decreasing shot clock over the past 40 years, which almost certainly favors top teams over potential low-seed upsets, who can’t reduce the number of possessions as much as they could in 1985 (Villanoooova) or even in 2010, when it was still 35 seconds.