With the Bayesian padding, I think you’re missing something if you pull toward the group mean rather than toward the expectation given the sample size you have. This is particularly true for three-point shooting: a guy with zero attempts across a full season of minutes certainly wouldn’t be expected to be an average shooter.
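To make the concern concrete, here’s a toy beta-binomial-style sketch. All of the numbers and the attempt-rate-to-prior mapping are hypothetical illustrations, not your actual padding scheme; the point is just the contrast between shrinking toward the league mean and shrinking toward a prior conditioned on attempt volume:

```python
# Illustrative only: hypothetical numbers, not the model's actual padding scheme.
def padded_3pt(makes, attempts, prior_mean, prior_weight=50):
    """Beta-binomial-style padding: shrink observed 3P% toward a prior."""
    return (makes + prior_mean * prior_weight) / (attempts + prior_weight)

# Naive padding: everyone shrinks toward the league mean (~35%),
# so a 0-for-0 player comes out looking exactly average.
naive = padded_3pt(0, 0, prior_mean=0.35)  # -> 0.35

# Conditional padding: the prior reflects what 0 attempts in ~1000 minutes
# implies about shooting ability (toy mapping from attempt rate, made up here).
def prior_from_attempt_rate(attempts, minutes, league_mean=0.35):
    rate = attempts / max(minutes, 1)              # 3PA per minute
    return league_mean * min(1.0, 0.3 + 7.0 * rate)

cond = padded_3pt(0, 0, prior_mean=prior_from_attempt_rate(0, 1000))  # -> 0.105
```

Same padding machinery in both cases; the only difference is whether the prior mean is unconditional or a function of attempt volume.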
Also, honestly, I’m a bit confused by the choice of binary player-vs-player fitting. Obviously the margin of difference matters a ton.
Good stuff! A few questions/comments:
What are you using to define your draft prospect pool for each class? E.g., why those 110 players for 2026 and those 80 players for 2025?
Do you use exact birthdate/age in the model, or just class / years in college as a proxy for age? Burries, for example, is the age of an older sophomore despite being a freshman, and the exact age methodology will have an outsized impact on any model projection for players like him.
By "playtype frequencies" do you mean Synergy play types or something else? Does the model have access to Synergy data?
How are you defining BPM share when a team's minute-weighted average BPM is negative?
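For illustration, here’s a two-player toy case under one plausible reading of the definition (my guess, not necessarily yours) where a negative team total flips the signs:

```python
# Assumed definition (my guess): share_i = minutes_i * bpm_i / sum_j(minutes_j * bpm_j)
team = [("A", 1000, 4.0), ("B", 1000, -6.0)]   # (player, minutes, BPM)
denom = sum(m * b for _, m, b in team)          # -2000.0: negative team total
shares = {name: m * b / denom for name, m, b in team}
# The better player (A) gets share -2.0; the worse player (B) gets +3.0.
```

If that is roughly the definition, a negative denominator makes the shares uninterpretable without some extra handling.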
Do you have a version that does not include the scouting consensus features? That would be interesting to look at.
“Gradient boosting regression to a target like WAR has a few limitations, one of them being that the projections don’t exceed the training set ceiling.” - this is true for random forests but not for gradient boosting, right? Since the trees are fit sequentially to residuals, the sum of many trees is not constrained to the range of the training targets.
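A tiny contrived example (toy data I made up) where a two-stage boosted model predicts above the training maximum, while a random forest cannot:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 0.1, 10.0])

# Two depth-1 boosting stages: stage 1 fits the big split (x < 1.5), and
# stage 2's residual split (x < 0.5) lumps x=1 and x=2 together, adding a
# positive correction that pushes the x=2 prediction past y.max() = 10.
gbm = GradientBoostingRegressor(n_estimators=2, learning_rate=1.0, max_depth=1)
gbm_pred = gbm.fit(X, y).predict([[2.0]])[0]   # ~10.025 > 10.0

# Random forest predictions average leaf means of y, so they can never
# exceed the largest training target.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf_pred_max = rf.fit(X, y).predict(X).max()    # <= 10.0
```

The overshoot here is small, but it shows the mechanism: each boosting stage adds a leaf-mean of residuals, and nothing caps the running sum at the training ceiling the way averaging does for a forest.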
I’m still unclear on why you took the pairwise comparison approach instead of predicting a target variable directly, especially since your pairwise comparisons were comparing 7-year EPM WAR between each pair of prospects anyway. Direct 7-year EPM WAR projections would be much more interpretable than the current formulation of PRISM scores.