How our ELWAY forecasts work
Way more detail than you probably wanted on Silver Bulletin's new NFL forecasting system.
ELWAY is Silver Bulletin’s exclusive NFL team rating and forecasting system. It’s based on our analysis of every game in NFL history (since 1920), with a greater emphasis on recent seasons as more data has become available over time. QBERT is our quarterback rating system, which plays a prominent role in ELWAY.
Despite its official backronym (Elo with Lineup Weights and Adjusted Yardage), ELWAY shares only some elements with Elo rating systems, such as the legacy FiveThirtyEight NFL model. Each team gets a rolling offensive and defensive rating, which corresponds approximately to the number of points it would be expected to score and allow against an average opponent. These can be combined with other factors to project win probabilities, margins of victory, and the total number of points scored in future NFL games. ELWAY ratings are updated at the end of each week based on a team’s performance, adjusted for the strength of the opponent and other factors.
ELWAY has five distinguishing features:
In measuring team performance, ELWAY considers a lot more than just the final score. We evaluate which statistics are most predictive of future performance. Offensive and defensive efficiency are generally more predictive of future results than game scores alone, particularly when adjusted for the score and after backing out more random factors like turnovers.
ELWAY ratings are recalibrated at the start of each season based partly on roster turnover. Essentially, we calculate projected wins above replacement (WAR) for every player on the start-of-season roster based on recent performance, age, and, for younger players, draft position.
QBERT ratings are calculated in parallel with ELWAY and can significantly impact our overall projections. In line with Vegas point spreads, the effect of a QB change can be a touchdown or even more in extreme cases. We maintain a quarterback depth chart for every team, and the model accounts for the possibility of future injuries and benchings. We also directly account for current QB injuries.
Apart from the quarterback adjustment, ELWAY adjusts for several factors that reliably affect performance: non-QB injuries, travel and rest, various forms of home-field advantage, and even weather and coaching changes.
ELWAY simulates the remainder of the NFL season thousands of times and plays out tiebreakers to calculate projected season-ending W-L totals and playoff and championship odds. Although simulation is obviously not a novel technique, ELWAY takes it a step further to mirror real-world behavior. For instance, in our simulations, a team is more likely to bench its quarterback, replace its coach, and keep more players on the injured list if it has a losing record. The simulations run “hot”, meaning that within each simulation, a team’s performance affects its projections further down that particular branch of reality.
Each of these components is described in more detail below.
1. ELWAY team ratings
Most NFL power ratings are based on the margin of victory or points scored and allowed in previous games. However, some statistics are more predictive of future performance than plain ol’ points.
Thus, ELWAY assigns each team an implicit offensive and defensive rating for each game based on a variety of box score statistics and factors related to the score. Essentially, ELWAY prefers teams that consistently move the ball. In general, these factors are most predictive of future performance on both offense and defense and receive a lot of emphasis in ELWAY:
Yards per play
Completions
First downs
Sacks
Conversely, these factors can substantially affect the outcome of any given game, but can be pretty random, and therefore have less of an impact in projecting future performance:
Turnovers
Special teams
Third-down conversions
Penalties
Points scored and allowed are somewhere in the middle of the spectrum. In a game like Super Bowl LIX, for instance, the final score (40-22) did not fully reflect the Eagles’ dominance. Still, scoring and preventing points is the object of the game, and there is some persistence in these factors. There is also a small bonus embedded in ELWAY for simply winning or losing games. You might think of this as a “clutch factor”. We believe that clutch play is generally overrated — but when aggregated over tens of thousands of past NFL games, there is a modest amount of signal.
One tricky factor in football is that offense bleeds over into defense and vice versa, because a better offense creates improved field position and only one team possesses the ball at a time. The notion that “the best defense is a good offense” is true: if you have the ball, the opposition is even less likely to score on you than it would be against the 1985 Bears. ELWAY accounts for this.
ELWAY also calculates what would be called a “pace factor” in the NBA. Passing plays take up considerably less time of possession than running plays. In addition, some coaches prefer to burn more time on the play clock. (Playing at a faster pace is generally indicative of a higher-quality offense.) Basically, ELWAY separately projects the number of points scored and allowed per play, and the number of offensive and defensive plays it expects each team to have per game, and combines these to create its overall ratings.
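A minimal sketch of how per-play efficiency and pace might combine into per-game offensive and defensive ratings. It assumes a simple product of points per play and plays per game; the class, its fields, and the example numbers are hypothetical, not ELWAY’s actual parameters.

```python
from dataclasses import dataclass


@dataclass
class TeamProfile:
    # Hypothetical inputs; ELWAY's real per-play and pace estimates are not shown here.
    off_points_per_play: float  # points scored per offensive play
    def_points_per_play: float  # points allowed per defensive play
    off_plays_per_game: float   # expected offensive plays per game
    def_plays_per_game: float   # expected defensive plays per game

    def offensive_rating(self) -> float:
        """Expected points scored per game against an average opponent."""
        return self.off_points_per_play * self.off_plays_per_game

    def defensive_rating(self) -> float:
        """Expected points allowed per game against an average opponent."""
        return self.def_points_per_play * self.def_plays_per_game


team = TeamProfile(0.38, 0.33, 64.0, 62.0)
print(round(team.offensive_rating(), 2), round(team.defensive_rating(), 2))
```

Keeping the two terms separate lets a fast-paced offense and an efficient but slow one end up with similar ratings for different reasons.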
ELWAY also considers factors related to the score and how it affects strategy. Teams with substantial leads late in the game have less efficient offenses by design[1] (they’re trying to run out the clock), while trailing teams improve in yards per play. ELWAY accounts for this based on a log of point scoring: teams with substantial leads late in the game essentially receive an upward adjustment to their offensive statistics.
ELWAY also calculates a rolling expectation of league-average point scoring, with aggressive adjustments early in the season. Leaguewide point scoring and offensive efficiency can be affected substantially by rule changes. A team’s offensive and defensive ELWAY rating should always be considered in context relative to the league environment. An ELWAY offensive rating of 25.0 is pretty good in 2025, for instance, but it would be superlative in the 1970s.
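A minimal sketch of a rolling league-average estimate that reacts aggressively early in the season. The 1/(week + 1) weighting schedule, its floor of 0.1, and the preseason prior are assumptions; the text says only that early-season adjustments are aggressive. Subtracting the league average is one simple way to put a raw offensive rating in context.

```python
def league_average(prior: float, weekly_averages: list[float], floor: float = 0.1) -> float:
    """Running estimate of league-average points per team per game.

    Week n gets weight max(1 / (n + 1), floor). Until the floor binds, this equals a
    plain average of the prior and the observed weeks, so early weeks move the
    estimate a lot; afterward it behaves like an exponentially weighted average.
    """
    estimate = prior
    for week, observed in enumerate(weekly_averages, start=1):
        weight = max(1.0 / (week + 1), floor)
        estimate += weight * (observed - estimate)
    return estimate


def relative_rating(offensive_rating: float, league_avg: float) -> float:
    """Express an offensive rating relative to the current scoring environment."""
    return offensive_rating - league_avg


# Hypothetical: preseason prior of 22.0 points per team per game; Week 1 averages 24.0.
after_week_1 = league_average(22.0, [24.0])
print(after_week_1, relative_rating(25.0, after_week_1))
```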
In traditional Elo ratings, changes in team ratings always net out to zero for a given game. For example, if the Chargers gain 15 Elo rating points in defeating the Broncos, the Broncos lose 15 points. This is not true for ELWAY. Instead, both teams’ ratings can rise after a game, or both can fall. Generally, efficient offense is an indicator of higher team quality, so a 38-35 game with no turnovers might be favorable for both teams in predicting future performance. Conversely, following a terrible game like this one (which Nate had the misfortune of attending), both teams’ ratings may decline relative to the rest of the league.
On the ELWAY landing page, we list three sets of ratings that you can scroll between:
Ratings for the forthcoming week, accounting for the projected starting QB for each team and current injuries. Although these are not the only adjustments that ELWAY considers, QBs and injuries tend to be the most important ones.
Ratings with injury adjustments backed out, and each team with its preferred QB1. Essentially, this reflects a team’s projected performance when all players are fully healthy. Teams may revert toward this injury-free rating later in the season — although we list the ratings with a team’s preferred QB1 even if he’s out for the year.
Finally, a “raw” version of ELWAY based purely on team statistics, not accounting for quarterbacks (or injuries) at all. This rating is most comparable to other team rating systems. It’s also the starting point for ELWAY’s game-by-game projections: technically speaking, the identity and projected QBERT rating of the starting QB is an “adjustment” to these baseline ratings.
We also list an Elo rating for each team for comparability with past FiveThirtyEight NFL projections. However, it is not a separate rating system; rather, we derive it based on a team’s offensive and defensive ELWAY rating. Each point of projected net margin is worth approximately 21.5 Elo rating points. But this varies somewhat: offensive-minded teams have slightly more variance in their game-to-game results than defensive-minded ones, and our Elo formula (and our ELWAY simulations) reflect this.
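A sketch of the conversion, using the ~21.5 points-per-point rate above and assuming a hypothetical baseline of 1500 for a team with zero projected net margin. It ignores the variance adjustment for offensive-minded teams. The win-probability function is the standard logistic Elo curve; whether ELWAY’s Elo uses exactly this curve is an assumption.

```python
ELO_PER_POINT = 21.5   # approximate rate given in the text
BASELINE_ELO = 1500.0  # hypothetical Elo for a team with zero projected net margin


def elway_to_elo(off_rating: float, def_rating: float) -> float:
    """Map offensive/defensive ratings (points scored/allowed vs. average) to Elo."""
    net_margin = off_rating - def_rating
    return BASELINE_ELO + ELO_PER_POINT * net_margin


def elo_win_probability(elo_a: float, elo_b: float) -> float:
    """Standard logistic Elo curve on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


strong = elway_to_elo(26.0, 20.0)   # +6 net margin -> 1629
average = elway_to_elo(22.0, 22.0)  # 0 net margin -> 1500
print(strong, round(elo_win_probability(strong, average), 3))
```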
2. Preseason projections and roster ratings
At the beginning of each season, a team’s ELWAY ratings are based partly on its ratings at the end of the previous season and partly on roster ratings[2], which are calculated by essentially adding up WAR projections for each individual player on the roster. This process implicitly builds in some mean reversion, so the spread in ELWAY ratings tends to be wider at the end of each season than at the start.
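A minimal sketch of the preseason blend, using the 50/50 split the notes give for recent seasons. It assumes both inputs are already expressed as net points versus an average team; the four teams’ values are hypothetical. Because the roster-based ratings in this example are more compressed than the end-of-season ones, the blended preseason ratings have a smaller spread, which is the mean reversion described above.

```python
from statistics import pstdev


def preseason_rating(end_of_season: float, roster: float, weight_prior: float = 0.5) -> float:
    """Blend last season's final rating with the roster-based rating."""
    return weight_prior * end_of_season + (1.0 - weight_prior) * roster


# Hypothetical net ratings (points vs. an average team) for four teams.
end_ratings = [8.0, 3.0, -2.0, -9.0]
roster_ratings = [4.0, 2.0, -1.0, -3.0]
start_ratings = [preseason_rating(e, r) for e, r in zip(end_ratings, roster_ratings)]
print(start_ratings)                                 # [6.0, 2.5, -1.5, -6.0]
print(pstdev(start_ratings) < pstdev(end_ratings))   # True
```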
Roster ratings are based on a series of calculations involving QBERT and Football-Reference.com’s Approximate Value (AV). Essentially, we calculate a projected AVAR — Approximate Value Above Replacement — for every player. However, this requires several transformations from the Football Reference version of AV.
For QBs, we use QBERT as a substitute for AVAR.
In the Football Reference version of AV, nearly all players have a positive AV, such that even the worst teams in NFL history have substantially positive aggregate AVs. So we incorporate a replacement-level calculation, which is equivalent to the 25th percentile of players[3].
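A sketch of the replacement-level step, assuming (as the notes describe) that players are rated by AV per snap within their position, that replacement level is the 25th percentile of that rate, and that AVAR scales with snaps played. The player data and function names are hypothetical, and the percentile method (`statistics.quantiles` with `method="inclusive"`) is our choice, not necessarily ELWAY’s.

```python
from statistics import quantiles


def replacement_av_per_snap(av_per_snap: list[float]) -> float:
    """25th percentile of AV per snap among players at one position."""
    # quantiles(..., n=4) returns the three quartile cut points; index 0 is the 25th percentile.
    return quantiles(av_per_snap, n=4, method="inclusive")[0]


def avar(av: float, snaps: int, replacement_rate: float) -> float:
    """AV above what a replacement-level player would produce in the same snaps."""
    return av - replacement_rate * snaps


# Hypothetical linebackers: (name, AV, snaps).
linebackers = [("LB1", 10.0, 900), ("LB2", 6.0, 800), ("LB3", 4.0, 700),
               ("LB4", 3.0, 600), ("LB5", 1.0, 400)]
rate = replacement_av_per_snap([av / snaps for _, av, snaps in linebackers])
for name, av, snaps in linebackers:
    print(name, round(avar(av, snaps, rate), 2))
```

Players below the replacement rate come out negative, which is what lets the worst rosters sum to roughly zero or below instead of to a large positive AV.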
AV dramatically understates the value of quarterbacks[4], who are assigned approximately 6 percent of AV when, by our estimate, they are responsible for roughly 30 percent of the total marginal value generated by players.
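A minimal sketch of the constant-total rule described in the notes: the QB’s value (which comes from QBERT) is assigned first, and the rest of the team’s AV is rescaled so the team total does not change. The team total of 100, the QB values, the player values, and the function name are hypothetical, and the positional shares below are not applied here.

```python
def reallocate_non_qb_value(team_total: float, qb_value: float,
                            non_qb_av: dict[str, float]) -> dict[str, float]:
    """Hold the team total fixed: carve out the QB's value first, then split the
    remainder among non-QBs in proportion to their raw AV."""
    remainder = team_total - qb_value
    raw_sum = sum(non_qb_av.values())
    return {name: remainder * av / raw_sum for name, av in non_qb_av.items()}


# Hypothetical team: same non-QB raw AV, but a great QB (30) vs. a poor one (15).
others = {"WR1": 10.0, "LT": 6.0, "EDGE": 4.0}
with_great_qb = reallocate_non_qb_value(100.0, 30.0, others)
with_poor_qb = reallocate_non_qb_value(100.0, 15.0, others)
print(with_great_qb)  # {'WR1': 35.0, 'LT': 21.0, 'EDGE': 14.0}
print(with_poor_qb)   # {'WR1': 42.5, 'LT': 25.5, 'EDGE': 17.0}
```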
For positions other than QB, value assignments are mainly based on the recent proportion of league cap hit above the minimum salary, with a slight upward adjustment for running backs, given that they’ve historically been valued in the league more than they are currently. Our overall estimates of the share of AVAR generated by position since 1960[5] are as follows. Values do not add up exactly because of rounding:
The overall value assignments are deliberately skewed toward offense (61-62 percent of overall value) versus defense (36-37 percent) and special teams (2 percent). In general, offense is more predictive of future performance than defense.
Our roster ratings cap the amount of value that can be derived from each position. For instance, if a team that already has a good quarterback also acquires a backup who is projected to have a high QBERT rating, our formula recognizes that he probably won’t get much playing time.
AVAR projections are based on AVAR generated over the past three seasons, age, and (for younger players) draft position. ELWAY also projects each player’s expected number of snaps taken based on these statistics, plus his snap count from the previous season.
Different aging curves are incorporated for each position. We estimate that running backs peak the earliest, around ages 24-25, while quarterbacks peak the latest, at 28-29.
For rookies, we project AVAR based on their position and draft slot. The coefficient for draft slot is nonlinear: there is a considerably bigger difference between the 1st and the 11th pick than between the 250th and the 260th pick. We also account for long-term trends in the performance of rookies. In general, rookie QBs and other offensive rookies have contributed more value in the past couple of decades than they once did.
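The nonlinearity can be illustrated with a logarithmic curve. The log form, the intercept, and the slope are assumptions for illustration; ELWAY’s actual functional form and coefficients are not given here.

```python
import math


def draft_slot_value(pick: int, intercept: float = 5.0, slope: float = 0.8) -> float:
    """Hypothetical rookie AVAR curve: declines with the log of the pick number."""
    if pick < 1:
        raise ValueError("pick numbers start at 1")
    return intercept - slope * math.log(pick)


top_gap = draft_slot_value(1) - draft_slot_value(11)
late_gap = draft_slot_value(250) - draft_slot_value(260)
print(f"{top_gap:.3f} vs {late_gap:.3f}")  # 1.918 vs 0.031
```

Under a log curve, the gap between two picks depends on their ratio rather than their difference, so ten slots near the top matter far more than ten slots at the end of the draft.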
3. QBERT and the quarterback “adjustment”
ELWAY projections are substantially adjusted for the projected QBERT ratings for the starting quarterbacks for each game. For more details on QBERT, see here.
However, this is trickier than it might seem because QB performance is also implicitly baked into the team ratings. (As mentioned, we estimate that about 30 percent of overall NFL player value is provided by QBs.) Essentially, we calculate a rolling average of the QBERT rating that a team has achieved in recent games and compare it against the projected performance of the expected starting quarterback for the next game.
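A minimal sketch of the comparison, assuming the team’s recent QBERT is an exponentially weighted average and that QBERT is expressed in points of margin relative to an average QB. The decay rate, the units, and the `qb_share` multiplier (standing in for the point that adjustments are larger in QB-centric offenses) are all assumptions. The example mirrors the post-Brady Patriots case: an above-average replacement still produces a negative adjustment.

```python
def rolling_qbert(recent_games: list[float], decay: float = 0.8) -> float:
    """Exponentially weighted average of the team's QBERT in recent games (most recent last)."""
    if not recent_games:
        raise ValueError("need at least one recent game")
    weight, total, norm = 1.0, 0.0, 0.0
    for value in reversed(recent_games):  # most recent game gets weight 1.0
        total += weight * value
        norm += weight
        weight *= decay
    return total / norm


def qb_adjustment(projected_starter: float, recent_games: list[float],
                  qb_share: float = 1.0) -> float:
    """Positive if the expected starter beats what the team ratings already embed.

    qb_share is a hypothetical multiplier for how QB-centric the offense is.
    """
    return qb_share * (projected_starter - rolling_qbert(recent_games))


# Hypothetical: a team used to elite QB play (+5.0) turns to an above-average starter (+2.0).
print(round(qb_adjustment(2.0, [5.0, 5.0, 5.0]), 2))
```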
These effects are most pronounced when a QB is injured or changes teams. But sometimes they can be counterintuitive. Whoever took over for Tom Brady on the Patriots once he went to Tampa was very likely to be worse than Brady, for instance. Thus, the post-Brady Patriots would have a negative QB adjustment for some period of time, even if the replacement was also projected to be above average.
Even if there’s been no recent QB change, the adjustment can have an impact if the QB is on a rising or declining trajectory, especially because of age. Quarterbacks improve substantially in their first ~20 starts as they gain more experience. A team with a rookie QB will generally have a downward QB adjustment in Week 1, but it could be positive by the end of the season.
The QB ratings punish quarterbacks for missing starts; both benchings and injuries are negative indicators for future performance.
ELWAY also evaluates each team’s QB carousel for the rest of the season based on manually updated depth charts. The program simulates benchings, and quarterbacks are much more likely to be benched following losses, poor performances and/or a poor ongoing QBERT rating. Conversely, young quarterbacks who are high draft picks are considerably less likely to be benched, holding other factors constant, as are quarterbacks who have already accumulated a number of wins during the season.
We also incorporate a subjective rating on each team’s depth chart that reflects the tenuousness of a given starter’s hold on the job. QBs designated as “rock solid” are very unlikely to lose their jobs, other than due to injury, while teams have an itchier trigger finger for QBs designated as “tenuous”. In rare instances, the starter might not be clear for the upcoming week; the model handles these cases probabilistically.
When a QB is benched, he’s usually dropped one slot in the depth chart to QB2, though in a minority of simulations he is moved to the bottom of the depth chart.
Each starting QB also gets a long-term injury grade, which corresponds to a per-game chance of 1.25 percent (“Iron man”) to 4.5 percent (“High risk”) of missing at least one future start due to injury. There is also a small chance of reserve QBs being hurt in practice. As of 2025, these grades are subjective, but we may look to automate them in future seasons.
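To see how these per-game grades compound over a season, here is a quick worked example. It assumes a constant per-game risk, independence across games, and a 17-game regular season; these simplifications are ours, not ELWAY’s.

```python
def prob_misses_a_start(per_game_risk: float, games: int = 17) -> float:
    """Chance of at least one injury-related missed start, assuming independent games."""
    return 1.0 - (1.0 - per_game_risk) ** games


for label, risk in [("Iron man", 0.0125), ("High risk", 0.045)]:
    print(f"{label}: {prob_misses_a_start(risk):.1%}")
# Iron man: 19.3%
# High risk: 54.3%
```

Even the sturdiest grade implies roughly a one-in-five chance of a missed start over a full season, which is why depth charts matter in the simulations.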
We also account for current QB injuries based on an assessment of publicly available information on injuries. Players designated as “questionable” on NFL injury reports play in the subsequent game ~70 percent of the time, while players designated as “doubtful” play only ~7 percent of the time. However, we sometimes incorporate additional categories beyond this when the situation warrants it.
For injured quarterbacks, we also project a return date. This is usually probabilistic: for example, a QB projected to return at some point between Week 7 and Week 10 will randomly be assigned one of these weeks in each simulation.
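A sketch of how a single simulation might draw a quarterback’s availability, using the ~70 and ~7 percent rates above. The uniform draw over the return window, the status labels, and the function names are assumptions, and ELWAY’s extra injury categories are not modeled.

```python
import random

# Approximate play rates from the text; "out" and "healthy" are added for completeness.
PLAY_PROBABILITY = {"questionable": 0.70, "doubtful": 0.07, "out": 0.0, "healthy": 1.0}


def plays_this_week(status: str, rng: random.Random) -> bool:
    """Draw whether a QB with the given injury-report status plays."""
    return rng.random() < PLAY_PROBABILITY[status]


def draw_return_week(first_week: int, last_week: int, rng: random.Random) -> int:
    """Uniform draw over an inclusive window, e.g. Week 7 through Week 10 (assumption)."""
    return rng.randint(first_week, last_week)


rng = random.Random(2025)
trials = 100_000
share = sum(plays_this_week("questionable", rng) for _ in range(trials)) / trials
weeks = {draw_return_week(7, 10, rng) for _ in range(1_000)}
print(round(share, 2), sorted(weeks))
```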
QB adjustments are larger for teams where the QB plays a larger role in the offense.[6]
4. Non-QB injuries and other adjustments
Beyond incorporating QBERT ratings, ELWAY makes several other adjustments to its forecasts of each game.
Short weeks, long weeks (due to Thursday Night or Monday Night games), and bye weeks can have a significant impact. A team coming off a bye (when the other team isn’t) gains ~1 point in expected victory margin. An extra day of rest (e.g., if the opposing team is coming off a Monday Night game) is worth about 0.3 points.
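A minimal sketch of the rest adjustment, using the ~1 point bye bonus and ~0.3 points per extra day from the text. Treating the per-day effect as linear, and not stacking it on top of a bye, are assumptions.

```python
BYE_BONUS = 1.0        # points: team off a bye vs. an opponent that isn't (from the text)
EXTRA_DAY_BONUS = 0.3  # points per extra day of rest (from the text)


def rest_adjustment(days_rest_a: int, days_rest_b: int, bye_a: bool, bye_b: bool) -> float:
    """Expected-margin adjustment for team A; positive values favor A."""
    if bye_a != bye_b:
        # Assumption: the bye bonus replaces, rather than stacks on, the per-day bonus.
        return BYE_BONUS if bye_a else -BYE_BONUS
    # Assumption: the per-day effect is linear in the rest differential.
    return EXTRA_DAY_BONUS * (days_rest_a - days_rest_b)


print(rest_adjustment(7, 6, False, False))  # opponent coming off Monday Night
print(rest_adjustment(14, 7, True, False))  # team A coming off a bye
```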
Teams playing at late hours relative to their home time zone can suffer a significant penalty. This is most pronounced when an East Coast team plays a night game on the West Coast. A 1 p.m. ET start (i.e., 10 a.m. local time) produces a slight disadvantage for a West Coast team, but the effect is smaller. Thus, West Coast teams have a slight intrinsic advantage built into the schedule.
Empirically, travel distance also matters. This reflects both shorter flights and the fact that teams in close proximity to their opponents can bring more of their fans with them. There are special procedures in place when a team plays a “home” game outside of its metro area due to, e.g., natural disasters, or when both teams are in the same metro area (Giants vs. Jets).
Home-field advantage has been in gradual decline over time. Having spent a considerable amount of time on this question, however, we believe the extent of this is somewhat overstated, as the NFL season is inherently a small sample size and fluctuations in HFA can be noisy. Our HFA adjustment is slightly larger than the consensus, typically ranging from 1 to 3.5 points.
Cold weather, wind, and altitude increase home-field advantage — especially cold weather. Teams that play on artificial surfaces also have a slightly larger home-field advantage. This trend is surprisingly robust empirically and likely reflects the fact that home teams are familiar with the quirks of their own turf; however, the effect has declined as artificial surfaces have improved.
The weather forecast also affects our projected totals (over/unders) for each game. Wind is especially detrimental to offense since it impacts both passing and kicking, two rather important factors in the NFL! The gradual increase in scoring in the NFL over the past few decades is partially attributable to the shift to warm-weather and/or domed stadiums.
Home-field advantage is about 0.75 points larger in playoff games, controlling for other factors. The fanless games played during the COVID pandemic — home teams had a negative overall point differential in 2020 — also provided evidence for an intuitive conclusion: having an enthusiastic crowd can actually matter.
In fact, ELWAY implements a fan avidity rating for each team on a 1-to-5 scale. The primary ingredient in this is the relative rate of Google searches in the local market for the term “NFL” over the past 20 years. However, there are also some semi-subjective adjustments, such as player sentiment about the toughest places to play. Generally speaking, the NFL is a bigger deal in the northern half of the country, reflecting its historical origins at the saddle between the Northeast and Midwest, and HFA is smaller once you get south of roughly Kansas City or Washington. Green Bay and Buffalo are the only teams with perfect 5 avidity ratings. Kansas City is a 4.5, while Baltimore, Cleveland, Denver, Minnesota, New Orleans, Philadelphia, Pittsburgh, and Seattle are the next highest at 4. The two Los Angeles teams are the only 1’s. Keep in mind, however, that a team that overperforms at home may also suffer a bigger penalty in road and neutral-site games.
Non-quarterback injuries are accounted for based on injury reports. Since we already calculated a WAR for each player in Section 2, we basically just sum up all the positive WARs that are expected to be on the sidelines for the upcoming game. Injuries are segregated by offense and defense, so, for instance, a beat-up secondary will primarily affect a team’s projected defensive rating while injuries on the offensive line will mostly impact its offensive rating.
In general, individual non-QB injuries don’t have that large an effect. In extreme cases, such as Myles Garrett or peak J.J. Watt, an absence might move the projected point spread by up to 2 points. But more commonly, even All-Pro non-QB absences have an impact closer to ½ to 1 point. Nevertheless, teams with a ton of injuries all over their depth chart can be significantly affected. Betting on NFL games without accounting for non-QB injuries is a risky proposition.
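The summation described above might look like the following sketch. The `Player` structure and roster are hypothetical, and the step that converts lost WAR into points of offensive or defensive rating is omitted.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    side: str    # "offense" or "defense"
    war: float   # projected WAR (hypothetical values)
    out: bool    # expected to miss the upcoming game


def injury_war_lost(roster: list[Player]) -> dict[str, float]:
    """Sum the positive WAR of absent players, split by side of the ball."""
    lost = {"offense": 0.0, "defense": 0.0}
    for player in roster:
        if player.out and player.war > 0:  # only positive WARs count, per the text
            lost[player.side] += player.war
    return lost


roster = [
    Player("LT", "offense", 0.6, True),
    Player("WR3", "offense", -0.1, True),
    Player("CB1", "defense", 0.5, True),
    Player("S2", "defense", 0.2, True),
    Player("EDGE", "defense", 0.9, False),
]
print(injury_war_lost(roster))
```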
For weeks beyond the forthcoming one, we do not specifically project return dates or new injuries for non-QB players. However, ELWAY applies several empirically-derived heuristics:
Injuries tend to accumulate throughout the season.
Contending teams return players to action more quickly, while losing teams are indifferent or sometimes actively tank; therefore, in general, teams benefit from playing weaker opponents later in the season.
Injured players are much more likely to return to action in the playoffs.
Injured players return more often after longer rest periods, such as bye weeks.
Finally, older teams accumulate significantly more injuries over the course of the year than younger ones.
Head coaching changes negatively affect performance for the first ~10 games of a new coach’s tenure, with the effect concentrated in the first several games. In-season coaching changes have a bigger impact than off-season changes, since teams must learn new schemes and playbooks on the fly.
In our simulations, we also project the probability of mid-season coaching changes that could negatively affect a team’s ratings in future games. In other words, we model a team’s likelihood of firing its coach after each game. Teams with losing records are much more likely to fire their coaches, especially after big losses. Conversely, teams rarely fire coaches in the first few games of the season or soon after hiring them.[7]
Empirically, there is a small degree of persistence in matchups; for instance, if the Eagles beat the Giants 41-10 in Week 1, it will positively affect their expectation when they play the Giants again later in the season.[8] The effect is relatively minor, but ELWAY does account for it.
Teams that have “locked up” a specific playoff seed often rest their QB and other star players and can underperform naive projections in the final game of the season (currently Week 18), and ELWAY accounts for this, too. The first cases we can detect of this were in 1989 (Bill Walsh and Marv Levy pioneered the tactic), but it is becoming more common. About two-thirds of the time in recent seasons, a team rests its starting QB under these circumstances.
5. Simulations
ELWAY simulates the rest of the regular season and the playoffs thousands of times from the current starting point. The simulations are dynamic: for instance, what happens in Week 7 of Simulation #1622 also affects Week 8 and the rest of the season in that universe.
In addition to simulating the schedule and changes to a team’s ratings based on its simulated offensive and defensive game scores, we also simulate quarterback changes, quarterback injuries, coaching changes, and non-QB injuries on a probabilistic basis.
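A stripped-down sketch of a “hot” simulation, in which each simulated result updates ratings for the rest of that branch only. The single net rating per team, the normal margin noise (SD of 13.5 points), the update rate, and the symmetric update are assumptions for illustration; ELWAY’s actual updates use offensive and defensive game scores and many more factors.

```python
import random


def simulate_branch(ratings: dict[str, float], schedule: list[tuple[str, str]],
                    rng: random.Random, k: float = 0.1, sd: float = 13.5) -> dict[str, int]:
    """Play out one branch of the season with within-branch ("hot") rating updates."""
    ratings = dict(ratings)  # copy so every branch starts from the same ratings
    wins = {team: 0 for team in ratings}
    for home, away in schedule:
        expected = ratings[home] - ratings[away]  # no home-field term in this toy version
        margin = rng.gauss(expected, sd)
        if margin > 0:
            wins[home] += 1
        elif margin < 0:
            wins[away] += 1
        # Carry part of the surprise forward, but only within this branch.
        # (Symmetric for simplicity; ELWAY's updates need not net to zero.)
        surprise = margin - expected
        ratings[home] += k * surprise
        ratings[away] -= k * surprise
    return wins


base = {"A": 3.0, "B": 0.0, "C": -3.0}
schedule = [("A", "B"), ("B", "C"), ("C", "A")] * 3
rng = random.Random(1622)
branches = [simulate_branch(base, schedule, rng) for _ in range(2_000)]
avg_wins = {team: sum(b[team] for b in branches) / len(branches) for team in base}
print(avg_wins)
```

Because a strong start raises a team’s rating within its own branch, hot simulations produce a wider spread of season outcomes than simulations that hold ratings fixed.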
For the forthcoming week, the simulations account for weather based on weather reports. For future weeks, weather conditions are simulated based on long-term climate averages for the month of the game in the metro area where the game is played. The simulations account for the historical uncertainty in weather forecasts, which is larger in some cities than others. For teams with retractable roofs, we account for the likelihood of the roof being open, which generally occurs with temperatures between 50 and 80°F and otherwise favorable conditions.[9]
The model also simulates a discrete score for each game, and these scores closely resemble real-world values; e.g., a score of 24-17 is more likely to occur both in ELWAY and in real life than 24-18 or 24-19. Among other things, this allows for more precision when calculating the probability of a team beating a given point spread. As bettors know, some point spreads (especially +3/-3) are particularly important because these margins occur more commonly.
The score distributions are based on simulating hundreds of thousands of games, which account for the empirical likelihood of the outcome of a given drive (e.g., touchdowns, field goals, safeties) based on recent NFL data. For instance, we account for the recent proliferation in field-goal scoring. The simulations also take into account how the game score affects 2-point conversions and overall strategy.[10] Furthermore, they incorporate the “playoff” overtime rules that have now been adopted for the regular season, under which each team gets at least one possession in most circumstances.[11]
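A sketch of why discrete scores matter for spreads: given a distribution over final margins, the probabilities of covering, pushing, or losing against a line can be read off directly. The margin probabilities below are made-up placeholders, not ELWAY output, and the spread is expressed as the number of points the favorite lays.

```python
def ats_probabilities(margin_probs: dict[int, float], spread: float) -> dict[str, float]:
    """Cover/push/lose probabilities for a favorite laying `spread` points.

    margin_probs maps the favorite's final margin (favorite minus underdog) to its probability.
    """
    result = {"cover": 0.0, "push": 0.0, "lose": 0.0}
    for margin, p in margin_probs.items():
        if margin > spread:
            result["cover"] += p
        elif margin == spread:
            result["push"] += p
        else:
            result["lose"] += p
    return result


# Placeholder margin distribution (not ELWAY output); note the spikes at 3 and 7.
margins = {-7: 0.10, -3: 0.10, 1: 0.05, 3: 0.20, 4: 0.10, 7: 0.25, 10: 0.20}
for line in (2.5, 3.0, 3.5):
    print(line, {k: round(v, 2) for k, v in ats_probabilities(margins, line).items()})
```

With a spike at a 3-point margin, moving the line from -2.5 to -3.5 shifts the favorite’s cover probability far more than a half-point move elsewhere would.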
In the late weeks of the season and the first two rounds of the playoffs, game times (and in some cases the dates of games) are not set in advance due to flex scheduling. These can potentially affect travel and rest adjustments. For the regular season, we just make some best guesses for which games will be slotted where based on teams’ home time zones; these will be updated as actual schedules are unveiled. For the playoffs, time slots are randomized until the actual playoff schedule is announced.
We also account for the NFL’s tiebreaking procedures to a high degree of precision, skipping only some of the most obscure tiebreakers that almost never come up in practice.
Disclaimers
NFL point spreads and over-under lines are notoriously tricky to beat. Although ELWAY may be somewhat sharp, it is difficult to beat the ~4.5 percent cut of each bet that the bookmaker takes. Thus, we do not recommend betting based on ELWAY blindly, if you do so at all. Always shop for the best lines, and combine ELWAY with your overall football knowledge. Discrepancies between point spreads and ELWAY projections may reflect late information since our last update or idiosyncratic factors that our system does not account for.
ELWAY is new in 2025, and there may be some bugs and miswritten code. If you see something that looks weird, please let us know.
Notes
[1] Up to a point; teams may give up when the situation is truly hopeless.
[2] For recent seasons, the split is literally just 50/50 between end-of-season ratings and roster ratings. In our retrospective projections, there was more carryover from season to season before the introduction of the salary cap in 1994.
[3] As rated by AV based on the amount of AV a player generates per snap relative to his position.
[4] The total amount of AV per team is held constant after the positional adjustments. So, for instance, a team with a great QB will have the AVs for other players reduced since QBs are undervalued by AV, while a team with a poor QB will have more value assigned to the rest of the team.
[5] While we believe these are good defaults for the entire period from 1960 to 2025, there are some differences in recent seasons. For example, fullbacks account for approximately 0-1 percent of value now, and aren’t even rostered by some teams.
[6] More precisely, teams that run more plays that involve the QB, including both passes and QB rushing attempts.
[7] Coaching changes tend to peak after a tenure of 3-5 seasons; after that, coaches become “lifers” and are rarely fired during the season.
[8] More specifically, this adjustment is calculated based upon a team’s actual margin of victory relative to ELWAY’s expectation before the game.
[9] Teams with retractable roofs leave them closed more often than not, both in reality and in our simulations.
[10] For instance, a team that trails by 7 points late in a game will have considerably more touchdowns and turnovers per drive as it takes on more risk, but will seldom attempt a field goal. Conversely, teams often sit on the ball or adopt conservative strategies with leads in the fourth quarter to run out the clock, which can reduce margins of victory even if the game is well in hand.
[11] We expect the new rules to affect scoring distributions, e.g., 1-point victory margins will become more common because if the first team to possess the ball scores a touchdown and the opposing team reciprocates, the optimal strategy is usually to go for two. We also expect a slightly higher rate of ties over the long run (approximately 0.5 percent of all NFL games) because of the absence of sudden-death rules.