How we calculate our PELE ratings
Way more detail than you asked for on the methods behind our new soccer model.
PELE is Silver Bulletin’s rating system for international soccer teams. Each team gets two principal ratings: a PELE rating describing its overall skill level and a Tilt rating indicating its propensity toward attacking or defensive play. Based on these ratings, we can evaluate past match results and forecast future matches. PELE ratings are updated continuously and backdated to 1872 (!).
PELE is also the backbone of our 2026 World Cup forecasts, set to be published later. We’ll update this article with any World Cup-specific adjustments once they’re ready.
We’re extremely proud of PELE, but it was a lot of work, and it’s not our simplest model. This article describes the system in detail.
The basics of PELE
PELE stands for Predictive Elo with Lineup Equilibria. We know it’s a little bit nerdy, but this backronym captures most of the essential features of the system:
Predictive means that the goal of PELE is to probabilistically forecast the outcome of future soccer games. These aren’t the FIFA rankings: we’re not interested in which teams are most “deserving” of a particular slot. Rather, we’re looking for factors that have a predictive impact. International football teams play relatively few important games, and some of the most predictive indicators don’t derive from match results alone.
Elo means that PELE shares many properties with an Elo rating system — and indeed, PELE ratings are designed to be comparable to Elo ratings such as the FIFA rankings or the World Football Elo Ratings.1 As with other Elo ratings systems, PELE ratings are updated iteratively at the end of each match, and updates are zero-sum. (If Brazil beats Bolivia 4-1, whatever gain Brazil makes in its PELE rating is offset by a loss of points for Bolivia.) However, PELE deviates from traditional Elo ratings in other important respects, as we’ll describe below.
Lineup means we use player market values and age data from Transfermarkt to help anchor PELE ratings. We look at the market values for the top 23 nationals2 with their respective club teams, with some soft positional constraints. For years since 2005 (when Transfermarkt’s coverage begins3) team ratings are gradually “nudged” toward team aptitude as estimated by these player values. Player ages, weighted by market value, also affect the system — younger teams are expected to improve, while older teams are expected to decline. PELE also uses market values to help calculate whether a team’s strengths are oriented toward offense or defense.
In addition to market value data, we consider each country’s region, GDP, and football legacy. (The regions are not based on FIFA’s six confederations; we’ve developed our own system of 12 overlapping soccer regions, which we believe to be more predictive and more geographically accurate.) But these other factors are less important after the introduction of Transfermarkt data in 2005.
Equilibria serves as a catch-all to capture some other important features of the PELE system. PELE contains many parameters that work together to converge on (we hope) the best possible ratings. But these two mechanisms are particularly important:
National team results essentially compete with the model’s expectations based on player market values, ages, GDP, region, and team history. The prior gradually pulls each team toward its long-run expectation based on these factors, while the match results push against this if a team consistently outperforms or underperforms PELE’s assumptions.
PELE calculates two sets of ratings for each team: an Elo-like PELE rating that measures overall squad quality, and a Tilt rating that indicates whether the team tends to be attack- or defense-minded. Tilt ratings are based on (1) whether games involving the team tend to produce more or fewer goals than the model’s expectations, and (2) roster composition. PELE and Tilt ratings can be combined with global scoring trends to derive a score matrix for each game, i.e., the probability of each exact scoreline (Germany beating Australia 2-1, a 0-0 draw, etc.). These can be used to estimate win/loss/draw probabilities for any given matchup, or to impute offensive and defensive ratings for each squad.
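To make the idea of a score matrix concrete, here is a minimal sketch. It assumes each side's goals follow an independent Poisson distribution, which is a common simplification and not something this article specifies; the function names and the example expected-goal values are hypothetical.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals given an expected goal count lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def score_matrix(xg_a: float, xg_b: float, max_goals: int = 10) -> list:
    """matrix[i][j] = P(team A scores i and team B scores j).

    Assumption: the two goal counts are independent Poisson variables.
    Scorelines above max_goals are truncated.
    """
    return [
        [poisson_pmf(i, xg_a) * poisson_pmf(j, xg_b) for j in range(max_goals + 1)]
        for i in range(max_goals + 1)
    ]

def outcome_probs(matrix: list) -> tuple:
    """Collapse a score matrix into (win, draw, loss) for team A."""
    win = sum(p for i, row in enumerate(matrix) for j, p in enumerate(row) if i > j)
    draw = sum(p for i, row in enumerate(matrix) for j, p in enumerate(row) if i == j)
    loss = sum(p for i, row in enumerate(matrix) for j, p in enumerate(row) if i < j)
    total = win + draw + loss  # renormalize for the truncated tail
    return win / total, draw / total, loss / total

# Hypothetical expected goals: 1.8 for the stronger side, 0.9 for the weaker one.
matrix = score_matrix(1.8, 0.9)
p_exact_2_1 = matrix[2][1]
win, draw, loss = outcome_probs(matrix)
```

Any cell of the matrix answers an exact-score question, and summing cells above, on, or below the diagonal gives the win, draw, and loss probabilities.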
Scope and coverage of PELE
PELE covers international matches between teams that were both FIFA members at the time. There is an exception for teams that played widely recognized international matches before FIFA was formed in 1904. We think of these pre-FIFA countries as the “Original Ten” (analogous to the NHL’s “Original Six”). They are England, Scotland, Wales, Ireland, Argentina, Uruguay, Austria, Hungary, Belgium and France. Matches between “B-teams” or played under significant roster restrictions (e.g., the Olympics in recent years) are excluded.
Relying on FIFA membership dates essentially outsources decisions about when a national team reached “maturity” to FIFA, an organization we have mixed feelings about. Nonetheless, this provides a semi-objective basis for articulating “official” games among the myriad matches that have occurred over the past 100+ years between 200+ national or sub-national entities. We tested various alternatives to FIFA membership dates, but they only slightly increased the number of PELE-eligible matches while adding more subjectivity. We researched each team’s FIFA membership date carefully.
Nonetheless, we had some decisions to make, particularly regarding which national teams are considered to be continuations of previous teams. Indeed, nearly every political dispute of the past century shows up in some form in historical soccer data. In making these decisions, we tried to be as consistent as possible based on the history, geography and economics of countries being reformulated. FIFA regards West Germany as having inherited pre-WW2 Germany’s football legacy, for example, and considers reunified Germany to be a continuation of West Germany. Our definitions are stricter and treat major changes in national boundaries as discontinuous, such as the split and reunification of Germany, the formation and breakup of the Soviet Union, and the breakup of Yugoslavia.
However, minor changes such as Timor-Leste splitting from Indonesia are tolerated. There are inevitably some judgment calls: we consider the creation and collapse of the United Arab Republic to be a discontinuous event for both Egypt and Syria, for example. In some cases, countries are considered dormant or hibernating and then “reincarnated” if they return to roughly their original boundaries: for example, the Baltic states before and after the formation of the USSR.4 In addition to the 211 current FIFA members, we calculate ratings for 17 nations that were FIFA members at some point but are now essentially defunct, e.g., Czechoslovakia — though defunct teams are excluded from most of the charts we display. Overall, almost 50,000 international matches since 1872 are included for PELE consideration.
PELE Phase 1: “basic” Elo-type ratings
Within our model, PELE is calculated in two phases. The first phase is “simple” and empirically establishes our parameters, such as the regional coefficients or the varying importance of home-field advantage. The second phase introduces mean reversion toward PELE’s expectations based on Transfermarkt player values and other data. This section describes Phase 1.
Some features of PELE are very Elo-like:
Rating updates are zero-sum. Whenever a team gains ground from match results, this is offset by its opponent losing an equal amount of PELE rating points.
By definition, the average PELE rating for the 211 active FIFA countries is 1500.
PELE relies on contemporaneous information. In other words, Hungary’s rating on (for example) April 5th, 2012 is based on information that would have been available as of that date. We don’t go back and recalculate ratings based on post-facto information (e.g., Hungary lost to Norway, but it turns out that Norway was stronger than we assumed at the time).
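For readers who haven't seen one, a zero-sum Elo-style update can be sketched as follows. This is the textbook formulation, with the conventional 400-point logistic scale and an arbitrary K-factor of 20; PELE's own parameters and its margin-based scoring are not reproduced here.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expectation for team A on a 0-to-1 scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def zero_sum_update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0) -> tuple:
    """Return new (rating_a, rating_b) after one match.

    score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k=20 is a placeholder, not a PELE parameter.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses: the sum of the two ratings is unchanged.
    return rating_a + delta, rating_b - delta

# A heavy favorite gains only a little for a win it was expected to get.
new_a, new_b = zero_sum_update(1800.0, 1500.0, score_a=1.0)
```

Because each update only uses the two ratings as they stood before the match, replaying matches in date order naturally respects the "contemporaneous information" rule.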
Any Elo-type system also relies on a number of parameters that govern the overall behavior of the system. To the extent possible, PELE seeks to derive these empirically.5 The next section describes PELE’s approach to some of the most important ones.
Home-field advantage, match importance, and other parameters
We undertake highly detailed calculations for home-field advantage. In general, home-field advantage is very important for international soccer matches, and we think the impact of HFA is underrated by other systems. There are several components to our HFA calculations.
HFA varies over time, and this is derived empirically. In general, HFA increased after WWII, rose until the 1980s/1990s and has been declining since then, perhaps because travel accommodations for visiting teams are improving.
Travel distance impacts our HFA calculations.6 Traveling from Brussels to Amsterdam is less burdensome than flying from South Korea to Brazil. Travel distance is more important for neutral-site games.7
Altitude has a significant impact on teams like Mexico and Bolivia, who typically play their home matches at high altitude. These teams tend to have a significant home-field advantage because they are better acclimated to local conditions. The formula is nonlinear: as you may have experienced yourself, engaging in intensive physical activity at 10,000 feet is more than twice as hard as at 5,000 feet.
PELE also calculates customized HFA coefficients for each team. More precisely, these measure the spread in team performance, relative to PELE’s expectations, in home versus “nonhome” (neutral + road) games. Like the global HFA rating, each team's custom ratings evolve over time. In general, teams in far-flung and war-torn places tend to have larger HFAs, while richer nations in Europe and the Middle East tend to have smaller ones. Bolivia has the largest HFA, mostly because of its altitude. But the customized HFA adjustments are generally pretty conservative; this data is noisy.
Another important consideration is the importance of each match. A friendly will be taken less seriously than a World Cup knockout game. Some previous research, including my own work for ESPN’s Soccer Power Index, suggested that low-impact matches nevertheless provide substantial predictive value. But the actual situation is more subtle. Low-impact matches, such as friendlies, tend to predict performance in future low-impact matches, and most of the dataset consists of these. However, high-impact matches like the World Cup or the Euros tend to better predict performance in future high-impact matches. We considered developing ratings on two parallel tracks (i.e., a friendly match rating versus a “serious” match rating), but it wasn’t quite worth the added complication.8 Instead, we use these importance factors to weight the value assigned to each match. There’s roughly a threefold difference between the most and least important games.
Friendlies: 0.5-0.7x9 multiplier
Minor and friendly tournaments: 0.7-0.9x
Regional tournaments and Olympics10: 0.7x (qualifiers)-1.0x (main tournament)
Continental tournaments (e.g. the Euros): 1.3x (qualifiers)-1.4x (main tournament)11
World Cup: 1.5x (qualifiers)-1.6x (main tournament)
There’s another subtle consideration in PELE: friendly matches tend to slightly collapse the difference in team quality. England will treat a World Cup qualifying match against San Marino with more urgency than a friendly against the same opponent. This is also accounted for by the model.
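The multipliers listed above could be encoded as a small lookup table. In this sketch the ranges come straight from the list, but applying the weight to an Elo K-factor, and interpolating linearly within each range, are assumptions made for illustration; the names are hypothetical.

```python
# (low, high) multiplier ranges from the list above. Where the list gives a
# qualifiers/main-tournament split, low = qualifiers and high = main tournament.
IMPORTANCE_RANGES = {
    "friendly": (0.5, 0.7),
    "minor_tournament": (0.7, 0.9),
    "regional_or_olympics": (0.7, 1.0),
    "continental": (1.3, 1.4),
    "world_cup": (1.5, 1.6),
}

def importance_weight(category: str, position: float = 1.0) -> float:
    """Pick a multiplier within a category's range.

    position=0.0 returns the low end, position=1.0 the high end.
    Linear interpolation in between is an assumption of this sketch.
    """
    low, high = IMPORTANCE_RANGES[category]
    position = min(max(position, 0.0), 1.0)
    return low + position * (high - low)

def weighted_k(base_k: float, category: str, position: float = 1.0) -> float:
    """Scale a base K-factor by match importance (assumed usage)."""
    return base_k * importance_weight(category, position)

# Spread between a World Cup match and the least important friendly.
spread = importance_weight("world_cup", 1.0) / importance_weight("friendly", 0.0)
```

The spread works out to 1.6 / 0.5 = 3.2, which matches the "roughly threefold" difference described above.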
Differences between PELE and true Elo systems
There are also some important ways in which PELE deviates from traditional Elo ratings.
Some Elo-type systems adhere to a strict principle: winning always helps your rating. In these systems, if England beats San Marino 2-1, England will see some (typically very modest) rating improvement, while San Marino’s rating will decline. The polar opposite is to train the model on the score differential. For instance, if England is expected to defeat San Marino by 5 goals, the 2-1 scoreline will reflect that England underperformed the Elo expectation by 4 goals.
PELE basically strives for a compromise between these. Rating updates are based on what we call harmonic margin12 or “h-margin”. In h-margin, each additional goal has diminishing returns: the second goal in a 2-goal victory counts ½ as much as the first one, the third goal counts ⅓ as much, and so on. This is particularly important in soccer, where the current score substantially affects tactics: if you’re already ahead 3-1, whether to press for another goal or to collapse into a more defensive shell isn’t obvious. Matches won in penalty shootouts — there is some skill in penalties, believe it or not — are regarded by PELE as basically halfway between an outright win and a draw.
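The diminishing-returns rule described above is the harmonic series, so it can be sketched in a few lines. The function names are hypothetical, and scoring a shootout win as 0.5 is one reading of "halfway between an outright win and a draw" (taking a one-goal win as 1 and a draw as 0), not a confirmed implementation detail.

```python
import math

def h_margin(goal_diff: int) -> float:
    """Harmonic margin: the k-th goal of the margin counts 1/k.

    A 1-goal win gives 1.0, a 2-goal win 1 + 1/2, a 3-goal win 1 + 1/2 + 1/3.
    Losses are the mirror image, and a draw is 0.0.
    """
    n = abs(goal_diff)
    if n == 0:
        return 0.0
    value = sum(1.0 / k for k in range(1, n + 1))
    return math.copysign(value, goal_diff)

# Assumption: a shootout win sits halfway between a draw (0.0) and a
# one-goal win (1.0).
SHOOTOUT_WIN_MARGIN = 0.5

def match_h_margin(goals_for: int, goals_against: int, won_shootout=None) -> float:
    """h-margin for a match; won_shootout is True/False only if the match
    was level and went to penalties, otherwise None."""
    diff = goals_for - goals_against
    if diff == 0 and won_shootout is not None:
        return SHOOTOUT_WIN_MARGIN if won_shootout else -SHOOTOUT_WIN_MARGIN
    return h_margin(diff)
```

Under this scheme a 5-goal rout is worth about 2.28 "harmonic goals" rather than 5, so blowouts still count for more than narrow wins, but not proportionally more.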
Initial ratings and the geography of international football
In Elo-type systems, each team or player usually starts with the same initial rating, typically 1500, before playing any games or matches. However, this can introduce some information loss. To a large extent, which teams are strong at football is predictable. From first principles, for example, you’d expect Argentina to defeat American Samoa. Even if you’d never seen a soccer game, you’d know that Argentina is much larger, has a much longer football legacy, and comes from a region where football plays a much more prominent role in the culture.
Technically speaking, PELE ratings also start out with a blank slate (everyone at 1500). However, we iterate the ratings dozens of times to converge on what we call a “GDP prior” for each country. The GDP prior is based on three factors:
A country’s purchasing power parity GDP (adjusted for the cost of living) at the time it became eligible for PELE-rated matches. More specifically, we use the natural logarithm of a country’s GDP (there are diminishing returns to economic growth from a soccer standpoint). We use aggregate GDP, not GDP per capita, so both population size and living standards per citizen matter.13 Each country’s GDP is expressed as a fraction of world GDP at the time, so there is no bias introduced from when a country begins playing. Most of the GDP values are taken from the Maddison historical database, which is about 75 percent complete for the countries and years we care about. However, some GDP values are missing in Maddison: for example, Maddison lists the value for United Kingdom GDP rather than for the respective home nations (England, Scotland, etc.) while the home nations compete individually in football. The missing GDP values were filled in using Claude Opus 4.6, with careful oversight from yours truly. For the most part, this process is fairly straightforward. UK GDP can be divided between the home nations based on their relative contribution to the UK’s GDP, for example. Or we can estimate Estonia’s GDP prior to its being taken over by the Soviet Union based on the living standards at the time of comparable countries like Finland.
A country’s “legacy year”: that is, the first year that it or one of its predecessors became eligible for PELE-rated matches. A longer legacy correlates with higher performance even years later, although this is the least important of the three factors discussed here. For the Original Ten, the legacy year is the year of its first widely recognized international match; for all other countries, it’s the year they became a FIFA member. Countries inherit legacy year status from any defunct teams that substantially overlapped with their territory: for example, both Northern Ireland and the Republic of Ireland inherit unified Ireland’s legacy year prior to the Partition of Ireland.
Finally, we consider a country’s football region. PELE does not use FIFA confederations for anything: these do not correlate all that well with football performance, and can be influenced by political and other arbitrary factors. They can also be blunt instruments for continents as large as Asia. Instead, we crafted our own set of 12 regions, which deliberately contain some overlap. I’m going to be honest, we went through a lot of different versions of these.
Basically, we started out with the six populated continents and then carved out some logical geographic boundaries to create more precision:
Oceania survives pretty much intact from conventional definitions. Note that Indonesia is considered a transcontinental country, belonging to both Oceania and Asia.14
The Americas are also self-contained, but the conventional boundary between North and South America doesn’t adequately reflect the cultural distinctions within the region (and how this tends to map to football strength). Instead, we divide the Americas into three regions: 1) North America, defined as the continental territory stretching to the Darien Gap, 2) the Caribbean, and 3) Latin America. Eligibility for Latin America is defined as any country in the Americas where the predominant language is Spanish or Portuguese. We would have included French also (it’s a Latin language), but there aren’t any FIFA members per se that would qualify on this basis.15 The decision to split off the Caribbean was important. It’s not a terribly important part of the world football-wise. But the idea of the regional groups is that there should be a coherent economic or cultural tie that can inform our priors. And the notion that, say, Haiti’s football performance tells you anything whatsoever about Canada’s or the United States’s rating felt like a big stretch. Even with the 3-way split of the Americas, there is deliberately some overlap: Mexico, for example, is both a North American country and a Latin American country. (We’re sort of implicitly creating a Central America region, in other words, which takes the average of the Latin America and North American values.) Note that Guyana and Suriname are not Latin American countries despite being in South America; instead, they’re considered Caribbean countries.16 Only Canada, the United States and Bermuda17 are purely in North America.18
Next, we carved out a Middle East region based on its reasonably well-defined geographic boundaries. Most Middle Eastern countries are “taken” from Asia, which badly needs to be subdivided because it consists of so many unlike parts. However, the Middle East also has two FIFA members from Europe (Turkey and Cyprus) and one from Africa (Egypt, which is transcontinental due to the Sinai Peninsula). These countries are treated as hybrids between the Middle East and other regions.
The other big carveout is Ex-USSR, and we’ll admit that this one is more debatable. The Soviet Union existed for 70 years and affected football culture and development pipelines in ways that are still persistent today. Former Soviet Republics like Moldova are historically weak at soccer as compared to Europe — as is Russia itself, really, given its GDP. But the Central Asian ex-USSR countries are relatively strong as compared with most of Asia. (And Central Asia doesn’t fit neatly into any of our other regions anyway.) The Baltics — Estonia, Latvia, and Lithuania — are considered hybrids between the Ex-USSR and Europe because they were the only former Soviet countries to have football teams prior to the formation of the USSR. We avoid making exceptions based on political developments: Russia is at war with Ukraine, but the war is recent, and Ukraine isn’t (explicitly) a NATO or EU member.
That leaves Europe as everything in the continent outside the former Soviet Union. We considered some further divisions, e.g., lumping in some nations with ex-USSR into an “Eastern Bloc” region. But these definitions are fuzzy and historically contingent19 and footballing strength has historically been roughly even across Europe, especially once controlling for GDP. Both Western European countries, like England, and Eastern European ones, like Hungary, have strong football traditions.
Asia and Africa, on the other hand, are large and warrant further division. We already carved out the Middle East and the Asian parts of the ex-USSR region (basically Central Asia) from the rest of Asia. The remaining portion of Asia nevertheless spans a large landmass and more than half the world’s population. The logical division is between South Asia and East Asia. South Asian nations are almost universally underachieving at soccer. East Asia, conversely, contains some of our bigger outliers — notably, South Korea and Japan have become very good at soccer in contrast to the rest of the region. But China is underperforming. Still, priors are priors for a reason: they’re good default assumptions that are sometimes violated. We considered creating an “Asia-Pacific” region that would also include Australia, but this went too far down the road to gerrymandering based on footballing strength. However, Southeast Asian countries are hybrids between South Asia and East Asia in our scheme, so this creates another implicit region along the lines of Central America.
Finally, Africa is often treated as an undifferentiated mass by Westerners, but nearly all the stronger African sides you’d think of are either in North Africa or West Africa. These countries have different religious and cultural traditions and different relationships with colonial European powers that correlate with football strength. Therefore, Africa is divided into three regions: North Africa, East Africa, and West Africa, with some overlap. North Africa is reasonably well-defined by geographers, so we’re basically splitting the rest of the continent — Sub-Saharan Africa — into two parts. West Africa includes some of the continent’s strongest football countries. East Africa is a slight misnomer: it might be labeled East/Southern Africa if we were even more precise. But teams from that part of the continent don’t tend to reach the same heights.
In case you’re curious, here are the regional coefficients ranked from highest to lowest, i.e., by how much belonging to each region boosts a team’s expected quality:
Latin America
West Africa
Europe
North Africa
Caribbean
East Africa
Middle East
Ex-USSR
North America
Oceania
East Asia
South Asia
This ordering might be surprising, but remember that these ratings control for GDP (and legacy year). Europe might be better than West Africa at football, but its teams have longer histories and it’s much wealthier; the residual coefficient for West Africa is (slightly) higher once you control for that. The Caribbean surprised us as a modestly high-scoring region, but it has many tiny countries that punch above their weight.
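Two of the ingredients of the GDP prior lend themselves to a short sketch: the log of a country's share of world GDP, and the averaging of regional coefficients for hybrid countries such as Mexico. The function names and the coefficient values in the example are placeholders invented for illustration; the real coefficients are not given in this article.

```python
import math

def log_gdp_share(country_gdp: float, world_gdp: float) -> float:
    """Natural log of a country's share of world GDP at a given time.

    Using the share (rather than raw GDP) keeps countries that start
    playing in different eras on a comparable scale.
    """
    return math.log(country_gdp / world_gdp)

def region_effect(regions: list, coefficients: dict) -> float:
    """Average the coefficients of every region a country belongs to.

    A single-region country just gets that region's coefficient; a hybrid
    (e.g. Latin America + North America) gets the mean of its regions.
    """
    return sum(coefficients[r] for r in regions) / len(regions)

# Placeholder coefficients, NOT the model's actual values.
toy_coefficients = {"Latin America": 40.0, "North America": -20.0}

mexico_like = region_effect(["Latin America", "North America"], toy_coefficients)
canada_like = region_effect(["North America"], toy_coefficients)
```

With these toy numbers, the hybrid country lands exactly halfway between its two regions, which is the "implicit Central America" effect described in the regional breakdown.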
PELE Phase 2: advanced ratings with mean reversion and player market values
In traditional Elo-based systems, ratings only change after games/matches are played. Silver Bulletin’s other sports models, like COOPER, already violate this principle because ratings are partly reset or reverted toward priors at the start of each season.20 In international football, however, there’s no “season” per se. So instead, each team’s ratings are very gradually “nudged” toward PELE’s expectations every day based on PELE’s priors. The magnitude of the nudge was calculated by comparing teams’ Phase 1 PELE ratings with their PELE ratings two years later. This allowed us to determine which factors predicted changes in team performance.
However, this is a deliberately slow-moving process: it would take decades for an outlier like Japan to fully converge on its prior. Moreover, teams can push away from this pull toward the prior with consistently good match results. Elo-type systems use a “K-factor” to determine how much the ratings change in response to new results. In Phase 2, our K-factor is set slightly higher than in Phase 1; in other words, match results matter slightly more, giving teams an opportunity to offset the mean-reversion. In practice, the prior is relatively less important for countries that play many international matches, but matters more for teams that compete infrequently in important games.
Prior to 2005, this mean-reversion is just based on the GDP prior: in other words, a country’s GDP, its legacy year and its region. It doesn’t have much effect: regions and legacy years do not change at all, and with some exceptions like China, relative GDPs only change slowly. Thus, prior to 2005, the Phase 2 mean-reversion process barely has a discernible impact on the ratings.
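The daily nudge can be pictured as a very small exponential pull toward the prior. The sketch below assumes a constant proportional rate per day, which is a simplification; the rate used in the example is a made-up placeholder, and the function names are hypothetical.

```python
def nudge_one_day(rating: float, prior: float, daily_rate: float) -> float:
    """Move a rating a small fraction of the way toward its prior."""
    return rating + daily_rate * (prior - rating)

def nudge_many_days(rating: float, prior: float, daily_rate: float, days: int) -> float:
    """Closed form of applying nudge_one_day `days` times with no matches.

    The remaining gap to the prior shrinks by (1 - daily_rate) each day.
    """
    return prior + (rating - prior) * (1.0 - daily_rate) ** days

# Placeholder rate: roughly a 3.6% pull per year with no matches played.
RATE = 0.0001
rating, prior = 1800.0, 1600.0
after_one_year = nudge_many_days(rating, prior, RATE, 365)
after_thirty_years = nudge_many_days(rating, prior, RATE, 365 * 30)
```

Even after 30 idle years at this placeholder rate, about a third of the original gap would remain, which illustrates why convergence on the prior is described as taking decades. In the real system match results keep pushing back in the meantime.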
Beginning in 2005, however, an extremely valuable set of data becomes available: player market values and ages as estimated by Transfermarkt. Transfermarkt is exceptionally comprehensive, to the point where we can basically estimate a snapshot of any national team’s market value for any given date since Jan. 1, 2005. These can have a big impact: Norway’s rating is clearly boosted by Erling Haaland’s presence, for instance.
The Transfermarkt data does require some work to process, however. While we could simply add up the aggregate market value for every player of a given nationality, this might introduce coverage bias.21 Furthermore, football matches are 11 players a side, so it’s irrelevant whether France’s 200th-best player is better than Canada’s 200th-best player. Instead, we construct a 23-man roster for each past date. (World Cup rosters traditionally consisted of 23 players, although this has now been expanded to 26.) These rosters are populated in descending order of Transfermarkt values, with a hard constraint on goalkeepers (exactly 3) and some softer constraints on the other positions (a team can’t field a lineup consisting entirely of strikers). The starting 11 gets full credit for its market values, while the reserves (players #12 to #23) receive partial credit based on a sliding scale (the first several reserves get nearly their full market value; the end of the bench doesn’t). If a team doesn’t have enough players listed in Transfermarkt to fill out a 23-man roster, remaining slots are treated as having zero market value.
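A simplified version of this roster construction might look like the sketch below. It enforces the goalkeeper limit but omits the softer positional constraints, treats the top goalkeeper plus the ten most valuable outfield players as the starting 11, and uses a linear reserve scale from 0.9 down to 0.2. All of those choices, and every name here, are assumptions for illustration.

```python
from typing import NamedTuple

class Player(NamedTuple):
    name: str
    position: str   # "GK" for goalkeepers, anything else is outfield
    value: float    # market value, in any consistent currency unit

def build_roster(players: list) -> tuple:
    """Pick up to 3 goalkeepers and 20 outfield players by market value.

    Returns (starters, reserves). Soft positional constraints are omitted.
    """
    ranked = sorted(players, key=lambda p: p.value, reverse=True)
    keepers = [p for p in ranked if p.position == "GK"][:3]
    outfield = [p for p in ranked if p.position != "GK"][:20]
    starters = keepers[:1] + outfield[:10]
    reserves = sorted(keepers[1:] + outfield[10:], key=lambda p: p.value, reverse=True)
    return starters, reserves

def reserve_weight(index: int, n_reserves: int = 12, first: float = 0.9, last: float = 0.2) -> float:
    """Sliding credit for reserve number `index` (0 = first off the bench).

    The 0.9 and 0.2 endpoints are placeholders, not the model's values.
    """
    if n_reserves <= 1:
        return first
    return first + (last - first) * index / (n_reserves - 1)

def squad_value(players: list) -> float:
    """Starters count in full; reserves count on the sliding scale.

    Unfilled roster slots simply contribute zero.
    """
    starters, reserves = build_roster(players)
    total = sum(p.value for p in starters)
    total += sum(reserve_weight(i) * p.value for i, p in enumerate(reserves))
    return total
```

A country with only a handful of listed players still gets a valid (small) squad value, because missing slots add nothing rather than raising an error.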
For the 2026 World Cup specifically, we’ll use actual team rosters rather than our guesstimates once they become available. These will account for injuries and player absences, and for players who wind up representing a different country than their primary Transfermarkt classification. For “regular” PELE, however, the rosters are constructed algorithmically.
We also compute team ages based on this data, weighting each player’s age by his share of the squad’s market value. For teams without full rosters, ages are adjusted toward a mean of 26.5 years. Younger teams are usually projected to improve, and older ones to decline.
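A value-weighted team age is straightforward to compute. In the sketch below, the 26.5-year mean comes from the text, but the way incomplete rosters are pulled toward it (in proportion to the share of the 23 slots left unfilled) is an assumption, as are the function names.

```python
MEAN_AGE = 26.5      # from the text
ROSTER_SIZE = 23     # from the text

def weighted_age(ages: list, values: list):
    """Market-value-weighted mean age, or None if there is no value to weight by."""
    total_value = sum(values)
    if total_value <= 0:
        return None
    return sum(a * v for a, v in zip(ages, values)) / total_value

def adjusted_age(ages: list, values: list) -> float:
    """Blend the weighted age with the 26.5 mean for incomplete rosters.

    Assumption: the blend weight equals the fraction of roster slots filled.
    """
    raw = weighted_age(ages, values)
    if raw is None:
        return MEAN_AGE
    filled = min(len(ages), ROSTER_SIZE) / ROSTER_SIZE
    return filled * raw + (1.0 - filled) * MEAN_AGE

# A 20-year-old worth three times as much as a 30-year-old dominates the average.
example = weighted_age([20.0, 30.0], [3.0, 1.0])
```

Weighting by value means a squad built around one young star reads as "young" even if most of its fringe players are veterans.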
One big question is whether Transfermarkt data is biased toward certain countries, and particularly toward Europe, since the most valuable club teams in the world are overwhelmingly concentrated there. We investigated this carefully and found little overall pro-European bias. However, through analyzing historical World Cup rosters, we discovered one important mechanism we need to account for. In certain countries, some players tend to “stay at home” or play for other teams in their home regions, even though they would be skilled enough to play for a top-flight European club if they wanted to. After careful investigation, we found that three groups of countries are affected by this:
Latin America. Many strong players from Mexico and Central America play in Liga MX rather than going to Europe. To some extent, South American players can also elect to remain with their domestic clubs.
Wealthy Asian countries: specifically the Gulf States and the five OECD members from Asia or Oceania (Japan, South Korea, Australia, New Zealand, Israel). These countries often have vibrant domestic leagues and can offer a high quality of life to local stars.
Finally, six geopolitically isolated countries (North Korea, Eritrea, Cuba, Iran, China and Russia22) that impose harsh constraints on emigration.
Countries from these regions are underrated by Transfermarkt values, and so we correct for this. It’s not that the Transfermarkt values are necessarily “biased” against these countries but that players from these regions will sometimes opt not to fully maximize their market value, instead making a sacrifice for the comforts of home (or they won’t be given the choice in the case of a country like North Korea). Nearly all elite players from Africa and Anglo North America migrate to other countries for their club play, however.23
There’s one other subtle factor in calculating our historic ratings. Suddenly, a whole bunch of new data becomes available on 1/1/2005. Rather than easing into the new regime gradually, we make a one-time step-function adjustment to team ratings on 1/1/2005 to account for the new data; we found that PELE performed considerably better in the early Transfermarkt era (~2005-2010) this way. Essentially, this step-function banks in 10 years of reversion toward the new, more informed prior. If you look very carefully at PELE’s historic ratings, you may see bigger changes in 2005 than in other years.
However, as valuable as the Transfermarkt data is, the “Phase 1” PELE ratings are already pretty smart based on match results and the GDP prior.24 The player data is certainly worth worrying about and aligns PELE with betting odds more precisely, but it’s an important factor rather than a dominant mechanism in the system.25
Tilt ratings and expected goals
So far, I’ve barely even described one of the most important features of our system: Tilt ratings.
When I created SPI for ESPN for the 2010 World Cup, our system had separate offensive and defensive ratings for each team, characterized as their projected number of goals scored and allowed per match. The offensive and defensive ratings could then be combined to project scores and calculate overall quality ratings.
While I think SPI was a smart system, I actually think this technique wasn’t well-suited to soccer. Unlike in American football, offense and defense are fluid in soccer. There isn’t a distinct platoon on either side. And soccer is deeply tactical: a team might play completely differently with a 1-goal lead than a 1-goal deficit. A game between two quality opponents could easily turn into a tight match or a shootout.
However, we can evaluate which teams tend to be involved in high-scoring games. This is basically what Tilt does: it measures whether matches featuring the team tend to involve more combined goals for both sides (positive tilt) or fewer (negative tilt). So while PELE is our measure of overall team quality, Tilt is more a measure of mindset. In sports betting terms, you’d use PELE to set the point spread or the odds of a team winning, and Tilt to project the over-under.
However, having a positive or negative tilt rating isn’t inherently good or bad. Teams can succeed — or fail — with more attacking styles or more defensive ones. In fact, PELE and Tilt ratings are, by design, mostly uncorrelated (the cool kids would say they’re orthogonal):
We can, however, combine PELE and Tilt to project goals scored and allowed in each match. We do this in the round-robin table of the PELE landing page, for example, which simulates a round of matches on a neutral field between all of the 211 FIFA teams and all 210 of their opponents.26 A team’s projected numbers of goals scored and allowed against the round-robin are basically equivalent to SPI’s offensive and defensive ratings, just with the process reversed:
SPI: Offensive rating + Defensive rating → Overall quality rating
PELE: Overall quality rating (PELE) + Tilt → implicit offensive and defensive ratings
This process can also be used to project the odds of a win, loss or draw between any two teams under any circumstance (home, road, neutral, etc.). Indeed, this is the process we’ll use for our probabilistic World Cup projections. But it is among the more complicated aspects of the system.
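One way to see how a quality rating plus a scoring tendency implies offensive and defensive numbers: if you have an expected goal margin and an expected goal total for a match, the two sides' expected goals follow from simple arithmetic. The sketch assumes those two inputs are already available; how PELE and Tilt map onto them is not reproduced here, and the names are hypothetical.

```python
def implied_goals(expected_total: float, expected_margin: float) -> tuple:
    """Split an expected total and margin into (goals_for, goals_against).

    goals_for + goals_against = expected_total
    goals_for - goals_against = expected_margin
    Values are floored at zero, since a team can't score negative goals.
    """
    goals_for = max((expected_total + expected_margin) / 2.0, 0.0)
    goals_against = max((expected_total - expected_margin) / 2.0, 0.0)
    return goals_for, goals_against

# Hypothetical match: 2.6 total goals expected, favorite by 0.8.
gf, ga = implied_goals(2.6, 0.8)
```

Averaging these per-match values across a round-robin of opponents would give the implicit offensive and defensive ratings described above.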
Our starting point is to create a projection of the number of goals in the match, derived from our database of nearly 50,000 historical results. The ingredients in this projection are as follows:
By far the most important factor: our rolling leaguewide baseline of overall goal-scoring in typical matches. Soccer has gone from featuring 4-5 goals per game at its inception to just 2-3 goals (combined between both teams) per match now. Our model calculates this baseline by averaging the past 5 years of data from all international matches, with a trendline term (is overall goal-scoring rising or falling globally?) and a correction that downweights outlier matches (Australia 31–0 American Samoa).
We also account for the difference in team quality, as measured by PELE.27 Matches between teams with large rating gaps tend to yield much higher scores.28
The importance of the match: more important matches tend to play tighter and feature fewer goals. In general in soccer, higher quality of play is associated with fewer goals: it’s completely different from something like the NBA in this regard.
And whether the game was played at a neutral site; neutral-site matches tend to play a little more wide-open.29
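The ingredients above can be sketched as a small log-linear formula. This is a hedged illustration rather than the fitted model: every coefficient below is a made-up placeholder, the 0-to-1 `importance` scale is an assumption, the trendline term is omitted, and a simple cap on blowout scores stands in for whatever outlier correction PELE really uses. The exponential form for the rating gap follows the relationship described in the footnotes.

```python
import math
from typing import Iterable, Tuple


def rolling_baseline(matches: Iterable[Tuple[int, int]], as_of_year: int,
                     window_years: int = 5, goal_cap: int = 10) -> float:
    """Mean combined goals over the trailing window, with blowouts capped.

    matches holds (year, combined goals) pairs. goal_cap is a hypothetical
    stand-in for the outlier correction; no trendline term is applied here.
    """
    capped = [min(total, goal_cap) for year, total in matches
              if as_of_year - window_years < year <= as_of_year]
    return sum(capped) / len(capped)


def projected_total_goals(baseline: float, rating_gap: float, importance: float,
                          neutral_site: bool, gap_coef: float = 0.0005,
                          importance_coef: float = -0.10,
                          neutral_coef: float = 0.03) -> float:
    """Combined goals for one match; all coefficients are illustrative only."""
    log_total = (math.log(baseline)
                 + gap_coef * abs(rating_gap)      # bigger mismatches, more goals
                 + importance_coef * importance    # bigger matches play tighter
                 + (neutral_coef if neutral_site else 0.0))
    return math.exp(log_total)
```

Working on the log scale keeps the projection positive and makes each factor a multiplier on the baseline, which is one natural way to get an exponential effect from the rating gap.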
Next, we looked at whether there were systematic differences in scoring propensity, i.e. whether some teams tend to produce higher-scoring games beyond what their PELE rating might imply. Our Tilt ratings are the solution to this: Tilt is basically the difference between actual goals and the goals our formula expects (regressed strongly toward a mean of zero because goals are rare in soccer and the raw signal is noisy). If matches involving Germany (canonically attack-minded) tend to produce higher scores than others, Germany will get a positive Tilt rating, while a negative rating indicates a team, like Senegal, whose matches tend to produce lower-than-expected scores.
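As a rough sketch of what "regressed strongly toward a mean of zero" can look like, the function below averages a team’s goal residuals while padding the sample with phantom matches that have a residual of zero. The name `shrunk_tilt` and the `prior_matches` value are assumptions for illustration; the article doesn’t specify how PELE does its regression.

```python
from typing import Sequence


def shrunk_tilt(actual_totals: Sequence[float], expected_totals: Sequence[float],
                prior_matches: float = 30.0) -> float:
    """Average goal residual per match, shrunk toward zero.

    prior_matches is a hypothetical count of zero-residual phantom matches;
    the larger it is, the harder the estimate is pulled toward zero.
    """
    residual_sum = sum(a - e for a, e in zip(actual_totals, expected_totals))
    return residual_sum / (len(actual_totals) + prior_matches)
```

With only a handful of matches the phantom games dominate and the rating stays near zero; it takes a long run of unusually high- or low-scoring games to move it much.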
Tilt ratings have two subcomponents. The first is based on personnel: for the period since 2005, when Transfermarkt player data begins, we can sum up all the player values and allocate them to either offense or defense based on the players’ positional assignments. As of mid-2026, for example, most of Norway’s value is concentrated on offense (Haaland) while most of Nigeria’s is on defense. The overall split is designed to be 50/50 based on these positional allocations:
The resulting effect on goal-scoring is what you might predict: teams whose strengths are concentrated on offense tend to produce higher-scoring games.
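A minimal sketch of the positional allocation, under loud assumptions: the position codes and the `OFFENSE_WEIGHT` table are hypothetical (the article doesn’t publish its allocations), and the squad in the usage note is toy data, not real Transfermarkt values.

```python
from typing import Dict, List, Tuple

# Hypothetical share of each position's market value credited to offense.
OFFENSE_WEIGHT: Dict[str, float] = {"GK": 0.0, "DF": 0.0, "MF": 0.5, "FW": 1.0}


def offense_share(squad: List[Tuple[str, float]]) -> float:
    """Fraction of a squad's total market value allocated to offense.

    squad is a list of (position code, market value) pairs.
    """
    total_value = sum(value for _, value in squad)
    offense_value = sum(OFFENSE_WEIGHT[position] * value for position, value in squad)
    return offense_value / total_value
```

For a toy squad of `[("FW", 100.0), ("MF", 40.0), ("DF", 40.0), ("GK", 20.0)]`, the offense share is 0.6, a lean of 0.1 above an even split. In practice the weights would need calibrating so that the average share across all teams comes out to 50/50.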
The second component is tactical tilt. Figuratively, it’s whether a team prefers to play a more open/attacking style or a tighter, more defensive one, trends that can be persistent over decades based on the coaching regime and the soccer tradition in each country. More literally, it’s the residual number of goals scored relative to PELE’s baseline expectations.30 Tilt ratings are strongly hedged toward a mean of zero since measuring this attack/defense tendency is noisier than measuring differences in team quality.
The score matrix and future match predictions
From PELE and Tilt, we can project the number of goals for each team in any given match. Technically, the way the model does this is by first projecting overall goals based on leaguewide trends and each team’s Tilt rating, and then dividing the goals between the projected winner and loser.
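Here is one hedged way to picture the dividing step. It borrows the standard Elo expected-score curve as a stand-in for each team’s share of the goals; that choice, the 400-point `scale` and the function name `split_goals` are assumptions for illustration, not PELE’s actual allocation rule.

```python
from typing import Tuple


def split_goals(total_goals: float, rating_gap: float,
                scale: float = 400.0) -> Tuple[float, float]:
    """Split projected combined goals between team A and team B.

    rating_gap is team A's rating minus team B's, after any home-field
    adjustment. The logistic share is an assumed stand-in, not PELE's rule.
    """
    share_a = 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))
    return total_goals * share_a, total_goals * (1.0 - share_a)
```

Evenly matched teams split the total down the middle, and the favorite’s share grows smoothly toward (but never reaches) all of the goals as the gap widens.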
But a projection like Spain 2.7-Finland 0.8 only tells you so much. Soccer is a low-scoring game with a lot of draws, so the precise number of goals matters. Spain can score exactly 2 goals or exactly 3 goals or zero goals or some other integer, but they can never finish the game with 2.7 goals.
The Poisson distribution is designed to handle this sort of situation31 and is the traditional choice in soccer models. But it has some problems. In particular, Poisson understates variation, tending to underestimate both the number of draws (especially 0-0 draws) and the number of blowouts (Germany 7-1 Brazil). PELE’s solution32 is a negative binomial distribution with a correlation term. It’s not so important that you know precisely what this means as that you grasp the basic intuition: one team’s score affects the other team’s tactics. A match that is 0-0 after 75 minutes tends to play tighter than one that is 3-2: it’s already been a low-scoring game, and it will often play tighter still from that point forward. Our method does a good job of matching empirical goal-scoring distributions: e.g., there are about the right number of draws and blowouts in PELE.
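To show the shape of the idea, here is a small sketch of a score matrix built from negative binomial marginals. It is emphatically not PELE’s construction: the `dispersion` value is a placeholder, and the `draw_boost` multiplier on level scorelines is a crude, hypothetical stand-in for the correlation term, whose real form the article doesn’t spell out.

```python
import math
from typing import List, Tuple


def neg_binomial_pmf(k: int, mean: float, dispersion: float) -> float:
    """P(X = k) for a negative binomial with variance mean + mean**2 / dispersion."""
    log_p = (math.lgamma(k + dispersion) - math.lgamma(dispersion) - math.lgamma(k + 1)
             + dispersion * math.log(dispersion / (dispersion + mean))
             + k * math.log(mean / (dispersion + mean)))
    return math.exp(log_p)


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for a Poisson distribution, included for comparison."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def score_matrix(mean_a: float, mean_b: float, dispersion: float = 6.0,
                 draw_boost: float = 0.05, max_goals: int = 15) -> List[List[float]]:
    """matrix[i][j] is the chance of team A scoring i and team B scoring j.

    draw_boost is a hypothetical multiplier on level scorelines, standing in
    for a correlation term. The matrix is renormalized to sum to 1.
    """
    cells = [[neg_binomial_pmf(i, mean_a, dispersion)
              * neg_binomial_pmf(j, mean_b, dispersion)
              * (1.0 + draw_boost if i == j else 1.0)
              for j in range(max_goals + 1)]
             for i in range(max_goals + 1)]
    total = sum(sum(row) for row in cells)
    return [[cell / total for cell in row] for row in cells]


def outcome_probabilities(matrix: List[List[float]]) -> Tuple[float, float, float]:
    """Sum the cells into (team A win, draw, team A loss) probabilities."""
    size = len(matrix)
    win = sum(matrix[i][j] for i in range(size) for j in range(size) if i > j)
    draw = sum(matrix[i][i] for i in range(size))
    loss = sum(matrix[i][j] for i in range(size) for j in range(size) if i < j)
    return win, draw, loss
```

For the same mean, the negative binomial puts more probability than Poisson on both a scoreless side and a very high-scoring one, which is the property that helps with 0-0 draws and blowouts.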
PELE calculates a precise score matrix for each game (the chance Team A beats Team B by a score of exactly X-Y). It then sums up the cells in this matrix to estimate probabilities that the game ends in a win, loss or draw. For instance, here is the matrix for the United States’ first World Cup match against Paraguay on June 12, projected to be a low-scoring affair:
There are some complications introduced by extra time and penalty kicks, which will be used in the World Cup knockout stage. Games eligible for extra time potentially increase the amount of gameplay by roughly 33 percent and empirically confer a slightly larger advantage on favorites. PELE accounts for all of this. Historically, about 30 percent of draws after regulation are resolved in extra time before penalties, so PELE takes 30 percent of games projected to be draws in regulation and assigns a winner. The other 70 percent of extra-time games go to penalties. Based on our analysis of several hundred past penalty shootouts, there actually is some skill in them; better teams and home teams tend to win shootouts more often, though the edge is rarely more than about 60/40.
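The arithmetic for a knockout match can be sketched as follows. The 30 percent default comes from the figure above; the function name and the two conditional inputs (`p_extra_time_win`, the team’s chance of being the winner when extra time is decisive, and `p_shootout_win`, its chance in a shootout) are assumptions that would have to come from elsewhere in the model.

```python
def advance_probability(p_win: float, p_draw: float, p_extra_time_win: float,
                        p_shootout_win: float,
                        extra_time_decided: float = 0.30) -> float:
    """Chance a team advances from a match that cannot end in a draw.

    p_win and p_draw are regulation-time probabilities. extra_time_decided is
    the share of regulation draws settled before penalties.
    """
    via_extra_time = p_draw * extra_time_decided * p_extra_time_win
    via_shootout = p_draw * (1.0 - extra_time_decided) * p_shootout_win
    return p_win + via_extra_time + via_shootout
```

For example, a team with a 50 percent chance to win in regulation, a 30 percent chance of a draw, a 60 percent edge in extra time and a 55 percent edge in a shootout advances with probability 0.5 + 0.3 × (0.3 × 0.6 + 0.7 × 0.55) = 0.6695.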
World Cup adjustments
We’ll update this document with more detail on our official World Cup forecast once it’s ready. Two things we plan to do are use actual World Cup rosters once they’re announced instead of our algorithmically generated rosters, and account for the incentives in each match. In matches where both teams would advance with a draw, for example, teams have shown a remarkable aptitude for conspiring to secure one.
Note, however, that each system has different means and standard deviations. PELE is strict about enforcing the norm that a 1500 rating = an average team, but the other systems are not. FIFA ratings tend to be lower across the board, for example. As a practical matter, it’s often easier to compare rankings (1st, 2nd, 3rd) rather than ratings (2100, 2055, 1992).
World Cup rosters traditionally had 23 players (now they have 26). Limiting the scope of PELE to the top players also limits the impact of coverage bias in the Transfermarkt valuations.
Technically, it began in late 2004, but it took a few months to ramp up to relatively complete coverage, so we just use 1/1/2005 as a clean cutoff date.
Another edge case is South Africa; we consider it to have been “dormant” during its long FIFA suspension/boycott during apartheid.
We do this by iterating the model many times until it solves for the right parameters. Still, it can help to have a few “universal” structural parameters that are essentially hard-coded; otherwise, you can wind up with a “too many moving parts” problem.
We also tested travel distance based on the number of time zones rather than the number of kilometers traveled, but this was inferior across all statistical tests.
We think this is basically because home versus road provides a lot of signal that travel distance is somewhat redundant with, whereas for neutral-site matches, travel distance is really all you have to go with. Both in theory and in practice, Mexico gets a quasi-home-field advantage when playing games in the United States, for instance.
Although if you’re actually betting on friendlies, factors such as how seriously each side tends to take them and what the incentives are in a particular match will be important.
PELE weights friendlies and minor tournaments slightly higher if the teams involved in the game are bad, because otherwise these teams won’t have many important matches played at all. Any match that Andorra plays is its “World Cup” basically.
The Olympics do not qualify for PELE consideration in recent years because they mostly use U-23 rosters, but they used full rosters in some long-ago circumstances.
We were surprised at how little difference there was in the amount of signal provided by qualifiers for major tournaments and the tournaments themselves. Most countries take these games very seriously, using their best internationals.
Because it’s actually derived from the harmonic series.
Yes, we tested separately using population and per-capita GDP. It didn’t help at all and made the model more complicated for no predictive benefit.
Indonesia is the only country split three ways (between South Asia, East Asia and Oceania). This is annoying, but reflects that Southeast Asia is already treated as a hybrid region between South Asia and East Asia, and Indonesia’s presence on New Guinea means that geographers usually also consider it partly in Oceania.
Haiti is often considered a part of Latin America, but it predominantly speaks Creole.
For what it’s worth, this follows FIFA’s standard, as Guyana and Suriname are members of CONCACAF (North America) rather than CONMEBOL (South America).
Bermuda sits at a much higher north latitude and is almost never considered a proper Caribbean country by geographers.
Mexico, etc., are cross-hatched with Latin America. As a believer in American exceptionalism, I don’t think there’s any good “comp” for the U.S., except maybe Canada. But actually, the coefficient for North America is weaker than the one for the Caribbean once you control for GDP. As a result, establishing a separate Caribbean region hurts the U.S. rather than helps it.
For instance, the Warsaw Pact only existed for about 40 years, about half as long as the Soviet Union, and all the countries in the Warsaw Pact had footballing traditions beforehand.
In the case of COOPER, for example, they’re reverted toward a formula based on conference strength and preseason rankings.
i.e., if even the most minor European leagues were included, but only the most prominent players from other continents were. Honestly, though, Transfermarkt’s coverage is remarkably comprehensive, especially from 2010 onward.
Since 2022 only, when sanctions essentially knocked Russia out of the UEFA universe.
MLS has not yet established enough strength to be compelling to truly top-tier American players.
In a regression equation, the GDP prior has an R-squared of around .83 in predicting historical PELE ratings.
On average, the difference between Phase 1 and Phase 2 ratings is only about 25 PELE points.
The round-robin simulations are weighted, though, based on how many games each opponent played in the historical dataset and the importance ratings for those games. This means that teams in the round-robin table face slightly above-average competition relative to the entire cohort of 211 FIFA teams, because higher-quality teams tend to play more games, especially in major tournaments. Each team faces the same weighted schedule in the round robin, however, other than not playing itself.
As well as applying our various home-field adjustments.
Indeed, there is an exponential relationship between the PELE ratings gap and projected goal scoring.
We’re not quite sure why this is. Perhaps teams tend to become more risk-averse to protect a lead when they’re playing at home. Or perhaps it has to do with the officiating. But it shows up as a robust signal in the data.
The residual is calculated after lineup adjustments are applied. In other words, tactical tilt controls for the personnel on the pitch, along with the other factors I mentioned.
i.e., a situation where you know the average outcome of some variable (2.6 goals) but have to allocate them into discrete buckets of integers (0, 1, 2, 3, etc.).
I do need to give Claude Opus 4.6 a hat tip for proposing and testing a number of possible constructions for the score matrix until it identified one that matches empirical scoring distributions extremely well.




