Elo for Football

For those interested, there is a Q/A at fivethirtyeight.com that discusses applying Elo to NFL football. Here I am a bit more descriptive than the Q/A, which fails to elaborate in some areas. I also recommend the Wikipedia article on the Elo rating system.

What is Elo?

Elo is a rating system designed for head-to-head matchups. It is named after its creator Arpad Elo, and it is not an acronym for anything in particular.

Elo is designed to take opinion and marketing out of the rating process. Only the actual result of a matchup is measured and credited to or debited from a participant's rating. This helps form ranking systems that are less influenced by human biases, apart from the choice of values used to form the rating. That is not to say it is free from all bias. Mathematically, a participant's past history will always bias their rating, at least temporarily. For example, it cannot account for a recent accident that has left a competitor unable to perform at their previous level. Elo was also very useful before the internet enabled matchups between geographically distant opponents.

Elo was popularized as a chess rating system to deal with the difficulty of rating and ranking players for competitions. In fact, if you have ever heard of a chess master, much of what goes into determining their mastery is a high Elo-based rating. It is desirable for competitive purposes that better chess players play similarly graded opponents. Additionally, for the purposes of ranking, it is desirable that higher-skilled players are not rewarded for beating up on lower-ranked players in an attempt to pad their ranking. Elo is also designed to deal with the challenge that many players will never encounter each other, in other words, when the network of matchups is sparse. Increased sparsity does still bias the rating system. However, higher-level competitions bring together top performers, which helps level out this problem.

Elo and similarly derived ranking systems are used across many competition platforms. Video games, sports and other competitions have adapted the Elo rating system to their purposes. In fact, the application of the Elo rating system to football is more expressive than its application to chess. In chess it is harder to quantify the strength of a win, as counting pieces or turns can indicate style rather than strength. In football, by contrast, the point differential is a relatively good indicator of the difference in team quality, especially in offensive leagues like the CFL.

Why use it?

Elo is in a lot of ways a quantification of what humans do all the time with qualitative opinions on teams. We give credit to teams who win, and reduce our opinion of those who lose. Elo is also a zero-sum game: the team that wins gains exactly the amount of credit that the losing team gives up. Elo can also be modified so that underdogs get more credit for a win over a favoured opponent and favourites gain less for beating up on uncompetitive opponents.

Elo is also in many ways more expressive than win-loss columns alone. A win-loss column reduces each result to a single bit of information: a loss is worth zero points and a win is worth one point. To handle ties this has to be expanded to allow half points. In comparison, Elo starts every team with the same initial point total. Then for every matchup this total is increased on a win or decreased on a loss. The amount of the change starts with a standard value, which is then adjusted according to how favoured the competitor was to win or lose and by how many points they won or lost. Rather than a single bit, the multi-point total earned or lost expresses more information about the result of the competition.

What are the basics?

Every team begins with an Elo rating of 1500. (Mathematically, the specific value of 1500 does not matter in any way. However, it is nice visually to have a starting value sufficiently positive that low-performing teams don’t all of a sudden have negative ratings. You could start at zero if you wanted, or even one million. However, in practice this is avoided.)

Additionally, we will give every game a value of K = 20. Therefore, before any other factors, a team who wins will gain K points and the other team will lose K points. For example, if we have two teams team_{A} and team_{B} competing with ratings of ELO^{before}_{team_{A}} = 1500 and ELO^{before}_{team_{B}} = 1500, then if team_{A} wins then ELO^{after}_{team_{A}} = 1520 and ELO^{after}_{team_{B}} = 1480. If they tie, their ratings would remain unchanged.

If we want to clean up this formula, ELO^{after} = ELO^{before} + K * WL where WL = 1 if the team won, WL = -1 if the team lost, and WL = 0 if the teams tied.
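
As a minimal sketch of this basic rule, the update can be written in a few lines of Python. The function name `basic_update` and its argument names are my own labels for illustration, and a tie is encoded as WL = 0 so that ratings stay unchanged, as described above.

```python
K = 20  # points at stake in every game, the value used in the text


def basic_update(elo_before, wl, k=K):
    """Return the rating after one game.

    wl is +1 for a win, -1 for a loss and 0 for a tie (an assumed encoding).
    """
    return elo_before + k * wl


# Two 1500-rated teams: the winner moves to 1520, the loser to 1480.
print(basic_update(1500, +1), basic_update(1500, -1))
```

Because both teams move by the same amount in opposite directions, the sum of the two ratings is the same before and after the game.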

Why choose K = 20? In many ways K has more influence than simply being the value to adjust a team's rating by. It affects how responsive a team's rating is to an individual competition event. The larger the value, the greater the fluctuation. In European football, different levels of events are given different K values, which attempt to express the rarity of the competition and hopefully how seriously the participating country takes the event. For example, higher-rated events such as World Cup finals get a value K=60, while friendlies are given K=20.

Win expectancy?

Win expectancy is a measure of the odds that one team beats another. More particularly, it is the percentage chance that one team wins. For example, if the game was flipping a coin, then each team would have 50\% odds. A favoured team will have a greater number approaching 100\% and an underdog a value approaching 0\%.

We will use a win expectancy value to replace WL from the existing formula. Instead of giving a team all the points indicated in K, we will adjust it by how the team's win expectancy W_{e} compares to the actual win/loss result W. A team that has no chance of losing, even if they didn’t show up, would have a win expectancy of W_{e}=100\%=1, and a team that has no chance of winning would have W_{e}=0\%=0. A team that wins gets the full value of a win W=1, a team that loses gets no value W=0, and a team that ties gets half value W=0.5.

We now determine the relative amount of points given to each party in the competition based on how their result compares to how they were expected to finish. Two even teams would each have W_{e}= 0.5, and therefore the winner would receive W - W_{e}= 1 - 0.5 = 1/2 of the K points and the loser would get W - W_{e}= 0 - 0.5 = -1/2 of K. A favoured team with W_{e}= 0.75 who wins would get W - W_{e}= 1 - 0.75 = 1/4 of K, while a winning underdog with W_{e}= 0.25 would get W - W_{e}= 1 - 0.25 = 3/4 of K. Inversely, a losing underdog only loses 1/4 of the points and a losing favourite loses 3/4 of the points.
If we want to clean up this formula:


ELO^{after} = ELO^{before} + K * (W - W_{e}).
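
To make the bookkeeping concrete, here is a small hedged sketch of this update. It assumes the win expectancy is already known and passed in as a number between 0 and 1; the name `elo_update` and its parameters are illustrative choices of mine, not something from the Q/A.

```python
def elo_update(elo_before, result, expected, k=20):
    """Apply ELO_after = ELO_before + K * (W - W_e).

    result is W (1 win, 0.5 tie, 0 loss); expected is W_e in [0, 1].
    """
    return elo_before + k * (result - expected)


# A favourite (W_e = 0.75) who wins gains a quarter of K.
print(elo_update(1500, 1, 0.75))
# An underdog (W_e = 0.25) who wins gains three quarters of K.
print(elo_update(1500, 1, 0.25))
# A favourite (W_e = 0.75) who loses gives up three quarters of K.
print(elo_update(1500, 0, 0.75))
```
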
To determine W_{e} there are a variety of methods. One is to take a sample of results before applying the win expectancy measure. Then, make a reference table of the difference in Elo ratings and the odds that the higher-rated team won, and simply look up each matchup in the table. You can then iteratively adjust the table based on the new ratings achieved while using win expectancy until the results stabilize. Alternatively, I make use of the approximate formula


W_{e} = \frac{1}{10^{\frac{-diff}{400}}+1}

where the differential in Elo values is


diff = ELO^{before}_{team_{A}} - ELO^{before}_{team_{B}}.
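
The approximate formula is short enough to sketch directly. In this illustration `win_expectancy` is a name I have chosen, and diff is taken as team A's pre-game rating minus team B's pre-game rating.

```python
def win_expectancy(diff):
    """Approximate chance that team A wins, given diff = Elo_A - Elo_B."""
    return 1.0 / (10 ** (-diff / 400) + 1.0)


# Evenly rated teams are a coin flip.
print(win_expectancy(0))
# A 100 point edge and the matching 100 point deficit.
print(win_expectancy(100), win_expectancy(-100))
```

A useful sanity check on the formula is that the two teams' expectancies always sum to 1, so whatever one side is expected to gain the other is expected to lose.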

Home field advantage?

So far we have adjusted for one team being considered a favourite. However, intuitively and statistically we know there is also an advantage to playing a game at home, be it travel, sleep, time zones, the locker room, or other issues. To account for this we adjust the differential up by 65 points for a team at home and down by the same amount for a road team. As a result


diff_{HA} =  \begin{cases}  diff + 65,& \text{if } location = Home\\  diff - 65,& \text{if } location = Away\\  diff, & \text{otherwise}  \end{cases}

and


W_{e} = \frac{1}{10^{\frac{-diff_{HA}}{400}}+1}.
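
A hedged sketch of the location adjustment follows. The function names and the string labels "Home" and "Away" are my own assumptions for illustration; the 65 point value is the one quoted above.

```python
HOME_ADVANTAGE = 65  # Elo points, the value quoted in the text


def adjust_for_location(diff, location):
    """Shift the rating differential for a home, away or neutral game."""
    if location == "Home":
        return diff + HOME_ADVANTAGE
    if location == "Away":
        return diff - HOME_ADVANTAGE
    return diff  # neutral site


def win_expectancy(diff_ha):
    """Win expectancy from the location-adjusted differential."""
    return 1.0 / (10 ** (-diff_ha / 400) + 1.0)


# Two evenly rated teams: the home side is a modest favourite.
home_we = win_expectancy(adjust_for_location(0, "Home"))
away_we = win_expectancy(adjust_for_location(0, "Away"))
print(round(home_we, 3), round(away_we, 3))
```

For two evenly rated teams this puts the home side at roughly a 59% chance of winning, with the road team getting the remainder.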

For some context, the rule of thumb is that 65 Elo points is worth about 2.6 points scored in an NFL game. From this you should be able to extrapolate that every 25 Elo points is worth a single in-game point. For example, a differential of 250 Elo points is a theoretical point spread of 10 points.

This adjustment is pulled from the fivethirtyeight.com Q/A.
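
The 25-Elo-points-per-game-point rule of thumb can be turned into a rough point spread. This is only a sketch of that rule; `elo_to_spread` is a hypothetical helper name, and the conversion is the NFL figure from the text, which may not carry over exactly to other leagues.

```python
ELO_PER_POINT = 25  # rule of thumb from the text: 25 Elo points per game point


def elo_to_spread(diff):
    """Convert an Elo differential into an approximate point spread."""
    return diff / ELO_PER_POINT


print(elo_to_spread(65))   # home field advantage expressed in game points
print(elo_to_spread(250))  # a 250 point Elo gap
```
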

What is left? Margin of Victory

We still have a final value to account for. This is the number of points a team wins or loses by, also known as margin of victory. Often when a favourite wins, the point differential gets out of hand for more reasons than the competitive differential of the teams. Think of a top-five team from a Power Five college football conference playing a barely mid-major team. What we want to do is use a multiplier mult to adjust K for the result of the game.

This multiplier will have two parts. The first applies decreasing returns as the score differential gets larger. A well-suited math function is the natural logarithm ln. The second is a factor that decreases when the Elo of the winner is larger than that of the loser and increases when the Elo of the loser is larger than that of the winner.

For the first part we have the formula

ln(\left|pts_{W}-pts_{L}\right|+1).

A tie game would then be ln(1) = 0, which gives a multiplier of zero. A single field goal difference is ln(3+1) and a single touchdown difference is ln(7+1). Note that we can see the decreasing returns for win point differential in ln(8) = ~2.08, ln(15) = ~2.71, ln(22) = ~3.09, and ln(29) = ~3.37.
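
The decreasing returns are easy to see numerically. The snippet below is a small illustration using Python's standard `math.log`; the name `margin_factor` is my own.

```python
import math


def margin_factor(pts_w, pts_l):
    """First part of the multiplier: ln(|pts_W - pts_L| + 1)."""
    return math.log(abs(pts_w - pts_l) + 1)


# Each extra touchdown of margin adds less than the one before it.
for margin in (7, 14, 21, 28):
    print(margin, round(margin_factor(margin, 0), 2))
```

The first touchdown of margin is worth about 2.08, while going from a three touchdown margin to a four touchdown margin adds less than 0.3.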

For the second, we start with a constant of 2.2 and adjust it based on the teams’ Elo differential diff, taken as the winner's Elo minus the loser's Elo before the home and away adjustment. The result is \frac{2.2}{2.2+\frac{diff}{1000}}. This factor equals 1 when the teams are evenly rated, falls below 1 the more the winner was favoured, and rises above 1 when the winner was the underdog.

The accumulated multiplier is


mult = ln(\left|pts_{W}-pts_{L}\right|+1) * \frac{2.2}{2.2+\frac{diff}{1000}}.

This multiplier is pulled from the fivethirtyeight.com Q/A.
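
Here is a hedged sketch of the combined multiplier. It assumes diff is the winner's pre-game Elo minus the loser's pre-game Elo, before any home or away adjustment; the function name `mov_multiplier` and the example ratings are illustrative only.

```python
import math


def mov_multiplier(pts_w, pts_l, elo_w, elo_l):
    """Margin-of-victory multiplier applied to K.

    diff is winner Elo minus loser Elo, before any home/away adjustment.
    """
    diff = elo_w - elo_l
    margin_part = math.log(abs(pts_w - pts_l) + 1)
    rating_part = 2.2 / (2.2 + diff / 1000)
    return margin_part * rating_part


# The same one touchdown win, with three different pre-game pictures.
favourite_wins = mov_multiplier(28, 21, 1600, 1400)
even_teams = mov_multiplier(28, 21, 1500, 1500)
underdog_wins = mov_multiplier(28, 21, 1400, 1600)
print(round(favourite_wins, 2), round(even_teams, 2), round(underdog_wins, 2))
```

For an identical score, the multiplier is smallest when the favourite wins and largest when the underdog wins, which is the behaviour described above.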

Neutral Example

Sounds like a lot of math.

Here’s a neutral example of two average teams at a neutral site, with one team winning by a single touchdown.

We will have two teams, a winner team_{W} and a loser team_{L}.

Both teams begin with an average Elo, ELO^{before}_{team_{W}}=1500 and ELO^{before}_{team_{L}}=1500.

As a result we have a differential of diff = ELO^{before}_{team_{W}} - ELO^{before}_{team_{L}} = 1500-1500 = 0.

A neutral site game means diff = diff_{HA} = 0.

The resulting win expectancy W_{e} = \frac{1}{10^{\frac{0}{400}}+1} = \frac{1}{10^{0}+1} = \frac{1}{1+1} = 0.5.

The winning team will then get W - W_{e} = 1 - 0.5 = 1/2 of K and the losing team will get W - W_{e} = 0 - 0.5 = -1/2 of K.

The value of K after applying the multiplier is


\begin{array}{rl}  K * mult = & 20 * ln(\left|pts_{W}-pts_{L}\right|+1) * \frac{2.2}{2.2+\frac{diff}{1000}} \\  = & 20 * ln(\left|7\right|+1) * \frac{2.2}{2.2} = 20 * ln(8) \\  \approx & 41.59.  \end{array}
The winner therefore will have
ELO^{after}_{team_{W}} = ELO^{before}_{team_{W}} + 0.5 * 41.59 = 1500+\frac{41.59}{2} \approx 1520.79
while the loser will have
ELO^{after}_{team_{L}} = ELO^{before}_{team_{L}} + (-0.5) * 41.59 = 1500-\frac{41.59}{2} \approx 1479.21.
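
Putting the pieces together, the whole calculation can be sketched as one function. This is a minimal illustration under my own assumptions rather than a definitive implementation: `rate_game` is a hypothetical name, `location` describes where the winner played ("Home", "Away" or anything else for neutral), and the constants are the ones used throughout this post.

```python
import math


def rate_game(elo_w, elo_l, pts_w, pts_l, location="Neutral",
              k=20, home_adv=65):
    """Return (winner_after, loser_after) for a single game.

    location is given from the winner's point of view.
    """
    diff = elo_w - elo_l
    # Home field adjustment applies only to the win expectancy.
    if location == "Home":
        diff_ha = diff + home_adv
    elif location == "Away":
        diff_ha = diff - home_adv
    else:
        diff_ha = diff
    w_e = 1.0 / (10 ** (-diff_ha / 400) + 1.0)
    # Margin-of-victory multiplier uses the unadjusted differential.
    mult = math.log(abs(pts_w - pts_l) + 1) * (2.2 / (2.2 + diff / 1000))
    change = k * mult * (1 - w_e)  # the winner has W = 1
    return elo_w + change, elo_l - change


# The neutral example: two 1500 teams, a 7 point win at a neutral site.
winner_after, loser_after = rate_game(1500, 1500, 28, 21)
print(round(winner_after, 2), round(loser_after, 2))
```

The loser's change is simply the negative of the winner's, because the loser's win expectancy is one minus the winner's, so the update stays zero-sum.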

New Season

To account for the personnel turnover of the off-season, a team's Elo is regressed a third of the way towards the average value of 1500.


ELO^{start}_{curr}= (ELO^{end}_{last}-1500)*\frac{2}{3}+1500
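
As a quick sketch of the off-season regression, with `new_season_elo` as an illustrative name of my own:

```python
def new_season_elo(elo_end, mean=1500):
    """Pull last season's final rating a third of the way back to the mean."""
    return (elo_end - mean) * 2 / 3 + mean


# A strong team at 1650 starts the next season at 1600,
# and a weak team at 1350 starts at 1400.
print(new_season_elo(1650), new_season_elo(1350))
```

An average team at exactly 1500 is left unchanged, and every other team keeps two thirds of its distance from the average.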

Significant CFL Elo Values

Name Elo
Top 0.1% All-Time 1750
Top 1% All-Time 1700
Top 5% All-Time 1650
Average Grey Cup Team 1600
Average Conference Finals Team 1575
Average Conference Semi-Finals Team 1525
