Wilson Performance Ratings

There is an alternative, other than a post-season tournament, to selecting the college football national champion by polls. The College Football Performance Rating System assigns ratings to teams based solely on the win-loss records between the teams. The system is entirely objective. It was inspired by the rating system used in chess.

This rating system is not designed to predict the future. It is not designed to rate how good a team is, but how impressive its season was, based on past win-loss performance. In particular, the system discards information on margin of victory and home-field advantage; a system trying to gauge how strong a team is would keep this data.

For each game a team plays, it gets a Game Performance Rating. This is equal to the opponent's rating plus 100 if the team won, or minus 100 if the team lost. The team's Performance Rating is the average of its Game Performance Ratings. All of the Game Performance Ratings are recalculated each week based on the latest ratings for the opponents. In addition, the ratings are scaled so that they average 500.

For example, after the first game of the 1995 season, the ratings were:

    Michigan         550
    Virginia         450
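
These two numbers can be checked with a tiny fixed-point sketch (hypothetical code, not the author's rate program; the damping step is an assumption added here to keep the two-team iteration from oscillating):

```python
# Michigan beat Virginia, so Michigan's rating must be Virginia's plus 100,
# Virginia's must be Michigan's minus 100, and the two must average 500.
ratings = {"Michigan": 500.0, "Virginia": 500.0}
for _ in range(100):
    new = {"Michigan": ratings["Virginia"] + 100,
           "Virginia": ratings["Michigan"] - 100}
    # Damping (an added assumption, not in the original description):
    ratings = {t: (ratings[t] + new[t]) / 2 for t in ratings}
    # Rescale so the ratings average 500:
    shift = 500 - sum(ratings.values()) / len(ratings)
    ratings = {t: r + shift for t, r in ratings.items()}
print(round(ratings["Michigan"]), round(ratings["Virginia"]))  # 550 450
```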

There are two additional features to the system: post-season games count double; and wins that lower a rating and losses that raise a rating count one twentieth as much as the other games.

For example, consider Nebraska's 2000 season:

Nebraska                  Oppon.    Game
   Opponent      Result   Rating   Rating   Weight   Product   Effect
--------------   ------   ------   ------   ------   -------   ------
San Jose St.      WIN      735.1    835.1    0.05       41.8     -2.9
Notre Dame        WIN      861.4    961.4    1.00      961.4    +67.7
Iowa              WIN      744.9    844.9    0.05       42.2     -2.4
Missouri          WIN      726.3    826.3    0.05       41.3     -3.4
Iowa St.          WIN      830.6    930.6    1.00      930.6    +36.9
Texas Tech        WIN      763.2    863.2    0.05       43.2     -1.5
Baylor            WIN      681.3    781.3    0.05       39.1     -5.6
Oklahoma          LOSS     980.9    880.9    1.00      880.9    -12.8
Kansas            WIN      746.5    846.5    0.05       42.3     -2.4
Kansas St.        LOSS     882.4    782.4    1.00      782.4   -111.3
Colorado          WIN      755.1    855.1    0.05       42.8     -1.9
Northwestern      WIN      813.6    913.6    2.00     1827.1    +39.7
                                            =====    =======
                                             6.35     5675.0

                              5675.0
         Nebraska's rating =  ------  =  893.7
                               6.35
The games with a weight of 0.05 are wins that, due to the low rating of the opponent, have a negative effect on Nebraska's rating. Losses that have a positive effect would also get a 0.05 weight, but Nebraska doesn't have any of those.

Ratings are calculated to five decimal places even though only the whole-number ratings are reported. The "Effect" is the weighted difference between the game rating and Nebraska's rating. The effects sum to zero.
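
The rating is the weighted average of the Game Performance Ratings: the sum of the products divided by the sum of the weights. A short sketch (hypothetical code, not the author's program) reproduces the bottom line of the table:

```python
# Each tuple is (game rating, weight) from the Nebraska 2000 table.
# The game rating is the opponent's rating +100 for a win, -100 for a loss.
games = [(835.1, 0.05), (961.4, 1.00), (844.9, 0.05), (826.3, 0.05),
         (930.6, 1.00), (863.2, 0.05), (781.3, 0.05), (880.9, 1.00),
         (846.5, 0.05), (782.4, 1.00), (855.1, 0.05), (913.6, 2.00)]
total_weight = sum(w for _, w in games)        # 6.35
total_product = sum(g * w for g, w in games)   # ~5675 (the table carries 5 decimals)
rating = total_product / total_weight
print(round(rating, 1))                        # 893.7
# The "Effect" column is weight * (game rating - rating); effects sum to zero.
effects = [w * (g - rating) for g, w in games]
```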

The ratings are calculated by setting everyone's rating to a default value based on division and then applying the above definition over and over again until the ratings settle down. A detailed example is available. The default values actually have no effect on the final ratings, except for teams in those conferences that play no non-conference games.

Ratings for teams in conferences that play no non-conference games (the New England Small College Athletic Conference (NESCAC) and the Iowa Intercollegiate Athletic Conference (IIAC)) are arbitrary. The average rating for these conferences is set equal to the average rating of the other Division III teams.

This system rates the teams' past win-loss performance, taking into consideration the win-loss performance of the opponents. For a comparison of this system with Micheal Zenor's Just-Win-Baby rating system, see the critique of Zenor's rating system. For a rating system that rates the teams' strength and predicts likely future performance, see Darryl Marsee's College Football Page. See also Mark Hopkins' comments.

A C language program called rate calculates the ratings. It reads the game results from a file named games.txt and writes two files with the ratings: byname.txt, ordered by team names, and byrating.txt, ordered by ratings. The program is copyrighted, but is freely available for non-commercial use.

Question: Why doesn't the Wilson Performance Rating system consider how convincing the outcome was and who had the home field advantage?

Answer: The system is not intended to gauge how strong a team is or predict how likely a team is to win an upcoming game. It is intended to rate the team's performance in a manner as similar as possible to the conference standings published in the newspapers. Traditionally, conference winners are determined solely on the basis of the win-loss record.

Question: Won't counting post-season games double lower the rating of the losing team too much?

Answer: For systems that base each week's ratings on those from the previous week, this would happen. In such a system, later games are automatically more important than earlier games. In regular conference standings, all games are given the same weight. In this system, all games are also given the same consideration, except that post-season games are entered twice, as if the game had been played twice with the same result. Since there are another eleven games in a season, this does not give a post-season game too much weight, even for the loser. The system determines the ratings by "solving simultaneous linear equations" with modifications for the rule that wins that lower a rating and losses that raise a rating count one twentieth as much.

Question: In your commentary on the 10/21/95 ratings, you gave an example showing how Wisconsin's rating was calculated:

                                  Opp.    Win/
              Score              Rating   Lose
   ---------------------------    ----    ----
   Colorado  43   Wisconsin  7     777    -100   = 677
   Stanford  24   Wisconsin 24     634       0   = 634
   Wisconsin 42   SMU        0     417    +100   = 517*
   Wisconsin 17   Penn St    9     664    +100   = 764
   Ohio St   27   Wisconsin 16     826    -100   = 726*
   N'Western 35   Wisconsin  0     733    -100   = 633

"The win over SMU doesn't count because the system excludes wins that would lower a rating. The loss to Ohio State doesn't count because the system excludes losses that would raise a rating. Averaging the other four gives Wisconsin its 677 rating." [In 1995, wins that lowered a rating and losses that raised a rating were ignored; now they have a weight of one twentieth.]

As I understand your system, everyone starts out with 500. If that's so, how did Wisconsin jump to 677 from 500 with their LOSS to Colorado in their first game? I would have assumed that the scoring went as follows: I would have Wisconsin's rating after the Colorado game still at 500, according to the system rules. Then, with the Stanford tie and PSU win, I would have Wisconsin's PR as (500 + 634 + 764)/3 = 632.67, going into the OSU game. The OSU game gets omitted because OSU is rated more than 100 points above Wisconsin, so Wisconsin goes into the Northwestern game with a PR = 633, which is coincidentally equal to the game rating obtained in the Northwestern loss (633).

Answer: The games are not processed chronologically. All games are treated the same regardless of the order in which they were played. The process is more like calculating the rating using all the games (659) and then looking for the worst case where the rating benefits from a loss or suffers from a win. Dropping the SMU game, we recalculate the rating (686) and look again for the worst case where the rating benefits from a loss or suffers from a win. Dropping the OSU game, we recalculate the rating (677) and look again. This time we don't find a game that needs to be dropped, so 677 is the rating.
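
This drop-and-recompute loop can be sketched as follows, holding the game ratings fixed at the rounded values from the table (in the real calculation the opponents' ratings, and hence the game ratings, would be recomputed after each change — so the intermediate averages here only approximate the 659 and 686 quoted above):

```python
# Wisconsin's six games through 10/21/95: (opponent, result, game rating).
games = [("Colorado", "L", 677), ("Stanford", "T", 634), ("SMU", "W", 517),
         ("Penn St", "W", 764), ("Ohio St", "L", 726), ("N'Western", "L", 633)]
active = list(games)
while True:
    rating = sum(g for _, _, g in active) / len(active)
    # Under the 1995 rules, a win below the current rating or a loss above it
    # is excluded.  "Worst case" is taken here to mean the largest gap
    # (an assumption); games are dropped one at a time, as described above.
    bad = [x for x in active
           if (x[1] == "W" and x[2] < rating) or (x[1] == "L" and x[2] > rating)]
    if not bad:
        break
    active.remove(max(bad, key=lambda x: abs(x[2] - rating)))
print(round(rating))  # 677
```

Note that the Colorado loss (677) is above the initial six-game average, but after the SMU game is dropped the average rises past 677, so Colorado is never excluded — which is why dropping one game at a time matters.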

Until 12/8/98, this system ignored wins that would lower a rating and losses that would raise a rating. However, there were far fewer upsets in Division III than in the other divisions. Thus, the Division III ratings spread out while the other divisions' ratings were more compressed. As a result, in 1997, the best Division III team (Mount Union) finished 14th. In the 1998 season through the games of 12/5/98, Mount Union was ranked second, between Tennessee and Florida State. I don't really think that Mount Union belonged in the national championship game. This problem was solved by including a slight weighting for the difficulty of the entire schedule of games: wins that lower a rating and losses that raise a rating were changed to count one hundredth as much as the other games. This had little effect on the ranking of the upper-division teams, but it did move Mount Union from #2 to a much more realistic #123.

In the 2000 season, Mount Union struck again. They ended up #64, ahead of such teams as Illinois, Kansas, Iowa and Alabama. I changed the weight for wins that lower a rating and losses that raise a rating to one twentieth as much as the other games. This dropped Mount Union to #88.

An upset (as measured by the teams' previous ratings) can have a big effect on ratings, especially early in the season. For example, if team A beats team B, team C beats teams D, team E beats team F, and team G beats team H at the season start, the ratings are:

        A  550
        C  550
        E  550
        G  550
        B  450
        D  450
        F  450
        H  450

If team B then beats team C, team D beats team E, and team F beats team G, the ratings are:

        A  850
        B  750
        C  650
        D  550
        E  450
        F  350
        G  250
        H  150

If team H then beats team A (a big upset!), the ratings become:

        A  500
        B  500
        C  500
        D  500
        E  500
        F  500
        G  500
        H  500
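
These three stages can be reproduced with a small iterative solver. This is a sketch under stated assumptions, not the author's rate program: the update and the rescaling to an average of 500 follow the description above, while the damping factor is an extra assumption added to keep the plain iteration from oscillating on these symmetric schedules (the one-twentieth rule never triggers in this example, so it is omitted):

```python
def solve(teams, games, iters=5000):
    """Iterate the Performance Rating definition until it settles down.
    games is a list of (winner, loser) pairs."""
    r = {t: 500.0 for t in teams}
    for _ in range(iters):
        new = {}
        for t in teams:
            gr = ([r[b] + 100 for a, b in games if a == t]    # wins
                  + [r[a] - 100 for a, b in games if b == t]) # losses
            new[t] = sum(gr) / len(gr)
        r = {t: (r[t] + new[t]) / 2 for t in teams}  # damping (assumption)
        shift = 500 - sum(r.values()) / len(r)       # rescale: average 500
        r = {t: v + shift for t, v in r.items()}
    return r

teams = "ABCDEFGH"
round1 = [("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")]
round2 = round1 + [("B", "C"), ("D", "E"), ("F", "G")]
round3 = round2 + [("H", "A")]  # the big upset
print({t: round(v) for t, v in solve(teams, round2).items()})
print({t: round(v) for t, v in solve(teams, round3).items()})
```

With the seven-game chain the solver settles on the 850-down-to-150 ladder, and adding the H-over-A upset closes the chain into a cycle and collapses every rating back to 500.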

Question: In the explanation of how upsets affect the system, you discuss an example of A beating B, C beating D, etc. At that point B (with a rating of 450) defeats C (who was rated at 550), with a resulting rating for B of 750. I'm not sure how this occurred, since (as I understand your system) a team is only assessed a "game score" no more than 100 points above, nor 100 points below, the rating of its opponent in that game. With the opponent in this case, C, rated at 550, how can a victory over C raise B 300 points? As well, I thought that a team's rating was actually an average of the assigned or earned points after each of its games. If so, then, even if 750 was correct in terms of points "earned" by the victory over C, shouldn't B's rating be 600 => (450 + 750)/2?

Answer: Ratings are calculated from scratch each week. The previous week's ratings have no effect on the new ratings, though, of course, the games previously played do. After the second set of games, A has beaten B, who has beaten C, who has beaten D, etc. B's victory over C raises B's rating to 100 points above C's NEW rating. Here, each team's rating is 100 points above the new rating of the beaten opponent. The ratings are then scaled to get an average rating of 500.



David Wilson / dwilson@cae.wisc.edu