Originally posted by flexmore
Entry-level ratings are about two things:
1/ finding the true rating of the new player,
- and -
2/ avoiding ratings creep.
RHP does well at both.
Originally posted by gezza
Are you sure it does both well?
I created a spreadsheet and ran some tests.
I compared provisional rating calculations for a new player using:
a/ The current method: averaging the opponents' ratings, with 400 added to an opponent's rating for a win and 400 subtracted for a loss.
b/ The method proposed by gezza, which I understood to be the normal rating calculation formula with higher K factors.
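A minimal Python sketch of the two calculations being compared. It assumes method a/ averages one figure per game (opponent's rating plus 400 for a win, minus 400 for a loss) and that method b/ is the standard Elo update, R += K * (S - E) with E = 1 / (1 + 10^((R_opp - R) / 400)), applied game by game from a base of 1200. The exact formula gezza proposed is not spelled out here, and the function names are hypothetical.

```python
def average_method_rating(opponent_ratings, results):
    """Method a/: average of (opponent rating + 400) per win, (- 400) per loss.

    Draws are ignored, as in the tests described in this post.
    """
    if not results or len(results) != len(opponent_ratings):
        raise ValueError("need one result per opponent and at least one game")
    figures = [opp + 400 if won else opp - 400
               for opp, won in zip(opponent_ratings, results)]
    return sum(figures) / len(figures)


def expected_score(rating, opponent_rating):
    """Standard Elo win expectancy on the 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))


def elo_method_rating(opponent_ratings, results, base=1200.0, k=32.0):
    """Method b/ as assumed here: the usual Elo update, one game at a time."""
    rating = base
    for opp, won in zip(opponent_ratings, results):
        score = 1.0 if won else 0.0
        rating += k * (score - expected_score(rating, opp))
    return rating


if __name__ == "__main__":
    # Two wins and a loss against 1500, 1600 and 1400 opponents.
    opponents = [1500, 1600, 1400]
    results = [True, False, True]
    print(average_method_rating(opponents, results))  # (1900 + 1200 + 1800) / 3
    print(elo_method_rating(opponents, results, k=64.0))
```

In method b/ the K factor sets the size of each step, so raising it lets the rating climb away from 1200 faster but also lets it overshoot; method a/ has no step size and never depends on where the player started.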
The tests were set up as follows:
- Fix the player's actual strength (not rating).
- Fix the ratings of the first 20 opponents. I tried the following kinds of scenarios:
--- Ratings within 200 points of the player's actual strength, with the average across all opponents equal to the player's actual strength.
--- All opponents 200 points higher, or all 200 points lower, than the player's actual strength.
--- Ratings based on the player's current rating. This was harder to compare, because the opponents' ratings then depend on the algorithm being tested.
- Calculate the win expectancy of each game (actual strength vs. opponent's rating) and decide each result randomly according to these odds.
- As a simplification, I ignored the possibility of draws.
- The method currently used on the site only requires a base rating to calculate the opponents' rating adjustments. The method gezza proposed also requires a base rating for the player's own rating calculation; we can assume the same base rating of 1200 for a player who has played zero games.
- I repeated the test scenarios, re-randomising the game results each time.
- When the player's actual strength is 1200, gezza's method using the current K factor of 32 tends to be more accurate than the average method. However, if you double the K factor, the situation seems to be reversed.
- When the player's actual strength is 1600, a K factor of 64 isn't even high enough to consistently bring the player's rating up to that figure. In one test, it ended 300 points too low.
- This problem gets worse as the player's actual strength goes up. To have a reasonable chance of accurately determining the rating of a player with an actual strength of 2200, you probably need a K factor of 128. Even so, one test produced a 20-game rating of 1756 (nearly 450 points too low).
- I averaged the 20-game rating of a 2200-strength player over 20 tests.
--- Gezza's method with a K factor of 128 produced an average of 2085, showing its tendency to produce too low a rating.
--- The current method used by the site produced an average of 2212, showing its general consistency.
- With the average rating method, the rating after twenty games always ended up 'close', whatever the player's actual strength.
- If a very strong (2200) new player plays their first twenty games against opponents with roughly the same rating (you would have to calculate the first 5 ratings yourself), it is quite possible to reach the correct range (±100) within the first 5 games using the average method. Using gezza's method, however, this is only possible with a K factor of 360 or more.
- I even tried staging the K factors in groups of 5 games: 512 for the first 5, then 256, then 128, then 64. Not only was the end result (2058) still dubious, but the large K factors produced a rating of 2627 at game 8 for a player with an actual strength of 2200. Incidentally, the average method was spot on with the same set of results.
- I attempted to break the average rating method by making all opponents' ratings 500 points lower than the player's actual strength. Because each win only counts as the opponent's rating plus 400, the 20-game rating is capped at 100 points below the player's strength, even with a 100% win rate. In contrast, the other method was able to reach an accurate rating if the K factors were high enough, but it could also overshoot the mark considerably. That method makes it all too easy for a player to inflate their rating artificially by targeting players they would never, or hardly ever, lose to.
- The only way for gezza's method to reach the correct rating of players with a high actual strength is to push up the K factors. This has the disadvantages of making the method more erratic, and more open to abuse.
- Over repeated tests, gezza's method averages out too low.
- The only problem with the method currently in use is that a win against a low-rated player can still lower your rating (whenever that opponent's rating plus 400 is below your current provisional rating).
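The test procedure can be sketched as a small Monte Carlo simulation in Python. It is a sketch under stated assumptions, not the spreadsheet behind the figures above, and it will not reproduce them exactly: decisive games only, results drawn from the standard Elo win expectancy, method a/ as the average of opponent rating plus or minus 400, and method b/ as the standard Elo update of the new player's rating from a 1200 base with fixed opponents. All function names, the opponent spread and the `staged_k` helper are hypothetical.

```python
import random


def win_expectancy(rating, opponent_rating):
    """Standard Elo expected score on the 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))


def simulate_results(strength, opponents, rng):
    """Decide each game at random from the win expectancy (no draws)."""
    return [rng.random() < win_expectancy(strength, opp) for opp in opponents]


def average_method(opponents, results):
    """Average of opponent rating + 400 per win and - 400 per loss."""
    figures = [opp + 400 if won else opp - 400
               for opp, won in zip(opponents, results)]
    return sum(figures) / len(figures)


def constant_k(k):
    """Return a K-factor schedule that ignores the game number."""
    return lambda game: k


def staged_k(game):
    """Hypothetical staged schedule: 512, 256, 128, then 64, five games each."""
    return (512.0, 256.0, 128.0, 64.0)[min(game // 5, 3)]


def elo_method(opponents, results, k_schedule, base=1200.0):
    """Incremental Elo update of the new player's rating only."""
    rating = base
    for game, (opp, won) in enumerate(zip(opponents, results)):
        score = 1.0 if won else 0.0
        rating += k_schedule(game) * (score - win_expectancy(rating, opp))
    return rating


def compare(strength, opponents, trials, k_schedule, seed=0):
    """Mean final rating of each method over repeated random trials."""
    rng = random.Random(seed)
    avg_total = elo_total = 0.0
    for _ in range(trials):
        results = simulate_results(strength, opponents, rng)
        avg_total += average_method(opponents, results)
        elo_total += elo_method(opponents, results, k_schedule)
    return avg_total / trials, elo_total / trials


if __name__ == "__main__":
    strength = 2200
    # 20 opponents within 200 points of the actual strength, averaging to it.
    opponents = [strength + d for d in range(-190, 200, 20)]
    for label, schedule in [("K=128", constant_k(128.0)), ("staged", staged_k)]:
        avg_rating, elo_rating = compare(strength, opponents, 20, schedule)
        print(label, round(avg_rating), round(elo_rating))

    # Rating cap: wins only, against opponents 500 points below strength.
    print(average_method([strength - 500] * 20, [True] * 20))  # 2100.0

    # A win can lower the average-method rating.
    print(average_method([1800], [True]))              # 2200.0
    print(average_method([1800, 1500], [True, True]))  # 2050.0
```

Because the average method works on the mean of per-game figures, beating a 1500 player after one win over an 1800 player adds a 1900 figure to a 2200 average, which pulls it down to 2050; the same arithmetic explains the 100-point cap when every opponent is 500 points weaker.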