Initial ratings

gezza

Site Ideas

22 Sep 05

gezza

Joined: 07 Jun 05
Moves: 5301

22 Sep 05

Hiya

I think there must be a better way to do the ratings in the first 20 games. It takes a bit of math, but is not that hard.
I am pretty sure FICS has a better system, and see no reason why it couldn't be used here.

What's the point? I play slowly - it will take me a while to get to 20 games. It would be useful to have my rating be a better indicator of performance sometime before Easter, with the goal being to play opponents closer to my own strength.

And a planned vacation means it would be impolite to enter tournaments, so that idea is out.

Gezza
PS. Sorry if this is a frequent topic; search does not seem to work, and I cannot find anyone who has made the same suggestion.

XanthosNZ

Cancerous Bus Crash

p^2.sin(phi)

Joined: 06 Sep 04
Moves: 25076

23 Sep 05

Originally posted by gezza
Hiya

I think there must be a better way to do the ratings in the first 20 games. It takes a bit of math, but is not that hard.
I am pretty sure FICS has a better system, and see no reason why it couldn't be used here.

What's the point? I play slowly - it will take me a while to get to 20 games. It would be useful to have my rating be a better indica ...[text shortened]... quent topic, search does not seem to work, and I cannot find anyone who has the same suggestion.

It works fine.

The system allows players whose actual strength is much higher than 1200 to increase quickly. There is no real way to increase this speed, however, without opening the door to more abuse (there are already one or two provisional 2200 players here).

flexmore

Quack Quack Quack !

Chesstralia

Joined: 18 Aug 03
Moves: 54533

23 Sep 05

Originally posted by gezza
Hiya

I think there must be a better way to do the ratings in the first 20 games. It takes a bit of math, but is not that hard.
I am pretty sure FICS has a better system, and see no reason why it couldn't be used here.

What's the point? I play slowly - it will take me a while to get to 20 games. It would be useful to have my rating be a better indica ...[text shortened]... quent topic, search does not seem to work, and I cannot find anyone who has the same suggestion.

RATINGS:
entry level ratings are about two things:
1/ finding the true rating of the new player,
- and -
2/ avoiding ratings creep.

rhp does well at both.

HOLIDAYS (and tournaments)
simply make sure all your games have a long enough timebank to cover your holiday, and you will be able to keep playing all the time - with short timeouts to suit your playing speed.

gezza

Joined: 07 Jun 05
Moves: 5301

24 Sep 05

Originally posted by flexmore
RATINGS:
entry level ratings are about two things:
1/ finding the true rating of the new player,
- and -
2/ avoiding ratings creep.

rhp does well at both.

Are you sure it does both well?

If I look at my opponents, one of them has had his rating rise 90 points in six games from when his rating stopped being provisional. Another has only lost 2 games in 21. I think the current system fails to find the true rating of new players with any success.

Someone else has said in this forum:
When the <1400 players stop putting out open invites with >1800 rating limits I'll stop accepting and deleting them.

Having had a few games accepted and deleted without a move being made, I can only guess that people filter by apparent rating. I did not set any rating limit on them, but can see no other reason. So getting a more correct rating is important.

The system I spoke of makes the "K" factor in the ratings formula vary, depending on the number of games played - it is very high for the first few games for the provisional player, and very low for the opponent.
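
To make the idea concrete, here is a minimal sketch of one rated game in which the two players use different K factors. It assumes the standard Elo expected-score formula; the function names and the K values (128 for the provisional player, 8 for the opponent) are hypothetical, chosen only for illustration, and are not taken from FICS or RHP.

```python
def expected_score(rating, opponent):
    """Standard Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def update_pair(new_player, established, score_new, k_new, k_established):
    """Apply one result with a separate K factor for each side.

    score_new is 1.0 if the new player won and 0.0 if they lost.
    Returns (new_player_rating, established_rating) after the game.
    """
    surprise = score_new - expected_score(new_player, established)
    return (new_player + k_new * surprise,
            established - k_established * surprise)


# Hypothetical K values: large for the newcomer, small for the opponent.
newcomer, opponent = update_pair(1200, 1600, 1.0, k_new=128, k_established=8)
print(round(newcomer, 1), round(opponent, 1))
```

With these made-up numbers, an upset win moves the newcomer up by about 116 points while the established opponent drops by only about 7.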

craigy

Johannesburg

Joined: 02 May 04
Moves: 13066

27 Sep 05

Originally posted by flexmore
RATINGS:
entry level ratings are about two things:
1/ finding the true rating of the new player,
- and -
2/ avoiding ratings creep.

rhp does well at both.

Originally posted by gezza
Are you sure it does both well?

Text shortened

Interesting question...
I created a spreadsheet and ran some tests.
I compared provisional rating calculations for a new player using:
a/ The current method of averaging opponents' ratings (+400 for a win, -400 for a loss).
b/ The method proposed by gezza, which I understood to be the normal rating calculation formula with higher K factors.
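
For reference, method a/ as described in this post can be sketched in a few lines. This only follows the description above (opponent's rating plus 400 for a win, minus 400 for a loss, averaged over all games); the function name, the handling of zero games, and the omission of draws are assumptions, and the site's real calculation may differ in detail.

```python
BASE_RATING = 1200.0  # starting rating mentioned in the thread


def provisional_average(results):
    """results: list of (opponent_rating, won) pairs, draws not modelled."""
    if not results:
        return BASE_RATING
    total = sum(opp + (400 if won else -400) for opp, won in results)
    return total / len(results)


# Beat a 1500, lost to a 1600, beat a 1400.
print(provisional_average([(1500, True), (1600, False), (1400, True)]))
```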

To follow the tests, do the following:
- Fix the player's actual strength (not rating)
- Fix the ratings of the first 20 opponents. I tried the following kinds of scenarios:
--- Ratings within 200 points of the player's actual strength, and the average across all opponents equal to the player's actual strength.
--- All opponents 200 points higher or 200 points lower than player's actual strength.
--- Ratings based on player's current rating. This was more difficult to compare because now the opponents' ratings are impacted by the algorithm used.
- Consider the Win Expectancy of each game (strength vs. Opponent's rating), and randomly decide the results based on these odds.
- As a simplification, I ignored the possibility of draws.
- The method currently used on the site only requires a base rating to calculate the opponents' rating adjustments. The method gezza proposed also requires a base rating for the player's own rating calculation; we can assume the same base rating of 1200 for a player who has played zero games.

- I repeated test scenarios by re-randomising the game results.
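
The steps above can be approximated in code rather than a spreadsheet. The sketch below is not the original spreadsheet; it assumes the standard Elo expected-score formula, fixed opponent ratings, no draws, and a base rating of 1200, and all function names are hypothetical. It does not cover the scenario where opponents are chosen from the player's current rating.

```python
import random


def expected_score(rating, opponent):
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def simulate(strength, opponents, k, base=1200.0, rng=None):
    """One run of games; returns (average_method, high_k_method) ratings."""
    if rng is None:
        rng = random.Random()
    performance_sum = 0.0
    elo = base
    for opp in opponents:
        # The result is drawn from the player's true strength, not the rating.
        won = rng.random() < expected_score(strength, opp)
        performance_sum += opp + (400 if won else -400)
        # Elo-style update from the player's current rating with factor k.
        elo += k * ((1.0 if won else 0.0) - expected_score(elo, opp))
    return performance_sum / len(opponents), elo


def mean_over_runs(strength, opponents, k, runs=20, seed=0):
    """Re-randomise the results `runs` times and average both methods."""
    rng = random.Random(seed)
    avg_total = elo_total = 0.0
    for _ in range(runs):
        avg, elo = simulate(strength, opponents, k, rng=rng)
        avg_total += avg
        elo_total += elo
    return avg_total / runs, elo_total / runs


# 20 opponents within 200 points of a 2200-strength player, mean 2200.
opponents = [2200 + d for d in (-200, -100, 0, 100, 200)] * 4
print(mean_over_runs(2200, opponents, k=128))
```

Changing `strength`, `opponents`, and `k` reproduces the kinds of scenarios listed above; the exact figures will vary with the random seed.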

Notes:
- When the player's actual strength is 1200, gezza's method using the current K factor of 32 tends to be more accurate. However, if you double the K factor, then the situation seems to be reversed.
- When the player's actual strength is 1600, a K factor of 64 isn't even high enough to consistently bring the player's rating up to that figure. In fact, it once ended 300 points too low.
- This problem is worse as the player's actual strength goes up. To have a reasonable chance of accurately determining the rating of a 2200 strong player, you probably need a K factor of 128. Even so, one test produced a 20 game rating of 1756 (nearly 450 points too low).
- I averaged the 20 game rating of a 2200 strong player over 20 tests.
--- Gezza's method with a K factor of 128 produced an average result of 2085, showing the tendency to produce too low a rating.
--- The current method used by the site produced an average of 2212, showing the general consistency.
- Using the average rating method, no matter what the player's actual strength, the final rating after twenty games would always end up 'close'.
- If a very strong (2200) new player plays their first twenty games against opponents with roughly the same rating (you would have to calculate the first 5 ratings yourself), it is quite possible to get into the correct range (+-100) within the first 5 games using the average method. However, using gezza's method, this is only possible with a K factor of 360+.
- I even tried staging the K factors in groups of 5: first 5 games 512, then 256, then 128, then 64. Not only was the end result (2058) still dubious, but the large K factors managed to produce a 2627 rating for a 2200 strong player at game 8. Incidentally, the average method was spot on with the same set of results.
- I attempted to break the average rating method by making all opponents' ratings 500 points lower than the player's actual strength. Obviously this means the 20 game rating is capped at 100 points below the player's strength - even with a 100% win rate. In contrast, the other method was able to attain an accurate rating if the K factors were high enough, but it was also possible to over-shoot the mark quite considerably. The method makes it all too easy for a player to artificially inflate their rating by targeting players they would never/hardly ever lose to.
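
The 100-point cap in the last note follows directly from the arithmetic of the averaging method. A small check, assuming the +400/-400 rule described earlier and a hypothetical 2200-strength player:

```python
strength = 2200
opponents = [strength - 500] * 20  # every opponent rated 500 points lower

# Best case for the averaging method: the player wins all 20 games.
best_case = sum(opp + 400 for opp in opponents) / len(opponents)
print(best_case, strength - best_case)  # 2100.0 100.0
```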

Conclusions:
- The only way for gezza's method to reach the correct rating of players with a high actual strength is to push up the K factors. This has the disadvantages of making the method more erratic, and more open to abuse.
- Over repeated tests, gezza's method averages out too low.
- The only problem with the method currently in use is that a win against a low rated player can still negatively affect your rating.

craigy

Johannesburg

Joined: 02 May 04
Moves: 13066

27 Sep 05

Originally posted by gezza
...

If I look at my opponents, one of them has had his rating rise 90 points in six games from when his rating stopped being provisional. Another has only lost 2 games in 21. I think the current system fails to find the true rating of new players with any success.
...

I wouldn't even raise an eyebrow if a player with a well established rating (100+ games) did the same. Ratings are only an estimate of actual strength, and I estimate they're usually only accurate to within 100 points. Remember, a player 200 points weaker than you still has a decent 25% chance of winning.
A player need only achieve 6 wins in a row against similarly rated opponents to achieve the feat you mention. This happens easily with any combination of: he is underrated, his opponents are overrated, lady luck smiles on him, he learnt a lot over the last couple of months it took to play the games.
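
Both figures in this reply can be checked with a short calculation. The sketch assumes the standard Elo expected-score formula, the K factor of 32 mentioned earlier in the thread, and opponents whose ratings stay fixed at the player's starting rating; the function names are hypothetical and the site's own formula may differ.

```python
def expected_score(rating, opponent):
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def gain_from_streak(start, opponent, wins, k=32):
    """Total rating gain from `wins` consecutive wins over equal opposition."""
    rating = start
    for _ in range(wins):
        rating += k * (1.0 - expected_score(rating, opponent))
    return rating - start


# Expected score of the weaker player when the gap is 200 points.
print(round(expected_score(1400, 1600), 3))

# Six straight wins against opponents rated the same as the starting rating.
print(round(gain_from_streak(1500, 1500, 6), 1))
```

Under these assumptions the weaker player's expected score is about 0.24, and six wins in a row are worth a little over 85 points, in the same range as the 90-point rise mentioned above.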

As for the player who lost only two games in 21, take a look at the top 100 players, you'll see a few who have yet to lose a game. If your opponent played the majority of his games against opponents rated 300 or more points below his strength, then the result is to be expected. Since he's already lost a couple of games, the upper boundary of his current strength has been tested and his rating is probably within a hundred points of accurate; so there's no need to worry about losing to an underrated opponent - which wouldn't do your rating any good.

If a system can consistently and quickly bring new players' ratings to within 200 points of accurate, then it's good. I think this system does that comfortably. If you want to get your rating fairly accurate as quickly as possible, do the following:
- try to estimate where your rating belongs. Analyse the public games of various players if necessary.
- play against opponents in that range, you should win about half those games, and your rating will move there quickly.
- if you win more/less games, your rating will be higher/lower as appropriate.
- challenge opponents with established ratings in your first 20 games. Provisionally rated players have more erratic ratings, and the contribution towards your rating is reduced. The established players' ratings are not too negatively affected because their ratings are adjusted by a reduced K factor.
- if you are unable to estimate your rating, then challenge 3 opponents at a time rated +-100 points above you. Wins pull your rating up faster, losses indicate you're nearing the correct range, and losing all three games probably means you were slightly overrated, as well as bringing your rating down.

XanthosNZ

Cancerous Bus Crash

p^2.sin(phi)

Joined: 06 Sep 04
Moves: 25076

28 Sep 05

Originally posted by gezza
Are you sure it does both well?

If I look at my opponents, one of them has had his rating rise 90 points in six games from when his rating stopped being provisional. Another has only lost 2 games in 21. I think the current system fails to find the true rating of new players with any success.

Someone else has said in this forum:
When the <1400 player ...[text shortened]... is very high for the first few games for the provisional player, and very low for the opponent.

Because this is a correspondence site and games are played simultaneously, spikes are always going to be present.

Say I join a large group tournament, say with a group size of 15. That's 28 games all starting at the same time. Now I win some quite quickly due to opponents blundering early on. My rating increases. Then, as time progresses, my lost games start to finish up and my rating drops back to near its previous level. I now have a spike. Not because my level of play changed, but due to the order in which games finished.
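
The effect of finishing order can be illustrated with a small sketch. It assumes the rating is updated as each game completes, the standard Elo expected-score formula with K = 32, and 28 opponents all fixed at 1500; the 14 wins and 14 losses and the function names are made up for the example.

```python
def expected_score(rating, opponent):
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def trajectory(start, opponent, results, k=32):
    """Rating after each finished game, in order of completion."""
    rating, history = start, []
    for won in results:
        rating += k * ((1.0 if won else 0.0) - expected_score(rating, opponent))
        history.append(rating)
    return history


wins_first = [True] * 14 + [False] * 14  # quick wins finish before slow losses
alternating = [True, False] * 14         # same results, interleaved

peak_wins_first = max(trajectory(1500, 1500, wins_first))
peak_alternating = max(trajectory(1500, 1500, alternating))
print(round(peak_wins_first), round(peak_alternating))
```

The same 28 results produce a much higher peak when the wins happen to finish first, which is the spike described above.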

Also thanks for quoting me, next time feel free to attribute it to me. However I don't see what my open invite comment has to do with anything. I accept and delete open invites with those kind of limits on them because they just sit unaccepted cluttering up the list. I don't touch unlimited invites as they are soon accepted (this doesn't count set pieces) by someone who actually wants a game. And anyway that's what the invite system is for.

The provisional system allows a crude approximation of a player's true rating (the rating of their actual play) within a few games. This works well if the true rating is within a few hundred points of the 1200 start point. If it isn't, let's say it's 2100, then it takes more than a few games. My rating was still increasing well after my provisional period finished. This is fine. Ironman's rating was increasing (albeit slowly) right to the very end. That's just the way it works.
You've played 11 games. Try finishing 100 and then get back to us.

gezza

Joined: 07 Jun 05
Moves: 5301

29 Sep 05

Originally posted by XanthosNZ
Also thanks for quoting me, next time feel free to attribute it to me. However I don't see what my open invite comment has to do with anything. I accept and delete open invites with those kind of limits on them because they just sit unaccepted cluttering up the list.
Apologies. I searched the list for similar suggestions before posting. I remembered the comment because, having had games accepted and then closed, it stuck in my head. I went back to find the line to copy-paste, but did not go back for the author.

What it has to do with anything is that ratings matter. As a low rated player, I cannot expect to get games against people more than a hundred points above my current RHP rating. Open invites from highly rated players vanish pretty quickly.

Originally posted by XanthosNZ
My rating was still increasing well after my provisional period finished. This is fine. [...] You've played 11 games. Try finished 100 and then get back to us.

11 games in 3 months. 100 games in 2 1/2 years. Perhaps not. It may be fine for you that the provisional system could do better, but this area is about suggestions to improve the site, and IMNSHO it would be a useful benefit for new players.

XanthosNZ

Cancerous Bus Crash

p^2.sin(phi)

Joined: 06 Sep 04
Moves: 25076

30 Sep 05

Originally posted by gezza
Originally posted by XanthosNZ
Also thanks for quoting me, next time feel free to attribute it to me. However I don't see what my open invite comment has to do with anything. I accept and delete open invites with those kind of limits on them because they just sit unaccepted cluttering up the list.
Apologies. I searched the list for similar s ...[text shortened]... about suggestions to improve the site, and IMNSHO it would be a useful benefit for new players..

Put forward a fully fleshed out system and I'll tear it to pieces.

gezza

Joined: 07 Jun 05
Moves: 5301

06 Dec 05

Originally posted by XanthosNZ
Put forward a fully fleshed out system and I'll tear it to pieces.

Are you always this positive and helpful Xanthos?

gezza

Joined: 07 Jun 05
Moves: 5301

06 Dec 05

Originally posted by craigy
Interesting question...
I created a spreadsheet and ran some tests.
I compared provisional rating calculations for a new player using:

Craigy

Thanks for looking at this. The idea is that the K factor for a newbie decays from something very high to the normal value as the number of games increases towards 20. There should not be steps in the changes.

The system is in use on FICS and works there, so might work here. The only difference is that there one game completes before the next starts - this may or may not have an effect on whether the system works.

The logic is simply that at the start your rating is totally unknown, so anything is a guess. But once you have a first result, you may as well use it.

There is a further point that for your opponent, the K factor varies depending on the number of games you have played. This is because the opponent's rating should not change much by beating someone of unknown strength - this just leads to grade inflation.
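
One possible shape for such a schedule is sketched below. It is only an illustration of "high and decaying without steps" for the new player and "low and rising" for the opponent: the starting value of 256, the geometric decay, the linear rise, and the function names are all assumptions, not the formula FICS or RHP uses.

```python
NORMAL_K = 32           # normal K factor mentioned in the thread
PROVISIONAL_GAMES = 20  # length of the provisional period
START_K = 256           # hypothetical K for a player with zero games


def k_for_new_player(games_played):
    """Smooth geometric decay from START_K down to NORMAL_K over 20 games."""
    if games_played >= PROVISIONAL_GAMES:
        return float(NORMAL_K)
    fraction = games_played / PROVISIONAL_GAMES
    return START_K * (NORMAL_K / START_K) ** fraction


def k_for_opponent(new_player_games):
    """Opponent's K rises linearly to NORMAL_K as the newcomer gains games."""
    fraction = min(new_player_games, PROVISIONAL_GAMES) / PROVISIONAL_GAMES
    return NORMAL_K * fraction


for games in (0, 5, 10, 15, 20):
    print(games, round(k_for_new_player(games), 1), k_for_opponent(games))
```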

I'll put together my own spreadsheet, and get back to you. I think the "--- Opponents Ratings based on player's current rating." scenario is the most interesting, as I think this reflects how people accept games and challenges - certainly you cannot enter a tourney based on your actual strength - only apparent rating.

You saw some overshoot in rating - this may happen, but should happen most in the first few games, and the current system rates people at 1200 for 6 games anyway, so there may need to be some indication that the rating could be way out.

I think the effect on the opponent's rating is also worth looking at. At the moment there is a strong disincentive for highly rated players to accept a challenge from a provisional player - they stand to lose quite a few rating points if they lose to someone who might actually be stronger. I see this as a disadvantage because it affects the strength of the next opponent of the rated player. By making the K for a newbie's opponent very low, there is a lot less at stake when playing a newbie, and a lot less to gain - just the fun of playing, and helping someone else find out where they actually stand.

I now have some time to work on this, so I'll post something more concrete in a week or so, and then I would appreciate some more feedback, if you have time.

XanthosNZ

Cancerous Bus Crash

p^2.sin(phi)

Joined: 06 Sep 04
Moves: 25076

06 Dec 05

Originally posted by gezza
At the moment there is a strong disincentive for highly rated players to accept a challenge from a provisional player - they stand to lose quite a few rating points if they lose to someone who might actually be stronger.

The most a non-provisional player could lose in a game against a provisional player is 16 points (with a difference of 600+ points). That's not really that much. The main reason I won't waste my time playing provisional players is that, for the most part, they seem to be the worst offenders in terms of bad manners: sending challenges with no message or comment, just a game appearing in your inbox; and some stop playing when losing or just disappear (especially non-sub provisionals).
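
The 16-point ceiling can be checked with a short sketch. It assumes the established player's K is halved from 32 to 16 against provisional opponents (the thread only says the K factor is "reduced", so the halving is an assumption that happens to match the 16-point figure) and the standard Elo expected-score formula; the function names are hypothetical.

```python
def expected_score(rating, opponent):
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


REDUCED_K = 16  # assumption: half of the normal K of 32


def loss_if_beaten(established, provisional, k=REDUCED_K):
    """Points the established player drops after losing the game."""
    return k * expected_score(established, provisional)


for gap in (0, 200, 400, 600):
    print(gap, round(loss_if_beaten(1800 + gap, 1800), 1))
```

The loss grows with the rating gap and approaches, but never exceeds, the reduced K of 16.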

Ragnorak

For RHP addons...

tinyurl.com/yssp6g

Joined: 16 Mar 04
Moves: 15013

06 Dec 05

Originally posted by gezza
Hiya

I think there must be a better way to do the ratings in the first 20 games. It takes a bit of math, but is not that hard.
I am pretty sure FICS has a better system, and see no reason why it couldn't be used here.

What's the point? I play slowly - it will take me a while to get to 20 games. It would be useful to have my rating be a better indica ...[text shortened]... quent topic, search does not seem to work, and I cannot find anyone who has the same suggestion.

Who cares? It is about having good games.

D

gezza

Joined: 07 Jun 05
Moves: 5301

06 Dec 05

Originally posted by Ragnorak
Who cares? It is about having good games.

D

Indeed. Glad you read my profile. However, if the ratings are way out, then it is not such a good game: Game 1245881, and not so much fun for either player - ratings at the time of the game 1224 vs 1200.

I am sure that if I play someone 500 points stronger than me, I will make mistakes which are just as obvious to them.

Gezza

Ragnorak

For RHP addons...

tinyurl.com/yssp6g

Joined: 16 Mar 04
Moves: 15013

06 Dec 05

Originally posted by gezza
Indeed. Glad you read my profile. However, if the ratings are way out, then it is not such a good game: Game 1245881, and not so much fun for either player - ratings at the time of the game 1224 vs 1200.

I am sure that if I play someone 500 points stronger than me, I will make mistakes which are just as obvious to them.

Gezza

Why don't you stop playing random games, and enter a couple of tournaments or sieges? That way you'll get games against all standards of player.

D