Originally posted by Dragon Fire
As most clubs around here are doing well if they have a dozen members, half a dozen 1700s or above would give a mean around 1700. Of course we only have 100,000 people or so, so getting more than 10 OTB chess players is quite good.

In the past two years, I've taken 2nd place in more than half of the club events I've entered (except for weekend events that draw from out of town), and I've won a few of the rest. Counting myself as #6 includes a lot of relatively inactive players, making the club size three dozen. Most meetings have 12-20 attendees. 280 miles away, in the largest city in the region, there are several titled players.
Originally posted by petrovitch
I stated there was no correlation between a player's rating and the number of games played. I guess I'd better back that up with evidence. Remember, these figures are real-time so while the mean, standard deviation, etc. should remain very close they may vary from one examination to another. So this mean may be slightly different from previous means exami ...[text shortened]... layed tells us absolutely nothing about a player's rating or the acceleration of his rating.

Wish I'd paid more attention in those maths classes now! When you say there is no correlation between a player's rating and the number of games played, it makes me wonder! I think that all along I have thought that players with more games played on RHP perhaps had a more "accurate" rating, as the greater number of games would increase the possibility of a good average. I apologize if my basic understanding of "average" is missing the boat. So - can you bust this myth (i.e. that the more games a player has played, the more likely it is that that player's rating is accurate for the purposes of comparison with other players of the same rating)?
I think I even play with more motivation (read "fear" 😉) against a 1700 with 500 games than an 1800 with 30. I assume the 1800 with fewer games is on a very fast rise, and therefore will be ranked way higher soon. I know the rating should not matter, but it does - seeing that 2000+ I start thinking "maybe I can draw!"
Originally posted by Exuma
Wish I'd paid more attention in those maths classes now! When you say there is no correlation between a player's rating and the number of games played, it makes me wonder! I think that all along I have thought that players with more games played on RHP, perhaps had a more "accurate" rating, as the greater number of games would increase the possibility of a g ...[text shortened]... hould not matter, but it does - seeing that 2000+ I start thinking "maybe I can draw!"

There must be some sort of correlation insofar as the larger the sample (number of games played), the more accurate the rating is likely to be. This would be especially true with relatively few games played, and in fact this is recognised in the first 20 games. However, once you have reached a critical mass, which could be as low as a hundred games, your rating should stabilise and would be just as likely to go down as up, so perhaps then there is no correlation.
With new players always coming in at 1200 I would have expected the mean to be 1200 so perhaps some statistics expert can tell me why it does not.
Originally posted by Exuma
I think that all along I have thought that players with more games played on RHP, perhaps had a more "accurate" rating, as the greater number of games would increase the possibility of a good average.

You're totally right on this. Given, of course, that the player doesn't end up playing the same people over and over, the more games he plays, the closer his rating is to his real rating.
But what petrovitch means is that you can't say that a player with more games will have a higher rating (positive correlation), or that a player with more games will have a lower rating (negative correlation).
Originally posted by Dragon Fire
There must be some sort of correlation in so far as the larger the sample (number of games played) the more accurate the rating is likely to be. This would be especially true with relatively few games played and in fact this is recognised in the 1st 20 games. However once you have reached a critical mass which could be as low as a hundred games then your ...[text shortened]... have expected the mean to be 1200 so perhaps some statistics expert can tell me why it does not.

As the size of the sample increases the size of the error decreases. Yes, this is a rule. If you want to determine the accuracy of your OTB rating (FIDE, USCF, etc.) using your RHP history that is easy to do. The problem is that we would have to gather your individual RHP game history.
t = (fide - rhp) / (std / sqrt(n))
where fide is your FIDE rating, rhp is your RHP rating, std is the standard deviation of your RHP games, and n is the number of RHP games. With that we could construct a confidence interval that would give you a range of ratings (instead of a point estimate) for your FIDE rating to see if they compare.
For example, let's say you want to be 95% certain of the accuracy of your RHP rating compared to your FIDE rating. We could construct
95% C.I. of the Mean = (Lower Rating, Upper Rating)
where
Lower Rating = rhp - t * std / sqrt(n)
Upper Rating = rhp + t * std / sqrt(n)
That would mean that we can be 95% certain that your FIDE Rating should be within the range of (Lower Rating, Upper Rating) given your current FIDE Rating, current RHP Rating, and the standard deviation of your RHP Rating.
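A minimal sketch of the interval described above, in Python. The function names and the sample numbers are made up for illustration, and the critical value t_crit is passed in by hand (2.045 is the usual two-sided 95% value for 29 degrees of freedom), so treat this as a rough outline and not a finished tool.

```python
import math


def t_statistic(fide, rhp, std, n):
    """t = (fide - rhp) / (std / sqrt(n)), as in the formula above."""
    if n < 2 or std <= 0:
        raise ValueError("need n >= 2 and a positive standard deviation")
    return (fide - rhp) / (std / math.sqrt(n))


def confidence_interval(rhp, std, n, t_crit):
    """Return (lower, upper) = rhp -/+ t_crit * std / sqrt(n)."""
    if n < 2 or std <= 0:
        raise ValueError("need n >= 2 and a positive standard deviation")
    margin = t_crit * std / math.sqrt(n)
    return rhp - margin, rhp + margin


# Hypothetical player: RHP rating 1600, std 120 over 30 games.
lower, upper = confidence_interval(rhp=1600, std=120, n=30, t_crit=2.045)
print(f"95% C.I. = ({lower:.0f}, {upper:.0f})")
```

If the FIDE rating falls inside the printed range, the two ratings would be considered consistent at that confidence level under these assumptions.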
Originally posted by petrovitch
As the size of the sample increases the size of the error decreases. Yes, this is a rule. If you want to determine the accuracy of your OTB rating (FIDE, USCF, etc.) using your RHP history that is easy to do. The problem is that we would have to gather your individual RHP game history.
t = ((FIDE Rating) - (RHP Rating)) / (Standard Deviation of RHP ...[text shortened]... en your current FIDE Rating, current RHP Rating, and the standard deviation of your RHP Rating.

I think it would be much simpler if we had a list of players' FIDE ratings and compared them with their RHP ratings. Using a simple regression you could determine the correlation between these two variables. This would give you the direction, either positive or negative. Then square the coefficient of correlation to determine the degree of their relationship. This is called the coefficient of determination, or r^2. The slope would determine, for example, how much more or less the RHP rating is in relation to the FIDE rating. In other words, if the slope was 1.2 then you would expect a player's RHP rating to advance 20% faster than his FIDE rating. This would indicate an inflation in the RHP scores.
This is all speculation since I don't have any scores to compare. With 30 players' scores from OTB (USCF, FIDE, etc.) and RHP we could get some idea whether RHP scores are inflated, deflated, or unrelated.
Get me the scores and I'll be glad to get you the results.
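For anyone who wants to try this once the scores exist, here is a rough Python sketch of the regression described above. The function name is mine, and the ratings in the example are invented purely to show the mechanics (they are built to lie exactly on a line), not real player data.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y on x; returns (slope, intercept, r, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("ratings must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # coefficient of correlation
    return slope, intercept, r, r * r  # r * r is the coefficient of determination


# Invented ratings, for illustration only: rhp = 1.2 * fide - 200.
fide = [1500, 1600, 1700, 1800, 1900]
rhp = [1.2 * f - 200 for f in fide]
slope, intercept, r, r2 = simple_regression(fide, rhp)
print(f"slope={slope:.2f} intercept={intercept:.0f} r={r:.2f} r^2={r2:.2f}")
```

With real data the points would scatter, r^2 would drop below 1, and the slope would be the number to look at for inflation or deflation.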
The Elo Rating system was designed to rank chess players and allow a statistical estimate based on probabilities as to the outcome of pairing chess players.
What I would like to see is a performance rating: how did I play this game? Our ratings may be the same or 400 points apart, but still we should be able to evaluate how well each player played the game.
Let's use the following game as an example. I played it the day before yesterday on a French FICS server. The game was blitz, so it is not one of my best games, but I think it will be good for this exercise.
[Event "ICS unrated blitz match"]
[Site "jeu.echecs.com"]
[White "guest"]
[Black "me"]
1. d4 Nf6 2. e3 d6 3. a3 a6 4. c4 c5 5. d5 b5 6. cxb5 g6 7. b4 Bg7 8. Bb2 O-O 9. bxa6 Bxa6 10. b5 Bb7 11. a4 Bxd5 12. Nf3 Nbd7 13. Nc3 Bb7 14. Be2 Ne4 15. Qc2 Nxc3 16. Bxc3 Nf6 17. O-O Be4 18. Qb2 e5 19. Nd2 Bb7 20. f3 Nd5 21. e4 Nf4 22. Bc4 Qg5 23. g3 Ne6 24. Kf2 Nd4 25. h4 Qf6 26. Kg2 Bh6 27. Rfe1 Bxd2 28. Bxd2 Qxf3+ 29. Kh3 Bc8+ 30. Kh2 Qf2+ 31. Kh1 Qxg3 32. Bg5 Bg4 33. Be2 Nf3 34. Bxf3 Bxf3+ 35. Qg2 Qxg2# 0-1
In post-mortem analysis I want to compare my move selection with those of Crafty. If the move I choose is 1st on the list of moves ranked best by Crafty then I would get 1 point. If my move was ranked 25th on Crafty's list then I would get 25 points. Rank each move according to Crafty, add the points, then divide the sum by the number of moves in the game. In this case my sum was 122. So I divided 122 by 35 and got 3.48.
I would say, then, that my MEAN RANK = 3.48
On a point system does this mean that I play 3.48 points or almost a bishop and a pawn less efficient than Crafty? I'm not sure, but the mean rank is at least a starting point at coming up with a performance rating. This could also be called a handicap.
Some allowance may need to be made for the opening. I scored 9, 22 and 23 on moves in the opening that were not that bad. The rest of the game my scores were pretty consistent with Crafty. For example, after 1. d4 Nf6 Crafty wouldn't consider 2. c4 very high. I think it was about 25th on Crafty's list. So consider this in this discussion.
Other interesting scores: 19 of my moves were 1st on Crafty's list, giving me a score of 0.5429, and 29 of my moves were in Crafty's top 5 moves, giving me a score of 0.8286.
So 83% of my moves were the crème de la crème on Crafty's list of ranked moves in a 3 0 Blitz game where there is no time for calculating.
Of course, one bad move and the game may be lost. So whether these figures mean anything or not, only time will tell.
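The scoring above is easy to automate once you have the rank of each move. Here is a small Python sketch; the function name and the sample list of ranks are invented for illustration, and getting the ranks out of Crafty is left out entirely.

```python
def rank_summary(ranks):
    """Summarise engine ranks (1 = engine's first choice) for one player's moves."""
    n = len(ranks)
    if n == 0:
        raise ValueError("no moves to score")
    total = sum(ranks)
    return {
        "n": n,
        "sum": total,
        "mean_rank": total / n,
        # fraction of moves matching the engine's first choice
        "top1": sum(1 for r in ranks if r == 1) / n,
        # fraction of moves among the engine's five best
        "top5": sum(1 for r in ranks if r <= 5) / n,
    }


# Made-up ranks for a four-move fragment, not the game above.
print(rank_summary([1, 1, 3, 25]))
```

Note how a single rank of 25 drags the mean up sharply while barely moving the top-5 fraction, which is the same effect the opening moves had on my score.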
That's not the key to performance ...
Rating = 800
n = 33
sum = 78
mean = 2.36
Matching 1st Move = 0.4848
Top 5 Moves = 0.8787
But he missed two mates and had a 30 and a 36. 🙁 I've never really examined lower rated players' games, but while doing this little experiment I noticed 5 very important things that need to be pointed out to players with ratings less than 1400.
1. The pawn structures were completely destroyed in these lower rated players' games. I think they are just moving pawns because they don't know what to do. Also, they try to scare off the enemy with pawn moves that ultimately destroy their own position. These pawn moves are also made at the expense of not developing their pieces. And I don't mean not developing them quickly; I mean not developing them at all! There are still minor pieces on their original squares 20 moves into the game.
😉 Wasn't it Philidor who said, "Pawns are the soul of chess"?
2. As mentioned above, lack of piece development.
3. Most of the time the queens are never moved in their games. They don't know how to use their queen so they don't use her at all.
4. They react to threats immediately without preparation. While most of their moves are pretty good, they make two or three very bad moves that cost them material and the game.
5. They overlook mating combinations -- not just mate in 3, but many times they don't see mate in 1. This just takes practice. It doesn't take any brains to see mate in 1.
_____________________________________________
Rating = 2100
n = 57
sum = 131
mean = 2.29
Matching 1st Move = 0.7192
Top 5 Moves = 0.8947
By the way, both of these players lost. The mean rank is not that much different. The higher rated player scored remarkably higher on matching 1st moves with Crafty, but only slightly higher on matching the top 5 moves.
The higher rated player did not miss any mate threats, and did not blunder. His worst two moves were 16 and 15.
So everything is not lost. I think we've learned quite a bit with this little experiment.
_____________________________________________
* I realize I'm talking with myself, but I think these ideas need to be discussed. 🙂
That's an interesting way to try to rate a player's performance, but I think you have your interpretation of MEAN RANK incorrect.
You speculate having a MEAN RANK of 3.5 means that you played a piece and half a pawn worse than Crafty, but even if you completely matched Crafty your MEAN RANK would still be 1, and there is no way that means you played a pawn worse.
Perhaps a better thing to do would be to sum the difference in evaluations by Crafty, rather than just rank ordering the moves. In this way a blunder will be more heavily punished.
So if Crafty's best move is evaluated at +0.3 and you play a move evaluated at +0.1 your score for that move is 0.2
To punish blunders even more severely (as they normally cost you the game) and reward consistent play we can take the sum of squares divided by the number of moves as your "score" for that game.
I would provide an example from one of my games but I don't have an engine to do the analyses for me.
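Here is a rough Python sketch of the scoring suggested above, for anyone who does have engine output to hand. It assumes you already have two lists of evaluations in pawns, both from the point of view of the player to move: the engine's best move and the move actually played. The function names and the numbers are invented, and clamping negative differences to zero is my own assumption.

```python
def evaluation_losses(best_evals, played_evals):
    """Per-move loss: engine's best evaluation minus the played move's evaluation."""
    if len(best_evals) != len(played_evals):
        raise ValueError("need one played evaluation per best evaluation")
    # Assumption: the played move is never scored above the engine's best,
    # so any negative difference is treated as zero loss.
    return [max(0.0, b - p) for b, p in zip(best_evals, played_evals)]


def mean_loss(losses):
    """Sum of the differences divided by the number of moves."""
    return sum(losses) / len(losses)


def mean_squared_loss(losses):
    """Sum of squared differences divided by the number of moves."""
    return sum(x * x for x in losses) / len(losses)


# Invented evaluations: a small slip, a perfect move, and a blunder.
best = [0.3, 0.5, 1.0]
played = [0.1, 0.5, -2.0]
losses = evaluation_losses(best, played)
print(losses, mean_loss(losses), mean_squared_loss(losses))
```

The blunder contributes 3.0 to the plain sum but 9.0 to the sum of squares, which is the heavier punishment being asked for.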
Originally posted by droflace
thats an interesting way to try to rate a players performance, but I think you have your intepretation of MEAN RANK incorrect.
You speculate having a MEAN RANK of 3.5 means that you played a piece and half a pawn worse than Crafty, but even if you completely matched Crafty your MEAN RANK would still be 1, and there is no way that means you played a pawn ...[text shortened]... rovide an example from one of my games but I don't have an engine to do the analyses for me.

I never thought the mean rank would be equivalent to the point count. I just meant that we may not be able to predict what type of correlation may come from it. It's just a beginning. I was disappointed that the 800 and 2100 players had similar scores when the games were so entirely different. I had hoped the mean rank would be meaningful.
I thought the low rank of the bad moves would suffice. Maybe we need to square the scores and take the square root of the final sum. That would exaggerate the weakness, similar to the least squares method.
Have to agree with droflace that it would make much more sense to take the evaluation difference rather than the rank. Could be that all of the top 10 moves are within 0.1 point of each other, or it could be that the 10th is 10 points worse than the 1st. The evaluation difference is a better indicator of how inaccurate the move was.
Originally posted by adam warlock
You're totally right on this.Given, of course, that the said player doesn't end up playing the same people all over again the more games the more near his rating is close to his real rating.
But what petrovitch means is that you can't say that a player with more games will have a higher rating (positive correlation), or that a player with more games will have a lower rating (negative correlation).

I suppose really we are talking about some way to address the quality of the games played. A player could have many games played, with a large number of timeouts. Or a player might only play those ranked much lower. The former results in an unfairly low rating, and the latter in an unfairly high rating. Though I do realize that if Weyerstrass plays Kik, there will be no rating change if it goes as one would expect...
Originally posted by Dragon Fire
With new players always coming in at 1200 I would have expected the mean to be 1200 so perhaps some statistics expert can tell me why it does not.

These statistics are derived from the "player tables" page, which includes only those players who have moved in the last 100 days. I would guess that there are vast numbers of RHP users who have joined, lost a few games, lost interest and vanished. This might be enough to push up the mean rating of the active players.
Interesting thread, thanks everyone.
Originally posted by incandenza
Have to agree with droflace that it would make much more sense to take the evaluation difference rather than the rank. Could be that all of the top 10 moves are within 0.1 point of each other, or it could be that the 10th is 10 points worse than the 1st. The evaluation difference is a better indicator of how inaccurate the move was.

Okay, I replaced the mean rank with the differences between the evaluation of the move chosen and the move suggested (in the sample blitz game). I got the sum, then divided by the number of moves to get a mean score of 0.63.
Aren't we simply computing a standard deviation between the move suggested and the move selected? Or maybe our performance rating is more like a z-score. It is the distance from the mean. The only thing, though, is that we could not have a positive score since there is no way to measure how much better we are than Crafty. So it would be a one tail test ...
n = 34
sum = 21.44
mean difference = 0.630588235294118
std = 1.19722458745332
Chi Square measures the difference between observed and expected. ANOVA compares the means of two or more groups.
Originally posted by John of Reading
These statistics are derived from the "player tables" page, which includes only those players who have moved in the last 100 days. I would guess that there are vast numbers of RHP users who have joined, lost a few games, lost interest and vanished. This might be enough to push up the mean rating of the active players.
Interesting thread, thanks everyone.

The other reason that the mean is not equal to 1200 is that the rating system is not "closed" for provisional players (i.e. the points won/lost by the provisional player are not equal to the points lost/won by the other player).
In games between two non-provisional players the rating system is closed. What this means is that you would expect the overall mean rating to be close to the mean rating of provisional players at the point at which they become non-provisional players (after 20 games).
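To make "closed" concrete, here is a small Python sketch using the standard Elo expected-score formula. It is not RHP's actual formula for provisional players, which I am not reproducing here. The K values are hypothetical and only show that equal K-factors conserve points while unequal ones do not.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def rating_changes(rating_a, rating_b, score_a, k_a, k_b):
    """Return (change_a, change_b); score_a is 1 for a win, 0.5 draw, 0 loss."""
    exp_a = expected_score(rating_a, rating_b)
    change_a = k_a * (score_a - exp_a)
    change_b = k_b * ((1 - score_a) - (1 - exp_a))
    return change_a, change_b


# Same K for both players (hypothetical K = 32): the changes cancel out.
da, db = rating_changes(1200, 1200, score_a=1, k_a=32, k_b=32)
print(da, db, da + db)

# A larger K for one player (hypothetical stand-in for a provisional rule):
# points are created, so the pool's mean can drift away from 1200.
da, db = rating_changes(1200, 1200, score_a=1, k_a=64, k_b=32)
print(da, db, da + db)
```

In the second case the winner gains more than the loser gives up, which is one way a pool that starts everyone at 1200 can end up with a different mean.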