RHP Site Player Statistics

petrovitch

Only Chess

11 May 08 03:43

Wulebgr

Angler

River City

Joined: 08 Dec 04
Moves: 16907

12 May 08 21:52

Originally posted by Dragon Fire
As most clubs around here are doing well if they have a dozen members, half a dozen 1700s or above would give a mean around 1700. Of course we only have 100,000 people or so, so getting more than 10 OTB chess players is quite good.

In the past two years, I've taken 2nd place in more than half of the club events I've entered (except for weekend events that draw from out of town), and I've won a few of the rest. Counting myself as #6 includes a lot of relatively inactive players, making the club size three dozen. Most meetings have 12-20 attendees. 280 miles away, in the largest city in the region, there are several titled players.

Exuma

Anansi

Woodshed

Joined: 16 Apr 07
Moves: 35523

14 May 08 02:20

Originally posted by petrovitch
I stated there was no correlation between a player's rating and the number of games played. I guess I'd better back that up with evidence. Remember, these figures are real-time so while the mean, standard deviation, etc. should remain very close they may vary from one examination to another. So this mean may be slightly different from previous means exami ...[text shortened]... layed tells us absolutely nothing about a player's rating or the acceleration of his rating.

Wish I'd paid more attention in those maths classes now! When you say there is no correlation between a player's rating and the number of games played, it makes me wonder! I think that all along I have thought that players with more games played on RHP perhaps had a more "accurate" rating, as the greater number of games would increase the possibility of a good average. I apologize if my basic understanding of "average" is missing the boat. So, can you bust this myth (i.e., that the more games a player has played, the more likely it is that that player's rating is accurate for the purposes of comparison with other players of the same rating)?

I think I even play with more motivation (read "fear") against a 1700 with 500 games than an 1800 with 30. I assume the 1800 with fewer games is on a very fast rise, and therefore will be ranked way higher soon. I know the rating should not matter, but it does - seeing that 2000+ I start thinking "maybe I can draw!"

Dragon Fire

Lord of all beasts

searching for truth

Joined: 06 Jun 06
Moves: 30390

14 May 08 07:23

Originally posted by Exuma
Wish I'd paid more attention in those maths classes now! When you say there is no correlation between a player's rating and the number of games played, it makes me wonder! I think that all along I have thought that players with more games played on RHP perhaps had a more "accurate" rating, as the greater number of games would increase the possibility of a g ...[text shortened]... hould not matter, but it does - seeing that 2000+ I start thinking "maybe I can draw!"

There must be some sort of correlation, in so far as the larger the sample (number of games played), the more accurate the rating is likely to be. This would be especially true with relatively few games played, and in fact this is recognised in the 1st 20 games. However, once you have reached a critical mass, which could be as low as a hundred games, your rating should stabilise and would be just as likely to go down as up, so perhaps then there is no correlation.

With new players always coming in at 1200 I would have expected the mean to be 1200 so perhaps some statistics expert can tell me why it does not.

adam warlock

Baby Gauss

Ceres

Joined: 14 Oct 06
Moves: 18375

14 May 08 08:46

Originally posted by Exuma
I think that all along I have thought that players with more games played on RHP, perhaps had a more "accurate" rating, as the greater number of games would increase the possibility of a good average.

You're totally right on this. Provided, of course, that the player doesn't end up playing the same people over and over, the more games he plays, the closer his rating gets to his real rating.

But what petrovitch means is that you can't say that a player with more games will have a higher rating (positive correlation), or that a player with more games will have a lower rating (negative correlation).

petrovitch

Joined: 08 May 07
Moves: 55475

14 May 08 16:14

Originally posted by Dragon Fire
There must be some sort of correlation in so far as the larger the sample (number of games played) the more accurate the rating is likely to be. This would be especially true with relatively few games played and in fact this is recognised in the 1st 20 games. However once you have reached a critical mass which could be as low as a hundred games then your ...[text shortened]... have expected the mean to be 1200 so perhaps some statistics expert can tell me why it does not.

As the size of the sample increases the size of the error decreases. Yes, this is a rule. If you want to determine the accuracy of your OTB rating (FIDE, USCF, etc.) using your RHP history that is easy to do. The problem is that we would have to gather your individual RHP game history.

t = ((FIDE Rating) - (RHP Rating)) / ((Standard Deviation of RHP Games) / (Square Root of the Number of RHP Games))

then we could construct a Confidence Interval that would give you a range of ratings (instead of a point estimate) of your FIDE Rating to see if they compare.

For example, let's say you want to be 95% certain of the accuracy of your RHP rating compared to your FIDE rating. We could construct

95% C.I. of the Mean = (Lower Rating, Upper Rating)

where

Lower Rating = rhp - t * std / sqrt(n)

Upper Rating = rhp + t * std / sqrt(n)

That would mean that we can be 95% certain that your FIDE Rating should be within the range of (Lower Rating, Upper Rating) given your current FIDE Rating, current RHP Rating, and the standard deviation of your RHP Rating.
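The interval above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rating, standard deviation, and game count are made-up numbers, `confidence_interval` is a hypothetical helper name, and the normal critical value (about 1.96 for 95%) stands in for the t critical value, which is reasonable only when n is large.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(mean, std, n, level=0.95):
    """Return (lower, upper) for the mean, using a normal critical value.

    Assumption: n is large enough that the normal quantile is a fair
    stand-in for the t critical value with n - 1 degrees of freedom.
    """
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    margin = crit * std / sqrt(n)                 # critical value * standard error
    return mean - margin, mean + margin


# Made-up example: RHP rating 1500, standard deviation 100, 100 games.
lower, upper = confidence_interval(1500, 100, 100)
print(f"95% C.I. = ({lower:.1f}, {upper:.1f})")
```

Because the margin is divided by sqrt(n), quadrupling the number of games halves the width of the interval, which is the "larger sample, smaller error" rule in action.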

petrovitch

Joined: 08 May 07
Moves: 55475

14 May 08 16:15

Originally posted by petrovitch
As the size of the sample increases the size of the error decreases. Yes, this is a rule. If you want to determine the accuracy of your OTB rating (FIDE, USCF, etc.) using your RHP history that is easy to do. The problem is that we would have to gather your individual RHP game history.

t = ((FIDE Rating) - (RHP Rating)) / (Standard Deviation of RHP ...[text shortened]... en your current FIDE Rating, current RHP Rating, and the standard deviation of your RHP Rating.

I think it would be much simpler if we had a list of players' FIDE ratings and compared them with their RHP ratings. Using a simple regression you could determine the correlation between these two variables. This would give you direction, either positive or negative. Then square the coefficient of correlation to determine the degree of their relationship. This is called the coefficient of determination, or r^2. The slope would determine, for example, how much more or less the RHP rating is in relation to the FIDE rating. In other words, if the slope was 1.2 then you would expect a player's RHP rating to advance 20% faster than his FIDE rating. This would indicate an inflation in the RHP scores.

This is all speculation since I don't have any scores to compare. With 30 players' scores from (USCF, FIDE, etc.) and RHP we could get some idea if RHP scores are inflated, deflated, or unrelated.

Get me the scores and I'll be glad to get you the results.
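For anyone who wants to try the regression once real scores turn up, here is a minimal sketch. The ratings below are invented purely so the arithmetic is easy to follow (the RHP column is constructed as 1.2 * FIDE - 300); they are not real data, and `simple_regression` is a hypothetical helper name.

```python
from math import sqrt


def simple_regression(x, y):
    """Least-squares fit y = intercept + slope * x; also returns r and r^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # coefficient of correlation
    return slope, intercept, r, r * r


# Invented ratings for five imaginary players (NOT real data).
fide = [1400, 1600, 1800, 2000, 2200]
rhp = [1380, 1620, 1860, 2100, 2340]  # constructed as 1.2 * fide - 300

slope, intercept, r, r2 = simple_regression(fide, rhp)
print(f"slope={slope:.2f} intercept={intercept:.0f} r={r:.3f} r^2={r2:.3f}")
```

With real ratings the points would scatter around the line, so r^2 would fall below 1 and show how much of the variation in RHP ratings the FIDE ratings account for.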

petrovitch

Joined: 08 May 07
Moves: 55475

14 May 08 18:16

The Elo Rating system was designed to rank chess players and allow a statistical estimate based on probabilities as to the outcome of pairing chess players.

What I would like to see is a performance rating. How did I play this game? Our ratings may be the same or 400 points apart, but still we should be able to evaluate how well each player played the game.

Let's use the following game as an example. I played it the day before yesterday on a French FICS server. The game was blitz, so it is not one of my best games, but I think it will be good for this exercise.

[Event "ICS unrated blitz match"]
[Site "jeu.echecs.com"]
[White "guest"]
[Black "me"]

1. d4 Nf6 2. e3 d6 3. a3 a6 4. c4 c5 5. d5 b5 6. cxb5 g6 7. b4 Bg7 8. Bb2 O-O 9. bxa6 Bxa6 10. b5 Bb7 11. a4 Bxd5 12. Nf3 Nbd7 13. Nc3 Bb7 14. Be2 Ne4 15. Qc2 Nxc3 16. Bxc3 Nf6 17. O-O Be4 18. Qb2 e5 19. Nd2 Bb7 20. f3 Nd5 21. e4 Nf4 22. Bc4 Qg5 23. g3 Ne6 24. Kf2 Nd4 25. h4 Qf6 26. Kg2 Bh6 27. Rfe1 Bxd2 28. Bxd2 Qxf3+ 29. Kh3 Bc8+ 30. Kh2 Qf2+ 31. Kh1 Qxg3 32. Bg5 Bg4 33. Be2 Nf3 34. Bxf3 Bxf3+ 35. Qg2 Qxg2# 0-1

In post-mortem analysis I want to compare my move selections with those of Crafty. If the move I choose is 1st on the list of moves ranked best by Crafty then I would get 1 point. If my move was ranked 25th on Crafty's list then I would get 25 points. Rank each move according to Crafty, add the points, then divide the sum by the number of moves in the game. In this case my sum was 122. So I divided 122 by 35 and got 3.48

I would say, then, that my MEAN RANK = 3.48

On a point system does this mean that I play 3.48 points, or almost a bishop and a pawn, less efficiently than Crafty? I'm not sure, but the mean rank is at least a starting point for coming up with a performance rating. This could also be called a handicap.

Some allowance may need to be made for the opening. I scored 9, 22 and 23 on moves in the opening that were not that bad. The rest of the game my scores were pretty consistent with Crafty. For example, after 1. d4 Nf6 Crafty wouldn't consider 2. c4 very high. I think it was about 25th on Crafty's list. So consider this in this discussion.

Other interesting scores: 19 of my moves were 1st on Crafty's list, giving me a score of 0.5429, and 29 of my moves were in Crafty's top 5 moves, giving me a score of 0.8286

So 83% of my moves were of the crème de la crème on Crafty's list of ranked moves in a 3 0 Blitz game where there is no time for calculating.

Of course, one bad move and the game may be lost. So whether these figures mean anything or not, only time will tell.
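The three scores used in this post (mean rank, share of moves matching the engine's 1st choice, share in its top 5) can be computed as sketched below. The list of ranks is hypothetical, not the ranks from the game above, and `rank_summary` is an illustrative name; getting the ranks out of Crafty is left aside.

```python
def rank_summary(ranks, top_k=5):
    """Summarise the engine ranks of the moves one player actually chose."""
    n = len(ranks)
    total = sum(ranks)
    return {
        "n": n,
        "sum": total,
        "mean_rank": total / n,
        # Share of moves that were the engine's first choice.
        "first_match": sum(1 for r in ranks if r == 1) / n,
        # Share of moves that were among the engine's top_k choices.
        "top_k": sum(1 for r in ranks if r <= top_k) / n,
    }


# Hypothetical ranks for a 10-move game (NOT the ranks from the game above).
ranks = [1, 1, 3, 1, 22, 2, 1, 5, 1, 9]
summary = rank_summary(ranks)
print(summary)
```

Note how the single rank of 22 pulls the mean rank up to 4.6 even though half the moves match the engine's first choice.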

petrovitch

Joined: 08 May 07
Moves: 55475

14 May 08 22:28

That's not the key to performance ...

Rating = 800
n = 33
sum = 78
mean = 2.36
Matching 1st Move = 0.4848
Top 5 Moves = 0.8787

But he missed two mates and had a 30 and a 36. 🙁 I've never really examined lower-rated players' games, but while doing this little experiment I noticed 5 very important things that need to be pointed out to players with ratings less than 1400.

1. The pawn structures were completely destroyed in these lower-rated players' games. I think they are just moving pawns because they don't know what to do. Also, they try to scare off the enemy with pawn moves that ultimately destroy their own position. These pawn moves are also made at the expense of not developing their pieces. And I don't mean not developing them quickly; I mean not developing them at all! There are still minor pieces on their original squares 20 moves into the game.

😉 Wasn't it Philidor who said, "Pawns are the soul of chess"?

2. As mentioned above lack of piece development.

3. Most of the time the queens are never moved in their games. They don't know how to use their queen so they don't use her at all.

4. They react to threats immediately without preparation. While most of their moves are pretty good, they make two or three very bad moves that cost them material and the game.

5. They overlook mating combinations -- not just mate in 3, but many times they don't see mate in 1. This just takes practice. It doesn't take any brains to see mate in 1.

_____________________________________________

Rating = 2100
n = 57
sum = 131
mean = 2.29
Matching 1st Move = 0.7192
Top 5 Moves = 0.8947

By the way, both of these players lost. The mean rank is not that much different. The higher rated player scored remarkably higher on matching 1st moves with Crafty, but only slightly higher on matching the top 5 moves.

The higher rated player did not miss any mate threats, and did not blunder. His worst two moves were 16 and 15.

So everything is not lost. I think we've learned quite a bit with this little experiment.

_____________________________________________

* I realize I'm talking with myself, but I think these ideas need to be discussed. 🙂

droflace

Joined: 24 Jan 08
Moves: 1805

15 May 08 00:53

That's an interesting way to try to rate a player's performance, but I think you have your interpretation of MEAN RANK incorrect.

You speculate having a MEAN RANK of 3.5 means that you played a piece and half a pawn worse than Crafty, but even if you completely matched Crafty your MEAN RANK would still be 1, and there is no way that means you played a pawn worse.

Perhaps a better thing to do would be to sum the difference in evaluations by Crafty, rather than just rank ordering the moves. In this way a blunder will be more heavily punished.

So if Crafty's best move is evaluated at +0.3 and you play a move evaluated at +0.1 your score for that move is 0.2

To punish blunders even more severely (as they normally cost you the game) and reward consistent play we can take the sum of squares divided by the number of moves as your "score" for that game.

I would provide an example from one of my games but I don't have an engine to do the analyses for me.
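A minimal sketch of the scoring proposed here, assuming the engine reports an evaluation in pawns for both its best move and the move actually played, each from the mover's point of view. The evaluations below are hypothetical and `evaluation_loss` is an illustrative name, not part of any engine's interface.

```python
def evaluation_loss(best_evals, played_evals):
    """Per-move loss: evaluation of the engine's best move minus the move played."""
    losses = [best - played for best, played in zip(best_evals, played_evals)]
    n = len(losses)
    mean_loss = sum(losses) / n                          # plain average loss
    mean_squared_loss = sum(d * d for d in losses) / n   # punishes blunders harder
    return losses, mean_loss, mean_squared_loss


# Hypothetical evaluations in pawns, from the mover's point of view.
best = [0.3, 0.5, 0.2, 1.0]
played = [0.1, 0.5, 0.2, -2.0]   # the last move is a 3-pawn blunder

losses, mean_loss, mean_squared_loss = evaluation_loss(best, played)
print(losses, mean_loss, mean_squared_loss)
```

In this toy game the 0.2 slip barely registers in the squared score, while the 3-pawn blunder accounts for nearly all of it, which is the effect the squaring is meant to produce.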

petrovitch

Joined: 08 May 07
Moves: 55475

15 May 08 01:11

Originally posted by droflace
That's an interesting way to try to rate a player's performance, but I think you have your interpretation of MEAN RANK incorrect.

You speculate having a MEAN RANK of 3.5 means that you played a piece and half a pawn worse than Crafty, but even if you completely matched Crafty your MEAN RANK would still be 1, and there is no way that means you played a pawn ...[text shortened]... rovide an example from one of my games but I don't have an engine to do the analyses for me.

I never thought the mean rank would be equivalent to the point count. I just meant that we may not be able to predict what type of correlation may come from it. It's just a beginning. I was disappointed that the 800 and 2100 players had similar scores when the games were so entirely different. I had hoped the mean rank would be meaningful.

I thought the low rank of the bad moves would suffice. Maybe we need to square the scores and take the square root of the final sum. That would exaggerate the weakness, similar to the least squares method.
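To see what squaring would buy, here is a small sketch comparing the plain mean rank with the square root of the sum of squared ranks. Both rank lists are invented for illustration, and the function names are hypothetical.

```python
from math import sqrt


def mean_rank(ranks):
    """Plain average of the engine ranks."""
    return sum(ranks) / len(ranks)


def root_sum_squares(ranks):
    """Square each rank, add them up, then take the square root of the sum."""
    return sqrt(sum(r * r for r in ranks))


# Two invented four-move games with the same mean rank.
steady = [3, 3, 3, 3]    # never best, never terrible
erratic = [1, 1, 1, 9]   # mostly best, with one bad move

for name, ranks in (("steady", steady), ("erratic", erratic)):
    print(name, mean_rank(ranks), round(root_sum_squares(ranks), 2))
```

Both games have a mean rank of 3, but the squared version separates them because the single 9 dominates the sum. One caveat: this score grows with the number of moves, so games of different lengths would need dividing by n before taking the root.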

incandenza

Joined: 04 Jul 07
Moves: 12208

15 May 08 01:43

Have to agree with droflace that it would make much more sense to take the evaluation difference rather than the rank. Could be that all of the top 10 moves are within 0.1 point of each other, or it could be that the 10th is 10 points worse than the 1st. The evaluation difference is a better indicator of how inaccurate the move was.

Exuma

Anansi

Woodshed

Joined: 16 Apr 07
Moves: 35523

15 May 08 03:07

Originally posted by adam warlock
You're totally right on this. Provided, of course, that the player doesn't end up playing the same people over and over, the more games he plays, the closer his rating gets to his real rating.

But what petrovitch means is that you can't say that a player with more games will have a higher rating (positive correlation), or that a player with more games will have a lower rating (negative correlation).

I suppose really we are talking about some way to address the quality of the games played. A player could have many games played, with a large number of timeouts. Or a player might only play those ranked much lower. The former results in an unfairly low rating, and the latter in an unfairly high one. Though I do realize that if Weyerstrass plays Kik, there will be no ranking change if it goes as one would expect...

John of Reading

Scotch addict

Joined: 13 Jun 05
Moves: 15520

15 May 08 03:37

Originally posted by Dragon Fire
With new players always coming in at 1200 I would have expected the mean to be 1200 so perhaps some statistics expert can tell me why it does not.

These statistics are derived from the "player tables" page, which includes only those players who have moved in the last 100 days. I would guess that there are vast numbers of RHP users who have joined, lost a few games, lost interest and vanished. This might be enough to push up the mean rating of the active players.

Interesting thread, thanks everyone.

petrovitch

Joined: 08 May 07
Moves: 55475

15 May 08 03:43

Originally posted by incandenza
Have to agree with droflace that it would make much more sense to take the evaluation difference rather than the rank. Could be that all of the top 10 moves are within 0.1 point of each other, or it could be that the 10th is 10 points worse than the 1st. The evaluation difference is a better indicator of how inaccurate the move was.

Okay, I replaced the mean rank with the differences between the evaluation of the move chosen and the move suggested (in the sample blitz game). I got the sum, then divided by the number of moves to get a mean score of 0.63

Aren't we simply computing a standard deviation between the move suggested and the move selected? Or maybe our performance rating is more like a z-score. It is the distance from the mean. The only thing, though, is that we could not have a positive score since there is no way to measure how much better we are than Crafty. So it would be a one-tailed test ...

n = 34
sum = 21.44
mean difference = 0.630588235294118
std = 1.19722458745332

Chi-square measures the difference between observed and expected frequencies. ANOVA tests for differences between the means of two or more groups.
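The n, sum, mean difference and std figures listed above can be reproduced from a list of per-move differences with Python's standard library, as sketched here. The differences below are hypothetical, and using the sample standard deviation (n - 1 in the denominator) is an assumption, since the post does not say which form was used.

```python
from statistics import mean, stdev

# Hypothetical per-move differences in pawns (engine's best minus move played).
# By the engine's own ranking they can never be negative.
differences = [0.0, 0.2, 0.0, 1.5, 0.1, 3.0]

n = len(differences)
total = sum(differences)
mean_difference = mean(differences)
std_difference = stdev(differences)   # sample standard deviation (n - 1)

# z-score of the worst move: how many standard deviations above the mean loss it sits.
worst_z = (max(differences) - mean_difference) / std_difference

print(n, total, round(mean_difference, 4), round(std_difference, 4), round(worst_z, 3))
```

The mean difference is an average loss per move rather than a standard deviation; the std only describes how unevenly that loss is spread across the moves, so a z-score is more useful for flagging individual blunders than as the overall score.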

droflace

Joined: 24 Jan 08
Moves: 1805

15 May 08 03:45

Originally posted by John of Reading
These statistics are derived from the "player tables" page, which includes only those players who have moved in the last 100 days. I would guess that there are vast numbers of RHP users who have joined, lost a few games, lost interest and vanished. This might be enough to push up the mean rating of the active players.

Interesting thread, thanks everyone.

The other reason that the mean is not equal to 1200 is that the rating system is not "closed" for provisional players (i.e., the points won/lost by the provisional player are not equal to the points lost/won by the other player).

In games between two non-provisional players the rating system is closed. What this means is that you would expect the overall mean rating to be close to the mean rating of provisional players at the point at which they become non-provisional players (after 20 games).
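The "closed" versus "not closed" distinction can be illustrated with a standard Elo update. This is a sketch only: RHP's actual provisional formula is not given in this thread, so a larger K-factor for one player is used as a stand-in, and the K values (32 and 64) are illustrative assumptions.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score for player A against player B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a, r_b, score_a, k_a, k_b):
    """Return both new ratings; score_a is 1 for a win, 0.5 draw, 0 loss."""
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k_a * (score_a - e_a)
    new_b = r_b + k_b * ((1 - score_a) - (1 - e_a))
    return new_a, new_b


# Closed: both players use the same K, so the points won equal the points lost.
a, b = elo_update(1200, 1200, 1, k_a=32, k_b=32)
print(a, b, a + b)

# Not closed: the winner uses a larger K (stand-in for a provisional player),
# so more points enter the pool than leave it.
p, q = elo_update(1200, 1200, 1, k_a=64, k_b=32)
print(p, q, p + q)
```

In the first game the two ratings still add up to 2400; in the second they add up to 2416, so the pool's mean has drifted upward without anyone's rating being "wrong".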