I'm constructing a database of games from RHP. My intention is to use all games of the top RHP players (approx everyone 2000+). This will only include completed games. It's not a manual process. The admins Russ/Chris are ok with this.
If this database were freely distributed to other players, would anybody have any issues with this? For example, a player rated 1950 may be able to use this to search over games of his 2050 opponent, but not vice versa. I'm aware of RHP's games explorer, etc., so I know there are other ways to see lower rated games.
My own motive is to have an offline DB of high quality RHP games to browse for interest/learning, e.g. searching for rook endings via Chessbase.
The file format is "close to PGN" such that it is readable by the latest Chessbase Light, but it's not yet 100% PGN standard.
Any concerns?
The post that was quoted here has been removed
I believe the actual downloaded RHP PGNs are tagged with whatever the current rating is, so old games will have incorrect ratings. In fact, they'll have the 'post game' rating even if they just finished.
the graph stores the correct ratings for the last 300 though. I copy the correct ratings manually from that, whenever I finish a game.
Responding to various comments in the thread...
>> Would the database include all the earlier games from all the 2000+ players?
At the moment it does. Procedure is: get top players (e.g. top 50), and for each player get all games. Discard games which are incomplete; have less than 4 ply; or start from a non-standard position. Maybe I could remove games where both ratings (at time of the game) were less than 2000, but see my next note.
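The discard rules above can be sketched roughly as follows. This is only an illustration, not the actual script: the `Game` record, its field names, and the helper names are assumptions made up for the example, and RHP's real data will look different.

```python
from dataclasses import dataclass

# FEN of the normal chess starting position.
STANDARD_START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
MIN_PLY = 4  # games with fewer than 4 ply are discarded


@dataclass
class Game:
    # Hypothetical record for illustration; not RHP's actual data model.
    moves: list   # one entry per ply, e.g. ["e4", "e5", "Nf3", "Nc6"]
    result: str   # "1-0", "0-1", "1/2-1/2", or "*" while still in progress
    start_fen: str = STANDARD_START_FEN


def keep_game(game):
    """Return True if the game survives all three discard rules."""
    if game.result not in ("1-0", "0-1", "1/2-1/2"):
        return False  # incomplete game
    if len(game.moves) < MIN_PLY:
        return False  # fewer than 4 ply
    if game.start_fen != STANDARD_START_FEN:
        return False  # started from a non-standard position
    return True


def filter_games(games):
    """Keep only the games that pass keep_game."""
    return [g for g in games if keep_game(g)]
```

A further rule, such as dropping games where both ratings were below 2000, would be one more check in `keep_game`, provided the ratings at the time of the game are known.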
>> the graph stores the correct ratings for the last 300 though
Yes, I currently use this to try to get ratings. I'm expecting that for players with more than 300 games, I'm going to have ratings set to 0 for the early games. 🙁 Will see if I can get rating data elsewhere... not sure right now.
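To make the 300-game limit concrete, here is a small sketch of how ratings might be attached. It assumes (and this is only an assumption) that the graph gives one rating per finished game, oldest first, and that its entries line up with the player's most recent games. The function name and arguments are made up for the example.

```python
GRAPH_LIMIT = 300  # the rating graph only covers the last 300 games


def attach_ratings(game_count, graph_ratings):
    """Return one rating per game, oldest first, with 0 where unknown.

    Assumption: graph_ratings is ordered oldest to newest and its last
    entry belongs to the player's most recent game.
    """
    known = list(graph_ratings[-GRAPH_LIMIT:])
    known = known[-game_count:] if game_count else []
    missing = game_count - len(known)
    # Early games that fall outside the graph get a placeholder rating of 0.
    return [0] * missing + known
```

For a player with 5 games and only 3 graph entries, the first 2 games would come out with rating 0 and the last 3 with the graph values.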
>> Is this a single gathering of current data or do you plan on updating??
It was just an initial experiment after reading the RHP "Developers" forum. Not sure how useful the output will be. If it turns out to be useful I may try to repeat periodically.
>> Availability of this database would be nice
I'm still correcting some issues but should have an initial "draft" soon.
Originally posted by Varenka
Responding to various comments in the thread...
>> Would the database include all the earlier games from all the 2000+ players?
At the moment it does. Procedure is: get top players (e.g. top 50), and for each player get all games. Discard games which are incomplete; have less than 4 ply; or start from a non-standard position. Maybe I could remove ga ...[text shortened]... uld be nice
I'm still correcting some issues but should have an initial "draft" soon.
Why isn't it in standard PGN? Why is it "close to PGN" and how much of an issue is it getting it to standard PGN?
Originally posted by cmsMaster
Why isn't it in standard PGN? Why is it "close to PGN" and how much of an issue is it getting it to standard PGN?
When I wrote that I was aware of some PGN issues, e.g. the date format was "18 Jul '07" rather than "2007.07.18" (which PGN uses). Hopefully I've sorted most of these now, but I'd rather wait till some others have tried it before I claim it's 100% PGN.
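As an illustration of the date issue, converting an RHP-style date such as 18 Jul '07 into PGN's YYYY.MM.DD form could look something like this. It's a minimal sketch: the function name is made up, and it assumes English month abbreviations and that every date follows the same "day month 'year" pattern.

```python
from datetime import datetime


def rhp_date_to_pgn(text):
    """Convert a date like "18 Jul '07" to the PGN form "2007.07.18"."""
    # %d = day, %b = abbreviated month name, %y = two-digit year.
    parsed = datetime.strptime(text, "%d %b '%y")
    return parsed.strftime("%Y.%m.%d")


print(rhp_date_to_pgn("18 Jul '07"))  # 2007.07.18
```

Note that the two-digit year is expanded by Python's usual rule (00 to 68 becomes 20xx), which is fine for RHP games.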
I've put a trial database at:
http://groups.google.co.uk/group/rhp-games-download/files
Maybe somebody could test this. It should be about 14400 games, based on the top 30 RHP players, in PGN format.
Once I've checked any issues, I can try doing this again for the top 100, etc. One known issue is that I can only supply ratings for the last 300 games of each player.
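For anyone testing the file, a quick way to check the game count is to count the [Event ...] tags, since a standard PGN game starts with one. This is a rough sketch: the function name and the example file name are made up, and it assumes every game in the file carries an Event tag at the start of a line.

```python
def count_games(pgn_path):
    """Count games in a PGN file by counting lines that open an Event tag."""
    count = 0
    # errors="replace" avoids crashing on odd characters in player names.
    with open(pgn_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith("[Event "):
                count += 1
    return count
```

Calling `count_games("rhp_top30.pgn")` (hypothetical file name) should give a number close to 14400 if the download is complete.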