What's YOUR Match-up Rate?

~~Removed~~

Only Chess

02 May 11 18:29

Zygalski

Joined: 24 May 08
Moves: 717

04 May 11 02:44

Originally posted by moon1969
I have noticed in a couple of posts that some example analyses of match-up rates involve Fritz @ 30 seconds per move.

Yet, if a cheat has his Fritz engine run say one hour per move (instead of only 30 seconds), could that not distort a matchup analysis based on 30 seconds per move?

In other words, doesn't it occur with at least a little regular ...[text shortened]... . How does the analyzer determine what engine analysis time/depth the engine cheater is using?

I used to use Fritz 11 @ 30 seconds per ply.
On a quad core system with a decent-sized hash table this should provide reasonable analysis - i.e. the engine should have a fair idea of what is & what isn't a sound move.

The system is designed for finding the obvious cheats: those who log in, set their engine to look at a position & then decide on the move moments later.

You also have to be practical in the parameters you use.
If I set the engine to look at each position for several hours, I'd probably still be analysing my first benchmark game & few people (if any) would be prepared to analyse suspects for engine use.

Multiple analysts have looked at the same batch of games using different engines & systems & the overall match rates are remarkably similar across the board.

~~Removed~~

04 May 11 03:00

For what it's worth to anyone considering using the ChessAnalyze demo program:

I reanalyzed the 1st five of my earlier listed games, both with the "Analyze actual move" option checked and with it unchecked. Also, I reanalyzed those games manually with the same engine settings and move times. Here are the results:

With the "Analyze actual move" option checked (default setting):

Top 1 Match: 57/97 (58.8% ) (I played a couple of great games here, well above normal)
Top 2 Match: 74/97 (76.3% )
Top 3 Match: 83/97 (85.6% )

With the "Analyze actual move" option unchecked:

Top 1 Match: 50/97 (51.5% )
Top 2 Match: 71/97 (73.2% )
Top 3 Match: 78/97 (80.4% )

With manual analysis:

Top 1 Match: 45/97 (46.4% )
Top 2 Match: 68/97 (70.1% )
Top 3 Match: 78/97 (80.4% )

So, lesson learned: Unchecking the "Analyze actual move" option gave results closer to the manual analysis results, although even the unchecked option results were a little higher than the manual results.

The odd thing about all this is that even when I made a second run with the same settings, the results of the two runs would always vary a bit. Maybe some kind of weird hash table action going on, or maybe just tiny differences in timing, I'm not sure. These result variances would occur both while using the ChessAnalyze program and during manual analysis.

If I were to reanalyze all of my games with the "Analyze actual move" option unchecked, while throwing out the games against weaker opponents and the book moves I made (when I thought I was out of book), I'm sure my m/u rate would be significantly lower than first reported.
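
For anyone unsure how Top 1/2/3 figures like the ones above are tallied, here is a minimal Python sketch. It is not how ChessAnalyze works internally (that is not described in this thread); the function name `top_n_match_rates` and the idea of storing one rank per analysed move are assumptions made for illustration. The rank breakdown is simply what the "option checked" counts imply: 57 first choices, 17 second, 9 third and 14 outside the top three.

```python
def top_n_match_rates(ranks, max_n=3):
    """Tally cumulative Top-N match rates.

    ranks: one entry per analysed move, giving the 1-based rank of the
    played move among the engine's choices, or None if the played move
    was outside the ranked list. Assumes at least one analysed move.
    Returns {n: (hits, total, percent)} for n = 1..max_n.
    """
    total = len(ranks)
    rates = {}
    for n in range(1, max_n + 1):
        # A move counts towards Top N if the engine ranked it N-th or better.
        hits = sum(1 for r in ranks if r is not None and r <= n)
        rates[n] = (hits, total, round(100.0 * hits / total, 1))
    return rates


# Breakdown implied by 57/97, 74/97 and 83/97:
# 57 first choices, 74 - 57 = 17 second, 83 - 74 = 9 third, 14 outside.
ranks = [1] * 57 + [2] * 17 + [3] * 9 + [None] * 14
rates = top_n_match_rates(ranks)

for n, (hits, total, pct) in rates.items():
    print(f"Top {n} Match: {hits}/{total} ({pct}%)")
```

The point of the sketch is only that the Top N figures are cumulative, so Top 3 can never be lower than Top 2 for the same set of moves.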

Zygalski

Joined: 24 May 08
Moves: 717

04 May 11 03:05

Note that for this analysis I used Houdini 1.5, which is apparently one of the strongest engines available.
Also note that as I said before, individual game match rates can be very high indeed, so very little (if anything) can be learnt from a single game & Mad Rook's opponent's match rates.

Houdini 1.5 x64 Hash:256 Time:30s Max Depth:20ply
4x AMD Phenom 2.30GHz 4GB DDR2 RAM

Game 8079409

{ White: Mad Rook }
{ Top 1 Match: 10/20 ( 50.0% )
{ Top 2 Match: 15/20 ( 75.0% )
{ Top 3 Match: 19/20 ( 95.0% )
{ Top 4 Match: 19/20 ( 95.0% )

{ Black: goffin }
{ Top 1 Match: 13/21 ( 61.9% )
{ Top 2 Match: 15/21 ( 71.4% )
{ Top 3 Match: 15/21 ( 71.4% )
{ Top 4 Match: 17/21 ( 81.0% )

Game 8082056

{ White: shouresh }
{ Top 1 Match: 8/23 ( 34.8% )
{ Top 2 Match: 11/23 ( 47.8% )
{ Top 3 Match: 16/23 ( 69.6% )
{ Top 4 Match: 19/23 ( 82.6% )

{ Black: Mad Rook }
{ Top 1 Match: 14/24 ( 58.3% )
{ Top 2 Match: 18/24 ( 75.0% )
{ Top 3 Match: 20/24 ( 83.3% )
{ Top 4 Match: 20/24 ( 83.3% )

Game 8079411

{ White: National Master Dale }
{ Top 1 Match: 12/20 ( 60.0% )
{ Top 2 Match: 16/20 ( 80.0% )
{ Top 3 Match: 16/20 ( 80.0% )
{ Top 4 Match: 17/20 ( 85.0% )

{ Black: Mad Rook }
{ Top 1 Match: 7/19 ( 36.8% )
{ Top 2 Match: 10/19 ( 52.6% )
{ Top 3 Match: 10/19 ( 52.6% )
{ Top 4 Match: 11/19 ( 57.9% )

Game 7196347

{ White: Mad Rook }
{ Top 1 Match: 17/37 ( 45.9% )
{ Top 2 Match: 25/37 ( 67.6% )
{ Top 3 Match: 31/37 ( 83.8% )
{ Top 4 Match: 33/37 ( 89.2% )

{ Black: carystover }
{ Top 1 Match: 17/37 ( 45.9% )
{ Top 2 Match: 23/37 ( 62.2% )
{ Top 3 Match: 27/37 ( 73.0% )
{ Top 4 Match: 29/37 ( 78.4% )

Game 7187587

{ White: Mad Rook }
{ Top 1 Match: 19/37 ( 51.4% )
{ Top 2 Match: 28/37 ( 75.7% )
{ Top 3 Match: 30/37 ( 81.1% )
{ Top 4 Match: 32/37 ( 86.5% )

{ Black: Yordy Rafael }
{ Top 1 Match: 11/36 ( 30.6% )
{ Top 2 Match: 15/36 ( 41.7% )
{ Top 3 Match: 17/36 ( 47.2% )
{ Top 4 Match: 17/36 ( 47.2% )

Game 7171482

{ White: Mad Rook }
{ Top 1 Match: 11/24 ( 45.8% )
{ Top 2 Match: 16/24 ( 66.7% )
{ Top 3 Match: 17/24 ( 70.8% )
{ Top 4 Match: 18/24 ( 75.0% )

{ Black: chkm8t }
{ Top 1 Match: 3/24 ( 12.5% )
{ Top 2 Match: 7/24 ( 29.2% )
{ Top 3 Match: 10/24 ( 41.7% )
{ Top 4 Match: 12/24 ( 50.0% )

Game 7191803

{ White: jokr55 }
{ Top 1 Match: 15/59 ( 25.4% )
{ Top 2 Match: 27/59 ( 45.8% )
{ Top 3 Match: 36/59 ( 61.0% )
{ Top 4 Match: 44/59 ( 74.6% )

{ Black: Mad Rook }
{ Top 1 Match: 21/60 ( 35.0% )
{ Top 2 Match: 30/60 ( 50.0% )
{ Top 3 Match: 39/60 ( 65.0% )
{ Top 4 Match: 43/60 ( 71.7% )

Game 7159276

{ White: Mad Rook }
{ Top 1 Match: 14/30 ( 46.7% )
{ Top 2 Match: 20/30 ( 66.7% )
{ Top 3 Match: 22/30 ( 73.3% )
{ Top 4 Match: 23/30 ( 76.7% )

{ Black: Nortel }
{ Top 1 Match: 20/31 ( 64.5% )
{ Top 2 Match: 25/31 ( 80.6% )
{ Top 3 Match: 27/31 ( 87.1% )
{ Top 4 Match: 27/31 ( 87.1% )

Game 7174416

{ White: Mad Rook }
{ Top 1 Match: 18/35 ( 51.4% )
{ Top 2 Match: 23/35 ( 65.7% )
{ Top 3 Match: 26/35 ( 74.3% )
{ Top 4 Match: 31/35 ( 88.6% )

{ Black: afzalskhan }
{ Top 1 Match: 14/35 ( 40.0% )
{ Top 2 Match: 20/35 ( 57.1% )
{ Top 3 Match: 23/35 ( 65.7% )
{ Top 4 Match: 26/35 ( 74.3% )

Game 7156710

{ White: The Dels }
{ Top 1 Match: 8/31 ( 25.8% )
{ Top 2 Match: 10/31 ( 32.3% )
{ Top 3 Match: 18/31 ( 58.1% )
{ Top 4 Match: 20/31 ( 64.5% )

{ Black: Mad Rook }
{ Top 1 Match: 19/32 ( 59.4% )
{ Top 2 Match: 24/32 ( 75.0% )
{ Top 3 Match: 25/32 ( 78.1% )
{ Top 4 Match: 26/32 ( 81.3% )

Game 7158490

{ White: Mad Rook }
{ Top 1 Match: 10/28 ( 35.7% )
{ Top 2 Match: 16/28 ( 57.1% )
{ Top 3 Match: 18/28 ( 64.3% )
{ Top 4 Match: 20/28 ( 71.4% )

{ Black: max8888 }
{ Top 1 Match: 7/28 ( 25.0% )
{ Top 2 Match: 11/28 ( 39.3% )
{ Top 3 Match: 14/28 ( 50.0% )
{ Top 4 Match: 16/28 ( 57.1% )

Game 3529887

{ White: Mad Rook }
{ Top 1 Match: 21/48 ( 43.8% )
{ Top 2 Match: 30/48 ( 62.5% )
{ Top 3 Match: 38/48 ( 79.2% )
{ Top 4 Match: 39/48 ( 81.3% )

{ Black: Evey Hammond }
{ Top 1 Match: 17/47 ( 36.2% )
{ Top 2 Match: 26/47 ( 55.3% )
{ Top 3 Match: 28/47 ( 59.6% )
{ Top 4 Match: 30/47 ( 63.8% )

Game 3508454

{ White: Mad Rook }
{ Top 1 Match: 7/25 ( 28.0% )
{ Top 2 Match: 13/25 ( 52.0% )
{ Top 3 Match: 15/25 ( 60.0% )
{ Top 4 Match: 17/25 ( 68.0% )

{ Black: wonderful }
{ Top 1 Match: 10/25 ( 40.0% )
{ Top 2 Match: 15/25 ( 60.0% )
{ Top 3 Match: 17/25 ( 68.0% )
{ Top 4 Match: 19/25 ( 76.0% )

Game 3365905

{ White: Mad Rook }
{ Top 1 Match: 11/39 ( 28.2% )
{ Top 2 Match: 18/39 ( 46.2% )
{ Top 3 Match: 28/39 ( 71.8% )
{ Top 4 Match: 30/39 ( 76.9% )

{ Black: cosmopolitician }
{ Top 1 Match: 11/38 ( 28.9% )
{ Top 2 Match: 23/38 ( 60.5% )
{ Top 3 Match: 27/38 ( 71.1% )
{ Top 4 Match: 29/38 ( 76.3% )

Game 3296363

{ White: TunnelVision }
{ Top 1 Match: 9/23 ( 39.1% )
{ Top 2 Match: 10/23 ( 43.5% )
{ Top 3 Match: 14/23 ( 60.9% )
{ Top 4 Match: 15/23 ( 65.2% )

{ Black: Mad Rook }
{ Top 1 Match: 11/24 ( 45.8% )
{ Top 2 Match: 14/24 ( 58.3% )
{ Top 3 Match: 16/24 ( 66.7% )
{ Top 4 Match: 17/24 ( 70.8% )

Game 3298642

{ White: Mad Rook }
{ Top 1 Match: 18/32 ( 56.3% )
{ Top 2 Match: 26/32 ( 81.3% )
{ Top 3 Match: 30/32 ( 93.8% )
{ Top 4 Match: 31/32 ( 96.9% )

{ Black: Ceorl }
{ Top 1 Match: 16/31 ( 51.6% )
{ Top 2 Match: 20/31 ( 64.5% )
{ Top 3 Match: 20/31 ( 64.5% )
{ Top 4 Match: 23/31 ( 74.2% )

Game 3314049

{ White: Mad Rook }
{ Top 1 Match: 29/61 ( 47.5% )
{ Top 2 Match: 42/61 ( 68.9% )
{ Top 3 Match: 48/61 ( 78.7% )
{ Top 4 Match: 51/61 ( 83.6% )

{ Black: psytek }
{ Top 1 Match: 31/60 ( 51.7% )
{ Top 2 Match: 39/60 ( 65.0% )
{ Top 3 Match: 45/60 ( 75.0% )
{ Top 4 Match: 48/60 ( 80.0% )

Game 3305998

{ White: kalnesh }
{ Top 1 Match: 15/45 ( 33.3% )
{ Top 2 Match: 17/45 ( 37.8% )
{ Top 3 Match: 25/45 ( 55.6% )
{ Top 4 Match: 28/45 ( 62.2% )

{ Black: Mad Rook }
{ Top 1 Match: 18/46 ( 39.1% )
{ Top 2 Match: 26/46 ( 56.5% )
{ Top 3 Match: 28/46 ( 60.9% )
{ Top 4 Match: 33/46 ( 71.7% )

{ Batch Summary }

{ Mad Rook (Games: 18) }
{ Top 1 Match: 275/621 ( 44.3% ) Opponents: 237/614 ( 38.6% )
{ Top 2 Match: 394/621 ( 63.4% ) Opponents: 330/614 ( 53.7% )
{ Top 3 Match: 460/621 ( 74.1% ) Opponents: 395/614 ( 64.3% )
{ Top 4 Match: 494/621 ( 79.5% ) Opponents: 436/614 ( 71.0% )

So a fairly decent match rate for you in these games (all vs far lower rated players than the 2000+ rateds I'd normally analyse, I think the highest rated in the batch is 1903) but far below the benchmark thresholds.
There's no evidence at all as far as the match rate methodology goes that an engine was used to consistently select non-database moves in this batch of games.
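
As a cross-check on the batch summary, the Top 1 total can be rebuilt from the per-game lines. This is a minimal Python sketch, not the Batch Analyzer's own code; the function name `pooled_rate` is an assumption, and the (hits, moves) pairs are Mad Rook's Top 1 figures copied from the 18 games listed above, in order.

```python
def pooled_rate(per_game):
    """Pool per-game (hits, moves) pairs into one batch figure.

    The batch percentage is total hits over total moves, which weights
    long games more heavily than a plain average of per-game percentages.
    """
    hits = sum(h for h, _ in per_game)
    moves = sum(m for _, m in per_game)
    return hits, moves, round(100.0 * hits / moves, 1)


# Mad Rook's Top 1 (hits, analysed moves) for each game in the batch.
mad_rook_top1 = [
    (10, 20), (14, 24), (7, 19), (17, 37), (19, 37), (11, 24),
    (21, 60), (14, 30), (18, 35), (19, 32), (10, 28), (21, 48),
    (7, 25), (11, 39), (11, 24), (18, 32), (29, 61), (18, 46),
]

batch = pooled_rate(mad_rook_top1)
print("Top 1 Match: %d/%d ( %.1f%% )" % batch)
```

Pooling this way is why a single short game with a very high or very low percentage barely moves the batch figure.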

~~Removed~~

04 May 11 03:29

Originally posted by Zygalski
So a fairly decent match rate for you in these games (all vs far lower rated players than the 2000+ rateds I'd normally analyse, I think the highest rated in the batch is 1903) but far below the benchmark thresholds.
There's no evidence at all as far as the match rate methodology goes that an engine was used to consistently select non-database moves in this batch of games.

Thanks for the better analysis! (Mad Rook breathes a sigh of relief - He was worried about a possible banning if the results had turned out too well. 🙂 )

One question - Did you analyze these games to the very end? That is, was all the move trimming done at the beginning (with book moves)?

Zygalski

Joined: 24 May 08
Moves: 717

04 May 11 03:36

Originally posted by Mad Rook
Thanks for the better analysis! (Mad Rook breathes a sigh of relief - He was worried about a possible banning if the results had turned out too well. 🙂 )

One question - Did you analyze these games to the very end? That is, was all the move trimming done at the beginning (with book moves)?

The games were analysed from when they went out of book to the end.
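
To make "from when they went out of book" concrete, here is a rough Python sketch of one way to find that point. It is purely illustrative: the function `first_out_of_book_ply`, the tiny `book_lines` list and the sample game are all made up, and matching on move sequences ignores transpositions, which a real opening database lookup would probably handle by position.

```python
def first_out_of_book_ply(game_moves, book_lines):
    """Return the 0-based index of the first ply not found in any book line.

    Returns len(game_moves) if the whole game stays in book.
    Simplification: matches move sequences, so transpositions are missed.
    """
    # Collect every prefix of every book line for quick lookup.
    prefixes = set()
    for line in book_lines:
        for i in range(1, len(line) + 1):
            prefixes.add(tuple(line[:i]))

    for ply in range(len(game_moves)):
        if tuple(game_moves[: ply + 1]) not in prefixes:
            return ply
    return len(game_moves)


# Hypothetical two-line "database" and a hypothetical game.
book_lines = [
    ["e4", "e5", "Nf3", "Nc6", "Bb5"],
    ["e4", "c5", "Nf3"],
]
game = ["e4", "e5", "Nf3", "Nc6", "Bc4", "Bc5"]

start = first_out_of_book_ply(game, book_lines)
moves_to_analyse = game[start:]  # everything from the first non-book ply to the end
print(start, moves_to_analyse)
```

Only `moves_to_analyse` would then be fed to the engine, so all the trimming happens at the start of the game and nothing is cut from the end.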

moon1969

Houston, Texas

Joined: 28 Sep 10
Moves: 14347

04 May 11 17:51

Originally posted by Zygalski
I used to use Fritz 11 @ 30 seconds per ply.
On a quad core system with a decent-sized hash table this should provide reasonable analysis - i.e. the engine should have a fair idea of what is & what isn't a sound move.

The system is designed for finding the obvious cheats: those who log in, set their engine to look at a position & then decide on the move ...[text shortened]... different engines & systems & the overall match rates are remarkably similar across the board.

Thanks for the reply.

On a different issue, I have noticed discussion about excluding opening moves from the match-up analysis. How many moves are typically excluded: the first 10 moves? I was wondering how to characterize when the game is transitioning out of the opening moves.

With regard to the games databases, which I use extensively in my RHP games (I use the 365 database): for a particular opening, the main book lines may disappear in my game at about move 6 or 7, for example, but I may keep following an odd single game in the database until move 12 or more.

Zygalski

Joined: 24 May 08
Moves: 717

04 May 11 18:04

I can't really give a number on the database moves excluded as it varies so much!
A game can go out of book on move 3 or move 33.

I've found that the 2008 Batch Analyzer database is roughly comparable to both the 365 chess & chesslive.de online databases. In practice, it's highly unlikely that when left with a single entry, a suspect game will follow a particular db game for more than a few ply & the saving grace is always the large sample size of non-db moves from 20 or more games with 20+ moves.
A few analysed db moves from a sample of for instance 750 total analysed moves means next to nothing in the final %'s, especially when +5% headroom is given to the benchmark threshold figures for the batch totals.

The long & short of it is that if someone gets banned for engine use using this method then they play far more engine-like chess than any guaranteed unassisted player so far analysed, forced moves & all...
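
The "next to nothing" claim can be checked with a quick worst-case calculation. In this Python sketch the 750-move sample and the 5-point headroom come from the post above, while the 60% observed Top 1 rate and the count of 5 stray db moves are hypothetical numbers picked only for illustration. The worst case assumes every stray db move was also an engine match.

```python
def worst_case_shift(hits, total, db_moves):
    """Percentage points by which stray database moves could inflate a rate.

    Worst case: every one of the db_moves was counted as an engine match,
    so removing them lowers both the hits and the sample size.
    """
    observed = 100.0 * hits / total
    corrected = 100.0 * (hits - db_moves) / (total - db_moves)
    return observed - corrected


total_moves = 750      # sample size mentioned in the post
hits = 450             # hypothetical: a 60% observed Top 1 rate
stray_db_moves = 5     # hypothetical: "a few" db moves that slipped in
headroom = 5.0         # percentage points of headroom on the thresholds

shift = worst_case_shift(hits, total_moves, stray_db_moves)
print(f"worst-case inflation: {shift:.2f} points vs {headroom} points headroom")
```

Under these assumptions the inflation is a fraction of a percentage point, an order of magnitude below the headroom.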

Arctic Penguin

Joined: 29 Jul 08
Moves: 1570

04 May 11 18:07

It seems like a significant percentage of moves are continuations and recaptures that even a novice like me would pick out with relative ease, resulting in a lot of extra noise and match-up rate variation between individual games. Could you reduce the noise in this type of analysis by down-weighting or even removing altogether any 1st place engine move that is an arbitrary cutoff (say 1 full point) above the 2nd place engine move? What about running a second engine match-up with a low fixed ply that would pick out the obvious moves and then only counting match-up rates to human moves when the high ply engine picks a different move than the low ply engine? Is there some other reliable way to pick out moves that would be very "obvious" to a human and remove them from the analysis so engine matchup rates at the less obvious moves are given more scrutiny, or would this actually do more harm to the analysis than good?
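
To make the two ideas above concrete, here is a rough Python sketch of both filters. Everything in it is hypothetical: the `MoveRecord` fields, the 1.0-pawn cutoff and the five sample moves are invented for illustration, and no real engine output is involved.

```python
from dataclasses import dataclass


@dataclass
class MoveRecord:
    played: str        # move actually played in the game
    deep_best: str     # first choice of the high-ply engine run
    deep_gap: float    # deep 1st-choice eval minus 2nd-choice eval, in pawns
    shallow_best: str  # first choice of the low fixed-ply engine run


def obvious_by_gap(rec, cutoff=1.0):
    """Idea 1: the best move is far ahead of the runner-up."""
    return rec.deep_gap >= cutoff


def obvious_by_shallow(rec):
    """Idea 2: even a low-ply search finds the same best move."""
    return rec.deep_best == rec.shallow_best


def top1_rate(records, is_obvious=None):
    """Top 1 match rate over the moves not flagged as obvious."""
    kept = [r for r in records if is_obvious is None or not is_obvious(r)]
    if not kept:
        return 0, 0, None
    hits = sum(1 for r in kept if r.played == r.deep_best)
    return hits, len(kept), round(100.0 * hits / len(kept), 1)


# Hypothetical sample: a recapture, a routine developing move, a quiet
# alternative, a deep-only idea and a sacrifice the shallow run misses.
records = [
    MoveRecord("Qxd5", "Qxd5", 3.2, "Qxd5"),
    MoveRecord("Nf3", "Nf3", 0.1, "Nf3"),
    MoveRecord("Rc1", "Re1", 0.05, "Rc1"),
    MoveRecord("g4", "g4", 0.4, "h3"),
    MoveRecord("Bxh7+", "Bxh7+", 1.5, "Nc3"),
]

unfiltered = top1_rate(records)
gap_filtered = top1_rate(records, obvious_by_gap)
shallow_filtered = top1_rate(records, obvious_by_shallow)
print(unfiltered, gap_filtered, shallow_filtered)
```

Note that the two filters throw away different moves: the gap filter drops the sacrifice because it is clearly best, while the shallow filter keeps it because a low-ply search misses it. Either way the sample shrinks, and the benchmark data would have to be re-run with the same filter for the comparison to mean anything.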

Zygalski

Joined: 24 May 08
Moves: 717

04 May 11 18:12

The obvious moves are also analysed in the benchmark data.

Excluding forced/obvious moves from analysis introduces elements of subjectivity.
What is a reasonably obvious move to a 2500 rated analyst may totally baffle a 2000 rated.

In short the whole thing falls apart if you try to de-select this move for reason a, that move for reason b & you end up with a much smaller sample size of moves which aren't excluded for some arbitrary reason or other & a higher chance of skewed stats & a misleading result!

You have to ask yourself why there may be more or fewer obvious/forced moves in all the benchmark data than in the suspect data to justify all this hassle. There is, in my opinion, no logical reason to have this mindset.
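
The sample-size point can be put in rough numbers with the usual standard error of a proportion, sqrt(p(1-p)/n). This Python sketch leans on a strong assumption, namely that each move is an independent trial, which chess moves are not, so treat the output as a ballpark only. The 44.3% over 621 moves is Mad Rook's Top 1 batch figure from earlier in the thread; the cut-down sample of 200 moves is hypothetical.

```python
import math


def standard_error_pct(rate_pct, n):
    """Standard error of a match rate, in percentage points.

    Assumes n independent moves with the same match probability
    (a simplification; consecutive moves in a game are correlated).
    """
    p = rate_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


se_full = standard_error_pct(44.3, 621)     # full batch from this thread
se_trimmed = standard_error_pct(44.3, 200)  # hypothetical trimmed sample

print(f"n=621: +/- {se_full:.2f} points, n=200: +/- {se_trimmed:.2f} points")
```

Under that assumption, cutting the sample from 621 to 200 moves widens the uncertainty from about 2 points to about 3.5, which eats into any fixed headroom on the thresholds.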

Arctic Penguin

Joined: 29 Jul 08
Moves: 1570

04 May 11 18:35

You'd definitely need an objective measure of "obvious" rather than the analyst just picking them out based on what they can see. Maybe such a measure is not possible to begin with.

As to shrinking the sample size, I was hoping that it would reduce the number of moves you'd need to analyse for an engine to diverge from the benchmark data faster than it would reduce the move pool, but perhaps not? Of course, you'd have to create new benchmark data using the same method and that would be a pretty big hassle.

It still seems like, in theory, excluding human moves from the analysis for being too "obvious" if they were also picked by a very low ply engine might reduce the noise faster than it would shrink the sample size, but if you already have a method that works well and extensively analysed benchmark data, then there is no good reason to change it.

Varenka

Joined: 21 Sep 05
Moves: 27507

04 May 11 19:08

Originally posted by Zygalski
You have to ask yourself why there may be more or fewer obvious/forced moves in all the benchmark data than in the suspect data to justify all this hassle. There is, in my opinion, no logical reason to have this mindset.

There are some players who are mainly positional players and enjoy a slow manoeuvring battle. They may often play a closed position where it doesn't matter whether they improve the position of their knight and then rook, or vice versa. In terms of computer analysis, there can often be a wider bunch of top moves that are separated by very little in terms of evaluation score.

Contrast that with a player who aims for a sharp tactical battle, where more often move ordering is vital and every tempo counts for more. Here, computer analysis will often highlight a significant evaluation difference between the top moves and it will matter more if the 2nd best is chosen rather than the 1st.

Now, why is it that we expect both types of players to have the same average amount of forced moves in their games? I don't see why it should necessarily be the same, any more than I expect all GMs to play sacrifices with the same frequency. The level of forcing play may be relative to the player's style.

Zygalski

Joined: 24 May 08
Moves: 717

04 May 11 19:24

Yes, but all the benchmark data fall consistently within certain thresholds & these greats had various styles of play, be it Alekhine & his tactical play or Capablanca's more positional approach.
If a guaranteed unassisted Super GM with either style exceeded benchmark thresholds then it rather stands to reason that the thresholds would be altered.
The benchmark thresholds remain intact when this analysis is performed with a reasonably large sample size of objectively chosen games.
Analyse a top CC online suspect & they massively outstrip all benchmark data. Maybe they are far more tactically gifted than Fischer or Oim, or maybe there's another reason for their engine-like play?

Varenka

Joined: 21 Sep 05
Moves: 27507

04 May 11 20:13

Originally posted by Zygalski
Yes, but all the benchmark data fall consistently within certain thresholds & these greats had various styles of play, be it Alekhine & his tactical play or Capablanca's more positional approach.

You cite styles of play from OTB chess, but we know that CC and OTB chess are not the same game. For example, the spread of openings is not the same for each.

And I know that you've benchmarked some top CC players too, but how many? And how do we account for changes in the last 30 years?

You know that I don't doubt that the method works for blatant cheats. We agree on that. 🙂 But any time someone questions the finer details of the method, as in this thread, it often comes down to "well, Fischer couldn't, so how can anyone else?". Maybe we do have to keep in mind possible factors other than strength. And sure, none of these factors on their own break the method. But when you assume that the difference between OTB and CC can be ignored... that how chess has evolved in the last 30 years can be ignored... that player style can be ignored... etc., there start to be a lot of assumptions that could combine to make a difference.

no1marauder

Naturally Right

Back in the Saddle

Joined: 22 Jun 04
Moves: 43127

04 May 11 22:04

Originally posted by Mad Rook
(The subject of move match-up rates came up in another thread, and it caught my interest.)

I just ran my match-up rates, and I'm shocked. My 1st choice m/u is 50%, and my top 3 m/u is 85%. I expected it to be much lower. (My USCF rating is in the 1200s, for God's sake.)

I haven't played any RHP games in a while, and I only have 26 total.

I excluded ...[text shortened]... r? (For the record, I have never cheated in chess, either here or elsewhere. Scout's honor!)

Your rating is in the 1600s because you've played very few games. The rating system depends heavily on the number of games played to move you away from the baseline of 1200. If you played 100 games at the level you have been playing, you'd easily be in the 2000 range.

My matchup rate OTB is 45/63/72 (sample of 2400 moves over 5 years). Last I checked, my RHP matchup rate was about 5% higher overall.

Paul Leggett

Chess Librarian

The Stacks

Joined: 21 Aug 09
Moves: 115074

05 May 11 00:39

Originally posted by Varenka
You cite styles of play from OTB chess, but we know that CC and OTB chess are not the same game. For example, the spread of openings is not the same for each.

And I know that you've benchmarked some top CC players too, but how many? And how do we account for changes in the last 30 years?

You know that I don't doubt that the method wo ...[text shortened]... ed... etc., there start to be a lot of assumptions that could combine to make a difference.

I was thinking the same thing: comparing a top CC player to Fischer OTB seems incongruous. How a strong player with days per move compares with a super strong player with minutes per move is an interesting question.

It would have been really cool if Fischer had played a large number of CC games back in the 1960s; simply comparing his OTB rate to his CC rate would be very enlightening.