What engine match % is suspicious?

sh76

Civis Americanus Sum

Only Chess

19 Nov 08 00:19

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

19 Nov 08 13:30

Originally posted by diskamyl
yes but they only included it in the list because it was the one Kasparov had played. It's a very old version. Engines were a lot weaker back then.

The strength of the engine should not be a factor. If engine games are truly distinguishable from human games even a weak engine should be distinguishable. If not, then anyone using a weak engine to cheat would be undetectable using current methods. I suspect even a weak engine would be sufficiently strong to get to the upper echelons on this site.

~~Jie~~

benching

Joined: 17 Jul 08
Moves: 1218

19 Nov 08 13:45

Originally posted by diskamyl
here's a list of otb GM matchups with rybka 2.3.2 at 14 ply (it should be longer than 30 seconds) for the first choice of the engine, taken from http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?pid=103791

it's very interesting to see morphy have a higher matchup than kasparov, but this could have to do with the selection of games:

Shirov 59,0 ...[text shortened]... "best moves" by Rybka 2.3.2 at 14 Ply Forwards, from more than 1400 positions from 34 games.

Isn't Fritz the most human-like engine rather than Rybka? Those percentages tell you nothing if the games were positional GM-style games, since engines are good in tactical games. If the games reach endings, the endgame tablebases and so on kick in, resulting in lower matchups.

On the other hand given Chessmaster's ability to play like Capablanca, a clever programmer could program his engine to play like Fischer or Kasparov and then people will claim engine Z plays like Kasparov or Fischer.

droflace

Joined: 24 Jan 08
Moves: 1805

19 Nov 08 13:48

Originally posted by wormwood
well, anyway you decide to go, a good sanity check is to do the exact same test for the greats, like capablanca, and see how they do. that way, whatever parameters your specific system has, the threshold will be adjusted accordingly.

best would be to run through a number of pre-computer era CC champs. but of the otb players I believe capablanca has had the highest matchup rate.

at least it should give you some kind of a rule of thumb.

I would think that the match up rates from pre computer players would be artificially higher than matchups should be now...

If you were writing a chess program, wouldn't you try to make it play like the great players of the past? The older players might not use an engine, but an engine might use them?

~~diskamyl~~

Joined: 29 Mar 07
Moves: 1260

19 Nov 08 13:53

Originally posted by Kepler
The strength of the engine should not be a factor. If engine games are truly distinguishable from human games even a weak engine should be distinguishable. If not, then anyone using a weak engine to cheat would be undetectable using current methods. I suspect even a weak engine would be sufficiently strong to get to the upper echelons on this site.

I see your point now, I agree with that.

~~diskamyl~~

Joined: 29 Mar 07
Moves: 1260

19 Nov 08 13:55

Originally posted by Jie
Isn't Fritz the most human-like engine rather than rybka?

no, Rybka is now the most human-like engine. AFAIK Fritz has always been a "computerish" one.

luctruc

Joined: 28 Jan 04
Moves: 3570

19 Nov 08 14:58

Originally posted by Kepler
The strength of the engine should not be a factor. If engine games are truly distinguishable from human games even a weak engine should be distinguishable. If not, then anyone using a weak engine to cheat would be undetectable using current methods. I suspect even a weak engine would be sufficiently strong to get to the upper echelons on this site.

Word! I suspect it's being done, even as we speak. Running an antique edition of CM, say, on a fast chip could get you 2100-rated moves, at least in the middle game, plus a low match-up, at least on first choices, with the newer, stronger software the mods test against.

DawgHaus

Joined: 21 Sep 06
Moves: 24552

19 Nov 08 16:35

All right, 'fess up. Which one of you 1500's is using ENIAC?

DeepThought

Losing the Thread

Quarantined World

Joined: 27 Oct 04
Moves: 87415

19 Nov 08 17:07

Originally posted by Kepler
The strength of the engine should not be a factor. If engine games are truly distinguishable from human games even a weak engine should be distinguishable. If not, then anyone using a weak engine to cheat would be undetectable using current methods. I suspect even a weak engine would be sufficiently strong to get to the upper echelons on this site.

The strength of engines compared with each other depends on what is happening at the end of their search trees. The main differences between engines are to do with their evaluation functions and how they implement various tricks at the leaf nodes. I think you are right that for most tactical situations the main effect comes from the 10 - 14 ply depth search, the extra stuff at the nodes is by and large only going to be important when one engine cheat is playing against another.

I think as far as detection goes most of the difference in engine output is in what order they put their top few moves, so I don't think different engine settings and so forth will make that much difference as far as detecting cheats goes.

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

19 Nov 08 17:29

Originally posted by DeepThought
The strength of engines compared with each other depends on what is happening at the end of their search trees. The main differences between engines are to do with their evaluation functions and how they implement various tricks at the leaf nodes. I think you are right that for most tactical situations the main effect comes from the 10 - 14 ply depth ...[text shortened]... ent engine settings and so forth will make that much difference as far as detecting cheats goes.

I have been trying various times for the analysis ranging from 5 seconds per move to 1 minute per move. Although time has an effect on match ups it is not predictable. For instance, one game I analysed with an engine called Sigma produces the following first choice match up percentages:

10s 75%
20s 66%
30s 80%
40s 70%
50s 73%
60s 65%

Other engines produce completely different patterns. It appears to me that the combination of time per move and engine is critical. That is why I asked earlier in this thread for information on the reasoning behind the 30 seconds per move that is generally accepted as the standard here. So far I have had no answer, so I will have to just assume it is an arbitrary choice.
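To put a number on how much the figures above jump around, here is a minimal Python sketch that summarises the six match-up percentages from the list. The function name `summarize` and the choice of summary statistics are illustrative assumptions, not anything the site's moderators are known to use.

```python
from statistics import mean, pstdev

# First-choice match-up percentages quoted in the post above,
# keyed by analysis time in seconds per move.
matchup_by_time = {10: 75, 20: 66, 30: 80, 40: 70, 50: 73, 60: 65}


def summarize(rates):
    """Return mean, max-min spread and population standard deviation."""
    values = list(rates.values())
    return {
        "mean": mean(values),
        "spread": max(values) - min(values),
        "stdev": pstdev(values),
    }


def is_monotonic(rates):
    """True if the match-up only rises or only falls as time increases."""
    values = [rates[t] for t in sorted(rates)]
    rising = all(a <= b for a, b in zip(values, values[1:]))
    falling = all(a >= b for a, b in zip(values, values[1:]))
    return rising or falling


summary = summarize(matchup_by_time)
print(summary)
print("monotonic in time:", is_monotonic(matchup_by_time))
```

On this one game the spread between the lowest and highest setting is 15 percentage points, and the values neither rise nor fall steadily with time, which is the unpredictability described above.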

DeepThought

Losing the Thread

Quarantined World

Joined: 27 Oct 04
Moves: 87415

19 Nov 08 17:43

Originally posted by Kepler
I have been trying various times for the analysis ranging from 5 seconds per move to 1 minute per move. Although time has an effect on match ups it is not predictable. For instance, one game I analysed with an engine called Sigma produces the following first choice match up percentages:

10s 75%
20s 66%
30s 80%
40s 70%
50s 73%
60s 65%

Other en ...[text shortened]... tandard here. So far I have had no answer so will have to just assume it is an arbitrary choice.

I'd imagine that the 30s per move figure is practical. If you have 25 games to analyse, that'll be somewhere between 500 and 1,000 non-opening moves to crunch. Increasing the depth of an engine search by 1 ply increases the time to calculate by a factor of around 8, so the next step up after 30s is about 5 minutes per move. They probably found that 30s per move gives the engine enough time to hit a reasonable search depth, and keeps the chance that the engine will change its mind if you give it longer acceptably small, without making the analysis take forever.
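The arithmetic behind that estimate can be sketched in a few lines of Python. The move counts (500 to 1,000) and the factor of 8 per extra ply are the figures from the post above; the function name `analysis_hours` is hypothetical, and the factor of 8 is treated here as an assumption rather than a measured value.

```python
def analysis_hours(moves, seconds_per_move):
    """Total engine time, in hours, to analyse a batch of moves."""
    return moves * seconds_per_move / 3600


BASE_SECONDS = 30        # the per-move time discussed in the thread
PLY_COST_FACTOR = 8      # assumed cost of one extra ply, as quoted above
DEEPER_SECONDS = BASE_SECONDS * PLY_COST_FACTOR  # 240 s, i.e. 4 minutes

for moves in (500, 1000):
    base = analysis_hours(moves, BASE_SECONDS)
    deeper = analysis_hours(moves, DEEPER_SECONDS)
    print(f"{moves} moves: {base:.1f} h at {BASE_SECONDS}s, "
          f"{deeper:.1f} h at {DEEPER_SECONDS}s")
```

Under these assumptions a 25-game batch costs roughly 4 to 8 hours at 30s per move, against roughly 33 to 67 hours if every move is searched one ply deeper.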

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

19 Nov 08 17:56

Originally posted by DeepThought
I'd imagine that the 30s per move figure is practical. If you have 25 games to analyse that'll be somewhere between 500 and 1,000 non-opening moves to crunch. Increasing the depth of an engine search by 1 ply increases the time to calculate by a factor of around 8, so the next step up after 30s is about 5 minutes per move. They probably found that 30s ...[text shortened]... ange its mind if you give it longer to be acceptable, without making the analysis take forever.

I am thinking that the question I should be asking is not why 30s but rather what search depth the 30s is based on. Anyone got any ideas?

Korch

Chess Warrior

Riga

Joined: 05 Jan 05
Moves: 24932

19 Nov 08 18:54

Originally posted by Kepler
I have been trying various times for the analysis ranging from 5 seconds per move to 1 minute per move. Although time has an effect on match ups it is not predictable. For instance, one game I analysed with an engine called Sigma produces the following first choice match up percentages:

10s 75%
20s 66%
30s 80%
40s 70%
50s 73%
60s 65%

Other en ...[text shortened]... tandard here. So far I have had no answer so will have to just assume it is an arbitrary choice.

If the strongest legit players have been analysed with the same time per move, then there is no legit reason for the suspect to have a higher matchup.

DeepThought

Losing the Thread

Quarantined World

Joined: 27 Oct 04
Moves: 87415

19 Nov 08 18:54

Originally posted by Kepler
I am thinking that the question I should be asking is not why 30s but rather what search depth the 30s is based on. Anyone got any ideas?

In the thread on who the most creative player is (Thread 103777), someone posted this game of Tebb's, Game 617581. Tebb won with a nice build-up against Black's weak f-pawn.

I went through it with Crafty on an old Sempron-based single-core 1.5 GHz machine, just seeing how deep its search got after 30s or so. On most moves Crafty was getting to 12 ply quickly but didn't normally get to 13 ply. On a few moves it found it harder and only just got to 12 ply in the 30s; on several the search would get to 14 or 15 ply.

In some positions the search tree is more amenable to pruning than in others, so ply depth isn't the right thing to be looking at. Using a modern quad-core machine you'd expect the search to be 4 times faster and get 1 or 2 ply deeper.
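The link between a speed-up and extra search depth can be sketched as follows. This is a rough model only: it assumes the time per ply grows by a constant effective branching factor and that four cores give a full 4x speed-up, neither of which holds exactly in practice. The function name `extra_plies` is hypothetical.

```python
import math


def extra_plies(speedup, branching_factor):
    """Extra depth gained from a speed-up, if each additional ply
    multiplies search time by a constant branching_factor (assumption)."""
    return math.log(speedup) / math.log(branching_factor)


# Compare a few assumed effective branching factors for a 4x speed-up.
for factor in (2, 4, 8):
    print(f"factor {factor}: {extra_plies(4, factor):.2f} extra ply")
```

Under this model, gaining 1 or 2 ply from a 4x speed-up corresponds to an effective factor somewhere between 4 and 2 per ply; with the factor of around 8 per ply mentioned earlier in the thread, the gain would be under one ply.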

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

19 Nov 08 18:59

Originally posted by Korch
If the strongest legit players have been analysed with the same time per move, then there is no legit reason for the suspect to have a higher matchup.

Suspect? What suspect? The list of match up rates was for differing times on a single game, to show that results vary wildly (and unpredictably) according to the time per move used. I know both players in that game were engines: it was a game between Rybka and Shredder from the 16th World Computer Chess Championship.

Korch

Chess Warrior

Riga

Joined: 05 Jan 05
Moves: 24932

19 Nov 08 19:02

Originally posted by Kepler
Suspect? What suspect? The list of match up rates was for differing times on a single game, to show that results vary wildly (and unpredictably) according to the time per move used. I know both players in that game were engines: it was a game between Rybka and Shredder from the 16th World Computer Chess Championship.

I repeat - the analysis of the strongest legit players has been made using the same time per move as for the suspect. And in case you still did not understand - the statistics are compiled from more than a single game.
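Why pooling many games matters can be illustrated with a short Python sketch. The game counts below are made-up illustrative numbers, not data from any real player, and treating each move as an independent match/no-match trial is a simplifying assumption. The function name `pooled_matchup` is hypothetical.

```python
from math import sqrt


def pooled_matchup(games):
    """games: list of (matched_moves, analysed_moves) pairs.
    Returns the pooled match-up rate and its binomial standard error,
    assuming moves are independent trials (a simplification)."""
    matched = sum(m for m, _ in games)
    total = sum(n for _, n in games)
    rate = matched / total
    stderr = sqrt(rate * (1 - rate) / total)
    return rate, stderr


# Hypothetical data: one game versus a batch of 25 similar games.
single_game = [(15, 20)]
many_games = [(14, 20)] * 25

for label, games in (("1 game", single_game), ("25 games", many_games)):
    rate, stderr = pooled_matchup(games)
    print(f"{label}: {rate:.1%} +/- {stderr:.1%}")
```

With 20 analysed moves the uncertainty on a single game's match-up is around ten percentage points under these assumptions, while a 25-game batch brings it down to about two, which is the sense in which a single game says little either way.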