How to detect engine cheats

Squelchbelch

Only Chess

22 Jun 09

Squelchbelch

Joined: 14 Jul 06
Moves: 20541

22 Jun 09

2 edits

Advantages of top 3 matchup analysis:

1) There is a significant body of statistics illustrating what is humanly possible, both from pre-computer era CC World Championships and from Super GM OTB games. The upper-end thresholds are remarkably consistent.

2) All you need to do high quality analysis is a reasonably powerful PC, a decent modern engine and access to an online database such as www.chesslive.de.

3) The method itself is very simple.

The 30 second top 3 matchup allows for reasonably practical analysis (ie games don't take 3 weeks to do!) but also yields very high quality results, in keeping with those achieved by many other Games Moderators.

You can go to, for instance, 60 seconds per move, but I tried this with several games and the results were virtually identical to 30 secs, while you spend many hours on 1 game!

4) All non-database moves are included. You don't need to remove moves you subjectively consider are forcing or obvious. 'Only legal' moves are also taken into account in the threshold stats on post #4 of this thread.

Any argument that suggests that these moves should be exempt from final figures ignores the fact that they frequently appear in pre-1980's CC WC & Super GM OTB matches. Try analysing Capablanca-Alekhine 1927 or Fischer-Spassky 1972 and then tell me there aren't plenty of forced exchanges.

The 'theory' as such is based on a false premise, unless you think the legitimate games of all the top players contain an unrepresentatively low number of these moves!

Disadvantages are:

1) It takes about an hour or more to analyse and write up one game. 20+ games are needed as evidence, so, unless you are chained to your pc this will take a week or 2 to collect data on one suspect.

How to select a suspect for analysis

With the effort and time needed, you want to make certain you have grounds to suspect the user you are investigating may be using an engine on a consistent basis in their games.

It is logical to go for the most blatant cheats, as these should be the easiest to detect. Some people will only use an engine on rare occasions - these people cannot be caught with the top 3 matchup method.

There are some give-aways:

-They are within the top 2 % or so of the highest rated players on the site. A controversial point, but proven by experience. This is where the blatant cheats live.

-In their earlier games they played like a patzer, losing to 1400 rateds. A few months later they are playing like a GM, beating 2300's.

-The number of games in progress and the move frequency in these games. It could be that you have stumbled upon a Super GM, dishing-out a free, unofficial online blitz simul. This is probably quite unlikely and even so, analysis can prove if this is a GM or not.

-No losses in many completed games against all-comers. Engines don't get tired and very rarely make blunders. Humans do. The lack of mistakes in many games over time is a key indicator of engine use.

-They cannot realise when a position is clearly drawn and so play on. Their engine says that they have a 0.25 advantage in the position, so of course they attempt to go for the non-existent win.

-They repeatedly go for sharp winning lines in games where there are clear and simple ways to maintain the advantage by simplifying or playing 'safe & solid' human-like moves.

-The engine seems to have no plan in closed positions. This only really applies to the weakest engines these days. Many of the moves that modern engines play in these positions are similar to those which strong human players would choose.

Since we are going for the most blatant cheats, limit yourself to players in the top 1 or 2% highest rated. There is little point going after an 1800 with a suspicion they may be getting 'help'.

So, you now have a suspect. Next up is the crucial question: which games are you going to analyse?

Squelchbelch

Joined: 14 Jul 06
Moves: 20541

22 Jun 09

1 edit

The selection criteria and a brief explanation of the logic behind them are as follows:

1) 20+ completed games need to be used as a sample, each with 20+ non-database moves

This is because you need quite a large sample of non-database moves: at least 400, but the bigger the better. The weight of evidence will only increase as more non-database moves are used in the final results. Generally, I like to submit at least 20 games with 600 or so non-book moves.
You can of course analyse fewer games & leave the Mods to do the rest. 10 objectively chosen games would be an absolute minimum.

2) Games must be selected on a purely objective basis

Random selection is no good, because you lay yourself open to criticism that games with higher matchup rates were cherry-picked and selection wasn't random at all! Select the games from the final rounds of tourneys if possible, or simply the last 20+ completed games that fulfil all other criteria.

There can be no hint of bias in games selection, or your evidence loses credibility.

3) They should be against high-quality opposition

Games against other good players should generally ensure a more balanced position, where there are fewer obvious top 3 engine moves for a legitimate human player. If a 2300 plays a 1700 (and the game lasts 20+ non-database moves!) then the 1700 will make tactical/strategic mistakes and thus generate plenty of obvious top 3 moves. This can skew results toward a false positive.

4) Select recent games which fulfil the above criteria

Most engine users tend to start out only occasionally using the engine for help. Months later and they find they cannot log into the site without first firing-up Fritz etc. Engine use tends to only increase with time, so pick recently completed games where possible.
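The four selection criteria above can be applied mechanically, which also helps show that nothing was cherry-picked. Here is a minimal sketch in Python, assuming you have already noted each game's finish date, the number of non-database moves and the opponent's rating. The Game record, the select_games name and the 2000 rating cut-off are illustrative assumptions, not figures from this thread.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Game:
    game_id: str          # hypothetical identifier for the game
    finished: date        # date the game was completed
    non_db_moves: int     # suspect's moves played after the game left the database
    opponent_rating: int  # opponent's rating when the game finished


def select_games(games, wanted=20, min_non_db_moves=20, min_opponent_rating=2000):
    """Pick the most recent completed games that meet every criterion.

    min_opponent_rating is an assumed cut-off for 'high-quality opposition';
    choose your own and state it in the write-up.
    """
    eligible = [
        g for g in games
        if g.non_db_moves >= min_non_db_moves
        and g.opponent_rating >= min_opponent_rating
    ]
    # Most recent first, so the choice depends only on dates, not on results.
    eligible.sort(key=lambda g: g.finished, reverse=True)
    return eligible[:wanted]
```

Because the rule is fixed before any engine analysis is run, anyone can repeat the selection and get the same list of games.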

Now we have a suspect and 20+ carefully selected games, and can begin game analysis!

1) Find out where the game goes out of book on www.chesslive.de. Make sure it doesn't transpose back in a few moves later by playing a few extra moves.

2) Write this down on a jotter. We are only analysing moves after the game goes out of db! Also write all the other move numbers by both players for the remainder of the game. Circle the move numbers to avoid confusion when adding the stats for results later.

Here is an example of a completed matchup sheet:

http://img44.imageshack.us/img44/2536/matchupsheet1.jpg

3) Now copy/paste the .PGN into your engine, or make all the moves manually if you can't do this.

4) Set your engine up with a reasonably large hash table, and set it to look for the top 3 moves/lines with infinite analysis. The hash size should be an even number; I use 192 MB, and 192 or 256 MB are usual. Check a few random games to see how many kN/s you get and adjust the size of the table to optimum effect.

Go to a few ply before the game went out of book to start, so that the engine doesn't go into the position cold. I generally start the 30 second move cycle about 4 or 5 ply back, without recording anything just yet, obviously! You'll see it will quickly catch up.
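For anyone who would rather script the engine than click through a GUI, the setup in step 4 maps onto a few standard UCI commands (Hash, MultiPV, go movetime). The following is only a sketch under stated assumptions: the engine must speak UCI, engine_path is a hypothetical path to its executable, and moves are given in UCI coordinate notation (e.g. e2e4) rather than SAN. Note that it starts a fresh search for each position, so unlike the manual method it does not carry a warm hash table forward from the previous move.

```python
import subprocess


def parse_multipv(info_lines):
    """Return {rank: first move} from UCI 'info ... multipv N ... pv ...' lines.

    Later lines overwrite earlier ones, so the last reported line for each
    rank is the one that is kept.
    """
    top = {}
    for line in info_lines:
        tokens = line.split()
        if not tokens or tokens[0] != "info":
            continue
        if "multipv" not in tokens or "pv" not in tokens:
            continue
        rank = int(tokens[tokens.index("multipv") + 1])
        pv_index = tokens.index("pv")
        if pv_index + 1 < len(tokens):
            top[rank] = tokens[pv_index + 1]
    return top


def rank_of(played, top):
    """Return 1, 2 or 3 if the played move is an engine choice, else None."""
    for rank in sorted(top):
        if top[rank] == played:
            return rank
    return None


def top3_for_position(engine_path, moves_so_far, seconds=30, hash_mb=192):
    """Ask a UCI engine for its top 3 moves after the given moves from the start position."""
    proc = subprocess.Popen([engine_path], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True, bufsize=1)

    def send(command):
        proc.stdin.write(command + "\n")
        proc.stdin.flush()

    def wait_for(marker):
        for line in proc.stdout:
            if line.strip() == marker:
                return

    send("uci")
    wait_for("uciok")
    send(f"setoption name Hash value {hash_mb}")
    send("setoption name MultiPV value 3")
    send("isready")
    wait_for("readyok")
    position = "position startpos"
    if moves_so_far:
        position += " moves " + " ".join(moves_so_far)
    send(position)
    send(f"go movetime {seconds * 1000}")
    lines = []
    for line in proc.stdout:
        if line.startswith("bestmove"):
            break
        lines.append(line)
    send("quit")
    proc.wait()
    return parse_multipv(lines)
```

rank_of(played_move, top3_for_position(...)) then gives the figure to write on the matchup sheet, with None standing for "not in top 3".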

Squelchbelch

Joined: 14 Jul 06
Moves: 20541

22 Jun 09

1 edit

5) Move the engine forward a single move each time the timer reaches 30 seconds. When it gets to where the game went out of book, record where the engine scores the moves played by both players on the matchup sheet as in the above example. Make certain you continue moving forward at precisely the same intervals (30 seconds in this example) throughout the game, regardless of where the engine is in its analysis of lines.

6) If the engine scores all moves the same, but your player's move appears as 3rd choice, mark it as such! Don't be tempted to alter the results as this totally removes credibility from your analysis. If the move doesn't appear in the scoring pane, then mark it as N/A or similar.

7) Once completed, you will have to write the scores into the .PGN and also write up the results for both players.

Include a header for the analysis, showing what conditions the analysis was done under. Here is an example:

Fritz 11 @ 30 seconds per move

Pentium 4 2.93GHz 1GB RAM

Hash Table 192MB

Database used

www.chesslive.de

[Event "World Championship 28th"]
[Site "Reykjavik"]
[Date "1972.07.11"]
[Round "15"]
[White "Spassky, Boris V"]
[Black "Fischer, Robert James"]
[Result "1/2-1/2"]
[ECO "B99"]
[PlyCount "86"]
[EventDate "1972.??.??"]

1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 a6 6. Bg5 e6 7. f4 Be7 8. Qf3
Qc7 9. O-O-O Nbd7 10. Bd3 b5 11. Rhe1 Bb7 12. Qg3 O-O-O 13. Bxf6 Nxf6 14. Qxg7
Rdf8 15. Qg3 b4 16. Na4 Rhg8 17. Qf2 Nd7 18. Kb1 Kb8
{Takes game out of book; 2nd choice} 19. c3 {Not in top 3} Nc5 {1st choice} 20.
Bc2 {1st choice} bxc3 {2nd choice} 21. Nxc3 {1st choice} Bf6 {3rd choice} 22.
g3 {3rd choice} h5 {Not in top 3} 23. e5 {3rd choice} dxe5 {1st choice} 24.
fxe5 {1st choice} Bh8 {1st choice} 25. Nf3 {Not in top 3} Rd8 {1st choice} 26.
Rxd8+ {Not in top 3} Rxd8 {1st choice} 27. Ng5 {Not in top 3} Bxe5 {1st choice}
28. Qxf7 {1st choice} Rd7 {3rd choice} 29. Qxh5 {1st choice} Bxc3 {1st choice}
30. bxc3 {1st choice} Qb6+ {1st choice} 31. Kc1 {2nd choice} Qa5 {1st choice}
32. Qh8+ {3rd choice} Ka7 {1st choice} 33. a4 {3rd choice} Nd3+ {2nd choice}
34. Bxd3 {1st choice} Rxd3 {1st choice} 35. Kc2 {1st choice} Rd5 {2nd choice}
36. Re4 {1st choice} Rd8 {Not in top 3} 37. Qg7 {1st choice} Qf5 {1st choice}
38. Kb3 {Not in top 3} Qd5+ {Not in top 3} 39. Ka3 {1st choice} Qd2 {2nd choice}
40. Rb4 {1st choice} Qc1+ {1st choice} 41. Rb2 {1st choice} Qa1+ {1st choice}
42. Ra2 {1st choice} Qc1+ {2nd choice} 43. Rb2 {1st choice} Qa1+ {1st choice} 1/2-1/2

Result:

White: Spassky

Top 1 Match: 15/25 (60,0% )

Top 2 Match: 16/25 (64,0% )

Top 3 Match: 20/25 (80,0% )

Black: Fischer

Top 1 Match: 15/26 (57,7% )

Top 2 Match: 21/26 (80,8% )

Top 3 Match: 23/26 (88,5% )

In this example, Spassky had 5 moves out of 25 that weren't even top 3 engine choices after 30 secs, Fischer had 3 non-top 3 engine moves.
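Adding up the sheet by hand is where slips creep in, so the tally can be scripted once the choices are written into the .PGN as comments. This is a minimal sketch, assuming the comment wording used in the example above ('1st choice', '2nd choice', '3rd choice', 'Not in top 3'). The tally function and its regular expression are illustrative, not part of any PGN library, and it does not handle variations or nested brackets. A 1st choice move also counts towards top 2 and top 3.

```python
import re
from collections import Counter

# One token per match: a {comment}, a move number ("19." or "18..."), or a move.
TOKEN = re.compile(r"\{([^}]*)\}|(\d+)\.(\.\.)?|([^\s{}]+)")
RANKS = {"1st choice": 1, "2nd choice": 2, "3rd choice": 3, "Not in top 3": None}
RESULTS = {"1-0", "0-1", "1/2-1/2", "1/2", "*"}


def tally(movetext):
    """Count top 1 / top 2 / top 3 matches per player from annotated movetext."""
    counts = {"White": Counter(), "Black": Counter()}
    side = "White"       # whose move comes next
    last_mover = None    # who played the move a comment refers to
    for match in TOKEN.finditer(movetext):
        comment, number, dots, san = match.groups()
        if number is not None:
            side = "Black" if dots else "White"
        elif san is not None:
            if san in RESULTS:
                continue
            last_mover = side
            side = "Black" if side == "White" else "White"
        elif comment is not None and last_mover is not None:
            text = " ".join(comment.split())  # comments may wrap across lines
            for label, rank in RANKS.items():
                if label in text:
                    counts[last_mover]["moves"] += 1
                    if rank is not None:
                        for k in range(rank, 4):
                            counts[last_mover][f"top{k}"] += 1
                    break
    return counts


# The first few annotated moves of the game above.
excerpt = """18... Kb8 {Takes game out of book; 2nd choice} 19. c3 {Not in top 3}
Nc5 {1st choice} 20. Bc2 {1st choice} bxc3 {2nd choice} 21. Nxc3 {1st choice}
Bf6 {3rd choice}"""
counts = tally(excerpt)
for player in ("White", "Black"):
    c = counts[player]
    print(player, c["top1"], c["top2"], c["top3"], "of", c["moves"])
```

Only moves carrying one of the four labels are counted, so database moves without a comment are ignored automatically.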

Squelchbelch

Joined: 14 Jul 06
Moves: 20541

22 Jun 09

3 edits

8) Once you have done all 20+ games, you can also work out an average of both the suspect's top 3 matchup rates and his opposition's matchup rates.

Simply add up the totals for top 1 match/top 2 match/top 3 match over all the games, divide each by the total number of non-database moves analysed, and find the %'s for each.

What do the results mean?

After years of evidence gathering both with pre-computer CC World Championships and also the highest quality OTB GM games, it is known that only the very best players can get near results of

Top 1 match = 60%

Top 2 match = 75%

Top 3 match = 85%

under the selection criteria. Results for a single game may be much higher than these. It is the average over time that is the critical factor.

If you have found a suspect with total average stats like these, you should report for engine use and at least let site Admin take it further.

Be safe in the knowledge that the evidence you've submitted is much more credible than 90% that will ever be given to site Admin, such as 'player X is clearly using an engine because he's only rated 1400 but he played-out a forced mate in 5' or 'player Y hasn't lost to a single player yet. I think he's a cheat' when player Y only plays sub-1500 rateds!

If you have a result of matchup rates for all games averaging on or above the following:

Top 1 match = 65%

Top 2 match = 80%

Top 3 match = 90%

Then I can absolutely guarantee you that you have found a rather blatant engine user. Congratulations! These people should always be banned from sites where engine use is against the Terms Of Service.
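The two sets of thresholds above, together with the 400-move minimum sample, can be written down as a small check. This is a sketch only. It assumes the rates are pooled over all games (total matches divided by total non-database moves) and that all three figures must be at or above a threshold set before it counts as reached. The verdict labels and function names are illustrative.

```python
HUMAN_CEILING = {"top1": 60.0, "top2": 75.0, "top3": 85.0}  # best human results
BLATANT = {"top1": 65.0, "top2": 80.0, "top3": 90.0}        # blatant engine use
MIN_MOVES = 400                                             # minimum sample size


def pooled_rates(games):
    """games: list of (top1, top2, top3, moves) counts for one player."""
    total = sum(g[3] for g in games)
    if total == 0:
        return {"top1": 0.0, "top2": 0.0, "top3": 0.0}, 0
    rates = {
        key: 100.0 * sum(g[i] for g in games) / total
        for i, key in enumerate(("top1", "top2", "top3"))
    }
    return rates, total


def verdict(games):
    """Compare one player's pooled matchup rates with the thresholds."""
    rates, total = pooled_rates(games)
    if total < MIN_MOVES:
        return "sample too small", rates
    if all(rates[k] >= BLATANT[k] for k in BLATANT):
        return "blatant", rates
    if all(rates[k] >= HUMAN_CEILING[k] for k in HUMAN_CEILING):
        return "report", rates
    return "within human range", rates


# Fischer's figures from the single example game: 15, 21 and 23 out of 26.
label, rates = verdict([(15, 21, 23, 26)])
print(label, {k: round(v, 1) for k, v in rates.items()})
```

With only 26 non-database moves the check refuses to give a verdict at all, which is the point of the 400-move minimum: one game, however high its figures, is not evidence.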

If you have any questions about any of this, or if I have missed something, please let me know.

Happy hunting!

USArmyParatrooper

Joined: 10 May 09
Moves: 13341

22 Jun 09

Fortunately for guys like me we don't have to worry about cheats. A monkey with Down's syndrome can beat me.

Squelchbelch

Joined: 14 Jul 06
Moves: 20541

22 Jun 09

I'm only 1700 here and OTB.
You don't have to be a strong player to submit evidence of engine use if you wish to help clean this site up.

orion25

Art is hard

Joined: 21 Jan 07
Moves: 12359

22 Jun 09

great work squelch, thanks! 🙂

Fat Lady

Joined: 21 Feb 06
Moves: 6830

22 Jun 09

Originally posted by Squelchbelch

If you have a result of matchup rates for all games averaging on or above the following:
Top 1 match = 65%
Top 2 match = 80%
Top 3 match = 90%

Then I can absolutely guarantee you that you have found a rather blatant engine user. Congratulations! These people should always be banned from sites where engine use is against the Terms Of Service.

Hi Squelch,

Recently someone sent me some analysis they had done on a yet-to-be-finished game between two players on this site, using exactly the same method as you've described above:

White: xxxxxxxx
Top 1 Match: 30/48 (62,5% )
Top 2 Match: 39/48 (81,3% )
Top 3 Match: 46/48 (95,8% )

Black: yyyyyyyyyyy
Top 1 Match: 35/47 (74,5% )
Top 2 Match: 43/47 (91,5% )
Top 3 Match: 45/47 (95,7% )

Do these numbers suggest to you that both players are using an engine, just one of them, or is it possible that they are both on the level?

Thanks,

FL

Squelchbelch

Joined: 14 Jul 06
Moves: 20541

22 Jun 09

1 edit

On that basis I'd definitely examine more games by both players because you need a big (400+ non-book moves) sample size before you can draw those sorts of conclusions.
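One way to see why a single game settles nothing is to put a rough margin of error on a matchup rate. The sketch below uses the Wilson score interval for a proportion. That is a standard statistical tool brought in here for illustration, not something the matchup method itself prescribes, and it treats every move as an independent trial, which chess moves are not, so read the output as a rough guide only.

```python
import math


def wilson_interval(matches, moves, z=1.96):
    """Approximate 95% interval for the true matchup rate (Wilson score)."""
    p = matches / moves
    denom = 1 + z * z / moves
    centre = (p + z * z / (2 * moves)) / denom
    half = z * math.sqrt(p * (1 - p) / moves + z * z / (4 * moves * moves)) / denom
    return centre - half, centre + half


# White's top 3 figure from the single game quoted above: 46 out of 48.
one_game = wilson_interval(46, 48)
# A hypothetical sample with a similar rate over 400 non-book moves.
big_sample = wilson_interval(383, 400)
print([round(100 * x, 1) for x in one_game])
print([round(100 * x, 1) for x in big_sample])
```

The interval for the 48-move game comes out about three times as wide as the one for 400 moves, so the same headline percentage carries far less weight from one game.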

Talisman

Joined: 20 Jan 07
Moves: 24894

22 Jun 09

Originally posted by Squelchbelch
5) Move the engine forward a single move each time the timer reaches 30 seconds. When it gets to where the game went out of book, record where the engine scores the moves played by both players on the matchup sheet as in the above example. Make certain you continue moving forward at precisely the same intervals (30 seconds in this example) throughout ...[text shortened]... p 3 engine choices after 30 secs, Fischer had 3 non-top 3 engine moves.

You my friend are completely obsessed! There will be people who cheat on Correspondence chess sites from now until the end of time. It's a sad unfortunate fact about the advent of blue chip technology and the psychological make up of the human mind. No amount of obsessive analysis is ever going to change that.

I really think you're in danger of becoming so consumed by the games of other players that it's likely to have a detrimental effect on your own play.

My advice would be to take a holiday and then come back and simply play the game. Relax and enjoy!

The cheats will still be there with or without the reams of your computer generated analysis.

Squelchbelch

Joined: 14 Jul 06
Moves: 20541

22 Jun 09

1 edit

Originally posted by Talisman
You my friend are completely obsessed! There will be people who cheat on Correspondence chess sites from now until the end of time. It's a sad unfortunate fact about the advent of blue chip technology and the psychological make up of the human mind. No amount of obsessive analysis is ever going to change that.

I really think you're in danger of becoming ...[text shortened]... The cheats will still be there with or without the reams of your computer generated analysis.

Yes I largely agree with all this.
That's probably why I don't analyse RHP users' games for matchup rates anymore.
I'm actually more interested in finding out what strong, genuine human players can achieve. After all, someone does need to collect evidence if accurate decisions are to be made.

For instance:

Results so far from 1927 World Championship
Alekhine
Top 1 Match: 348/635 (54,8% )
Top 2 Match: 462/635 (72,8% )
Top 3 Match: 506/635 (79,7% )

Capablanca
Top 1 Match: 342/635 (53,9% )
Top 2 Match: 465/635 (73,2% )
Top 3 Match: 522/635 (82,2% )

Capablanca is often spoken of as being the most accurate human player who ever lived. Here he is put to the ultimate test by Alekhine.

His results (with just 4 more games to go) show that he is also well within the thresholds mentioned earlier.

And yes, chess does attract the obsessive types!

😛

Blackamp

Death

is no semi-colon

Joined: 14 Dec 08
Moves: 23029

22 Jun 09

So if someone had this profile, they get a clean bill of health? it still seems good enough to get quite a good rating.

Top 1 match = 50%

Top 2 match = 60%

Top 3 match = 75%

Top 4 match = 100%

Squelchbelch

Joined: 14 Jul 06
Moves: 20541

22 Jun 09

1 edit

Originally posted by Blackamp
So if someone had this profile, they get a clean bill of health? it still seems good enough to get quite a good rating.

Top 1 match = 50%

Top 2 match = 60%

Top 3 match = 75%

Top 4 match = 100%

As I said before, my methods are designed to catch the most blatant engine users.
I think I saw stats that suggested that Games Mods do look at top 4 results.

In many cases (say a 2200+ vs another 2200+), the matchup rates that you gave, under the selection criteria mentioned earlier, would in fact mean a very quick loss for the person attempting this IMO. The known thresholds are 10% (top 1 match), 15% (top 2 match) and 10% (top 3 match) above what you are hypothesising.
The rates are even higher than that for people who actually get banned!

Blackamp

Death

is no semi-colon

Joined: 14 Dec 08
Moves: 23029

22 Jun 09

1 edit

Originally posted by Squelchbelch
As I said before, my methods are designed to catch the most blatant engine users.
I think I saw stats that suggested that Games Mods do look at top 4 results.

Many times (say a 2200+ vs another 2200+) the matchup rates that you gave under the selection criteria mentioned earlier would in fact mean a very quick loss for the person attempting this I ...[text shortened]... you are hypothesising.
The rates are even higher than that for people who actually get banned!

i was trying to get some numbers that would be 'below the radar' as far as the detection method you mentioned was concerned, but still high enough to assure good performance - the kinds of numbers an honest 2300 player might get - i probably didn't get the actual numbers right.

i know you said that you're targeting the most blatant cheats, but what i was trying to establish is that a 'smart' cheat having read this thread should be able to engineer a profile that looks like a legit 2300 player and get a 2300 rating, while remaining undetected.

it seems likely to me that there are such smart cheats around, so i doubt the site can ever really be 'cleaned up'.

Squelchbelch

Joined: 14 Jul 06
Moves: 20541

22 Jun 09

They would have to keep a clear eye on what %'s they've got in all moves played in all games.
That would take a monumental effort when you consider many highly rated players (and/or cheats) have 50+ games in progress.
The more I think about this, the less likely I think that it's much of a problem at all.

It is an interesting idea and of course you will always get the infrequent engine users who fly below the radar.

People who claim that I shouldn't be wasting my time on this perhaps haven't been blasted off the board in 30 moves by a blatant engine after working damn hard to get to the final round of a CC tourney.
The more people that can submit evidence on blatant engine users the better, as far as I'm concerned. That was the point of this thread.
