What engine match % is suspicious?

sh76

Civis Americanus Sum

Only Chess

19 Nov 08 00:19

luctruc

Joined: 28 Jan 04
Moves: 3570

19 Nov 08 19:30

... analysis of strongest legit players has been made using the same time per move as suspect.

OK, and just for the sake of argument, let's suppose the test program were allowed to run longer, and at the new depth the legit players had a higher match-up rate (they were that good). In theory, at least, the likelihood of a "false positive" could depend on the search depth, chosen arbitrarily or with economy in mind. Yes?

Squelchbelch

Joined: 14 Jul 06
Moves: 20541

19 Nov 08 19:34

Originally posted by luctruc
... analysis of strongest legit players has been made using the same time per move as suspect.

OK, and just for the sake of argument, let's suppose the test program were allowed to run longer, and at the new depth the legit players had a higher match-up rate (they were that good). In theory, at least, the likelihood of a "false positive" could depend on the search depth, chosen arbitrarily or with economy in mind. Yes?

It takes about 1 hour to 2 hours to test a single average game giving 30 seconds per move analysis.
No one is going to spend 6 hours or whatever analysing 1 game.
I'm certainly not - especially when most engine users are probably lazy and Fritz usually seems to have a pretty good idea of the top 2 moves after 30 seconds.

luctruc

Joined: 28 Jan 04
Moves: 3570

19 Nov 08 19:37

An extreme example. Dumb down the computer sufficiently and it will give you moves that almost never occur over 1,000 Golden Knights finalists' games. But RHP player Suspicious Patzer plays those moves 34.9% of the time. Banned!

luctruc

Joined: 28 Jan 04
Moves: 3570

19 Nov 08 19:39

Originally posted by Squelchbelch
It takes about 1 hour to 2 hours to test a single average game giving 30 seconds per move analysis.
No one is going to spend 6 hours or whatever analysing 1 game.
I'm certainly not - especially when most engine users are probably lazy and Fritz usually seems to have a pretty good idea of the top 2 moves after 30 seconds.

This is like the drunk looking for his car keys under a street lamp, even though he dropped them a block away. "The light's better here."

Squelchbelch

Joined: 14 Jul 06
Moves: 20541

19 Nov 08 20:19

Originally posted by luctruc
This is like the drunk looking for his car keys under a street lamp, even though he dropped them a block away. "The light's better here."

No - it's more like the drunk who keeps dropping his keys under every street lamp on the way home but then after the 50th time also finds a penny lying next to his keys.
It might seem like a find - but it's actually rather insignificant & he would have found his keys again anyway! 😛

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

19 Nov 08 20:48

Originally posted by Korch
I repeat - analysis of strongest legit players has been made using the same time per move as suspect. And if you still did not understand - statistics is made from more than single game.

I know that. I am a statistician so I know quite enough about samples and such like. What I was investigating was the effect of different times on match up rates hence the use of a single game. I expected there to be either no change or the match up rate to steadily decrease or increase with increasing time per move. What I got was apparently random. What does that mean? Well I don't know so that is why I asked!

Korch

Chess Warrior

Riga

Joined: 05 Jan 05
Moves: 24932

19 Nov 08 20:51

Originally posted by Kepler
I know that. I am a statistician so I know quite enough about samples and such like. What I was investigating was the effect of different times on match up rates hence the use of a single game. I expected there to be either no change or the match up rate to steadily decrease or increase with increasing time per move. What I got was apparently random. What does that mean? Well I don't know so that is why I asked!

I think it does not matter, as long as the suspect and the strongest legit players (whose results are compared with the suspect's) are analysed using the same criteria (including the same time per move).

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

19 Nov 08 20:57

Originally posted by Korch
I think it does not matter, as long as the suspect and the strongest legit players (whose results are compared with the suspect's) are analysed using the same criteria (including the same time per move).

Neither do I. However, I am scientifically minded so when I get a result that seems at odds with expectation I ask questions.

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

19 Nov 08 21:02

Originally posted by Squelchbelch
It takes about 1 hour to 2 hours to test a single average game giving 30 seconds per move analysis.
No one is going to spend 6 hours or whatever analysing 1 game.
I'm certainly not - especially when most engine users are probably lazy and Fritz usually seems to have a pretty good idea of the top 2 moves after 30 seconds.

I agree, less time per game is a good idea. I was trying to see if the amount of time could be reduced safely rather than arguing for an increase. I was thinking that technology has now moved on a little so higher processor speed ought to allow shorter times. The results surprised me which is why I was asking the question. I thought there might actually be a reason why 30 seconds had been chosen but it appears to be an arbitrary choice.

eldragonfly

leperchaun messiah

thru a glass onion

Joined: 19 Apr 03
Moves: 16870

19 Nov 08 23:15

talking in circles is suspicious, if you ask me.

wormwood

If Theres Hell Below

We're All Gonna Go!

Joined: 10 Sep 05
Moves: 10228

20 Nov 08 01:38

Originally posted by Kepler
10s 75%
20s 66%
30s 80%
40s 70%
50s 73%
60s 65%

what you are seeing here, is the reason why engines can't be blindly trusted in analysing any single position. they have a horizon, beyond which lies madness they can't comprehend in any way. so every time they take one step deeper into the variation at hand, the evaluation will jump quite randomly. it just seems to us that the evaluation converges, because the time required for each consecutive move increases exponentially. - there really aren't any other evaluations except 0, 1 and ½. everything else is just a subjective opinion of the engine, a guess which it spits out because it never got far enough.

however, this doesn't change the fact that statistically speaking, engines play differently from humans. we make a lot of mistakes of various sizes. but an engine doesn't, within its search space. it's practically incapable of miscalculation, whereas we humans do nothing but. we just have heuristics like "that looks dangerous", largely not based on concrete variations, which help us work around our human imperfection. the current engine paradigm doesn't have that. (and as you're a mathematician, I'm sure you know of the 'no free lunch' theorem, which basically says machines CAN'T have any knowledge of what lies beyond their search. we humans have associations to previous experiences, which by a complex fuzzy mechanism give us 'feelings', which factually don't relate to the position at hand in any way.)

what this means, is that engine evaluation is not the truth, nor is human evaluation. the only exceptions are the known basic endgames and tablebases.

but, within the search depth of current machines, the best humans on average make slightly more human errors than they catch machine 'errors' by seeing beyond the machine's search space. (machine 'errors' are not real errors, it didn't make a miscalculation. it just didn't get there and ignored whatever there was beyond.)

so, when we detect engine abuse, we're not actually trying to differentiate between 'who makes best moves' or conclude 'can a human play this well?'. the machine evaluation doesn't converge toward the truth of the position, no matter how deep it searches, unless it reaches 0, 1 or ½. even if you get to move 200, the 201st move can turn the eval from +18 to 'white is mated'. - we're actually making a statistical comparison between the styles of human chess vs. engines, not how close to the 'truth' their evaluation is. and the styles have different statistical profiles based on what I wrote above (and more), and the difference will be clearly visible with a big enough sample size, as we can verify from test data from computer games and pre-computer era human games.
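To make the "big enough sample size" point concrete, here is a minimal sketch of a two-proportion z-test comparing a suspect's match rate against a baseline of legit players analysed with the same settings. The counts are invented for illustration, the function name is hypothetical, and this is not claimed to be the method game moderation actually uses. It also treats every move as an independent trial, which real games do not satisfy, so read the numbers as rough.

```python
import math

def two_proportion_z(matches_a, moves_a, matches_b, moves_b):
    """z-score for the difference between two match rates (pooled standard error)."""
    p_a = matches_a / moves_a
    p_b = matches_b / moves_b
    pooled = (matches_a + matches_b) / (moves_a + moves_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / moves_a + 1.0 / moves_b))
    return (p_a - p_b) / se

# HYPOTHETICAL numbers: baseline players match the engine on 60% of 10,000 moves.
BASELINE_MATCHES, BASELINE_MOVES = 6000, 10000

# Same 85% match rate for the suspect, but over very different sample sizes.
z_small = two_proportion_z(17, 20, BASELINE_MATCHES, BASELINE_MOVES)     # 20 moves
z_large = two_proportion_z(850, 1000, BASELINE_MATCHES, BASELINE_MOVES)  # 1000 moves

print(f"20 moves:   z = {z_small:.2f}")
print(f"1000 moves: z = {z_large:.2f}")
```

The same 85% rate gives a far larger z-score over 1000 moves than over 20, which is why a single game on its own says very little.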

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

20 Nov 08 11:06

Originally posted by wormwood
what you are seeing here, is the reason why engines can't be blindly trusted in analysing any single position. they have a horizon, beyond which lies madness they can't comprehend in any way. so every time they take one step deeper into the variation at hand, the evaluation will jump quite randomly. it just seems to us that the evaluation converges, ...[text shortened]... est data from computer games and pre-computer era human games.

Yep, that seems to be in agreement with my own ideas. I am interested in investigating this NOT to prove or disprove the cheat detection process but because I have an interest in statistics, computer chess and artificial intelligence. There is a cheat detection process which bans some players from time to time. Presumably it has been tested sufficiently to know that any false results are within acceptable limits (whatever that may be).

My interest lies in how we distinguish between man and machine, not the rights and wrongs of computer use in internet chess or the accuracy of the detection process. The methods used to detect cheats are essentially a means to distinguish between man and machine in the context of this site. My reason for investigating the effect of time on match up rates is simply wanting to know if there was some kind of rationale behind the 30 seconds generally quoted here or whether it was just an arbitrary figure. As far as I can see there is no overriding reason to choose that figure over any other apart from a trade off between allowing the engine time to search to a reasonable depth and not prolonging the analysis time beyond what is humanly endurable. Provided I use a sensible time for all analysis I am convinced I should get meaningful results.

My reason for publishing the results I got was because I found it interesting and thought others might also be interested. It is certainly counterintuitive.
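One rough way to check whether the swings between 65% and 80% are more than sampling noise is to look at the binomial standard error of a match rate measured on a single game. The sketch below uses the six rates posted earlier. The number of analysed moves is an assumption (the thread does not give it), and the six runs are on the same game so they are not independent samples. Treat it as a sanity check only.

```python
import math

# Match-up rates reported in the thread for one game (seconds per move -> rate).
rates = {10: 0.75, 20: 0.66, 30: 0.80, 40: 0.70, 50: 0.73, 60: 0.65}

# ASSUMPTION: number of non-book moves analysed in the game (not stated in the thread).
ASSUMED_MOVES = 40

def binomial_se(p, n):
    """Standard error of a proportion p estimated from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)

mean_rate = sum(rates.values()) / len(rates)
se = binomial_se(mean_rate, ASSUMED_MOVES)

# Rough band of two standard errors either side of the mean rate.
low, high = mean_rate - 2 * se, mean_rate + 2 * se
all_within = all(low <= r <= high for r in rates.values())

print(f"mean rate {mean_rate:.3f}, standard error {se:.3f}")
print(f"2-SE band: {low:.3f} to {high:.3f}, all rates inside: {all_within}")
```

Under that assumed game length every reported rate sits inside the two-standard-error band, so a spread of this size is about what one game would be expected to produce by chance alone.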

~~Jie~~

benching

Joined: 17 Jul 08
Moves: 1218

20 Nov 08 11:19

Originally posted by Kepler
I agree, less time per game is a good idea. I was trying to see if the amount of time could be reduced safely rather than arguing for an increase. I was thinking that technology has now moved on a little so higher processor speed ought to allow shorter times. The results surprised me which is why I was asking the question. I thought there might actually be a reason why 30 seconds had been chosen but it appears to be an arbitrary choice.

I think commercial sites like ICC or Playchess use a different & faster approach but there are no published details on it. Maybe they have an internal blundercheck function or something similar that lights up on the sysadmin screen.

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

20 Nov 08 11:25

Originally posted by Jie
I think commercial sites like ICC or Playchess use a different & faster approach but there are no published details on it. Maybe they have an internal blundercheck function or something similar that lights up on the sysadmin screen.

Apparently they are testing for what software is the focus of attention. In other words, if you constantly put the browser or software used to access the site in the background you are likely accessing some other software while playing. Since both sites offer real time (as opposed to correspondence style) chess this is suspicious. Sites offering correspondence style chess cannot use the same method because I could log in, note the moves my opponents have made, log out and only then use an engine or consult the GM next door, leaving plenty of time to return later or on another day and make my moves.

wormwood

If Theres Hell Below

We're All Gonna Go!

Joined: 10 Sep 05
Moves: 10228

20 Nov 08 15:28

Originally posted by Kepler
Apparently they are testing for what software is the focus of attention. In other words, if you constantly put the browser or software used to access the site in the background you are likely accessing some other software while playing. Since both sites offer real time (as opposed to correspondence style) chess this is suspicious. Sites offering correspondenc ...[text shortened]... lt the GM next door, leaving plenty of time to return later or on another day and make my moves.

I've heard this rumour many times with slightly different 'function', but I seriously can't see how it could be true. because a) it would be ridiculously inefficient and random, b) extremely invasive, possibly illegal, and c) there's a mundane explanation for running engine simultaneously with blitz client: blitzing while analysing past games.

one of the most popular versions of the rumour is that the client detects fritz running. well, I've fired up fritz during both icc & playchess sessions, and absolutely nothing happened. I don't think there's any truth in it, and I think ICC et al. probably detect engines very much like rhp game moderation does. and with blitz they can probably get away with shorter analysis time, even dynamically adjusting it by how much time the player used on a specific move.
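Purely as an illustration of what "dynamically adjusting" could look like, here is a sketch that scales the engine's analysis time with the player's own think time and clamps it to a sensible range. The function name, the scale factor and the limits are all hypothetical. Nothing here is known about what ICC or Playchess actually do.

```python
def analysis_seconds(player_seconds, scale=0.5, floor=1.0, ceiling=30.0):
    """Engine time for one move: a fraction of the player's think time,
    clamped to [floor, ceiling]. All defaults are made-up illustration values."""
    if player_seconds < 0:
        raise ValueError("player_seconds must be non-negative")
    return max(floor, min(ceiling, scale * player_seconds))

# An instant blitz move gets the minimum; a long think is capped at the ceiling.
for used in (0, 10, 600):
    print(used, "->", analysis_seconds(used))
```

The cap keeps the total analysis time bounded, which is the same trade-off as the fixed 30 seconds discussed above.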

on the 30s question, I think it's just an educated guess, based on what has been found to work in practice, rather than a calculated coefficient.