Open letter to Russ re/engine use

Squelchbelch

Only Chess

28 Nov 08 11:55

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08 21:55

Originally posted by Yuga
The strongest human players can hit 60%+ 1st choice OTB according to Gatecrasher's statistics. So I think we can assume that it is possible for the strongest human correspondence players to do so as well.

I think I recall that only Rittner in that CC tournament in the 60's hit above 60% first choice, but CC and chess theory have greatly improved in the last 40 ...[text shortened]... my statement.

I don't know Kepler's statistics, only his general methodology and results.

I got a match up rate for first choice moves of approximately 61% for the humans playing in 1922 and approximately 60% for the engines in 2008. I would not want to put too much faith in those figures. Figures may not lie but some will not stand up either.

Palynka

Upward Spiral

Halfway

Joined: 02 Aug 04
Moves: 8702

01 Dec 08 22:19

Originally posted by Kepler
It is indeed obvious that an engine will get a high match with itself. No debate there, that is exactly the reason I did not use HIARCS for my own analysis. Now, if we are saying that a human cheating by using an engine gets a high match up rate irrespective of the engine he is using and the engine used to analyse his games then surely the match up rate betwe ...[text shortened]... ll not work since I performed exactly the test I should be performing to do what I wanted to do.

Now, if we are saying that a human cheating by using an engine gets a high match up rate irrespective of the engine he is using and the engine used to analyse his games then surely the match up rate between engines is also high?
Wrong. Again. You see, nobody said it was irrespective of the engine he is using.

I am testing for equality of means between two samples which is why I used a two sample t-test, it was designed to do just that
I don't disagree that you're doing that. I'm telling you it is a poor test considering the question you wanted to answer. Which was (and I quote you here): if it is possible to distinguish between the play of engines and the play of humans.

I have to be able to show that it is anomalous in some way. Just saying I did the wrong test will not work since I performed exactly the test I should be performing to do what I wanted to do.
It is only anomalous because you are still interpreting the result as if the test had been the correct one, i.e. that it is a good way to distinguish between the play of engines and the play of humans. Did you ever consider why there are even engine vs engine tournaments? Why the winners are not random, but consistent? What type of match-up rates would you get if you used Battlechess as a standard? Really, your results are just all too obvious under a correct interpretation of them.

DeepThought

Losing the Thread

Quarantined World

Joined: 27 Oct 04
Moves: 87415

01 Dec 08 22:31

Originally posted by Korch
Disproving methods (which were used to detect many obvious and not so obvious cheats) is advantageous for banned cheats, complaining that they were banned for nothing, defaming RHP.

I hope that what you were trying to say was not "disprove" but "discredit correct methods", because what you actually said was that it is better that miscarriages of justice continue than that a flawed system be brought to light. Which would be the case if Kepler were to disprove the methods used.

Potentially Kepler's work could improve the methods used to detect cheats, making it more likely that the guilty are correctly banned and reducing the chance of an incorrect banning. I really do not understand the problem you have with this.

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08 22:33

Originally posted by Palynka
Now, if we are saying that a human cheating by using an engine gets a high match up rate irrespective of the engine he is using and the engine used to analyse his games then surely the match up rate between engines is also high?
Wrong. Again. You see, nobody said it was irrespective of the engine he is using.

I am testing for equality of means ...[text shortened]... a standard? Really, your results are just all too obvious under a correct interpretation of them.

So you need to know the engine the suspect is using before the suspect's games are analysed? If you know what engine he is using you don't need any further investigation, just ban him!

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08 22:37

Originally posted by Palynka
I am testing for equality of means between two samples which is why I used a two sample t-test, it was designed to do just that
I don't disagree that you're doing that. I'm telling you it is a poor test considering the question you wanted to answer. Which was (and I quote you here): if it is possible to distinguish between the play of engines and the play of humans.

Go on then, what do you suggest I do to determine if there is a significant difference in mean match up rate between engines and humans?
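
For readers following the statistics, the comparison of means being argued about can be sketched roughly as below. This is only an illustration: it uses Welch's variant of the two-sample t-test (an assumption, since the thread does not say which variant was used), the function name `welch_t` is made up for this sketch, and the rates are invented numbers, not anyone's real data. Turning t and the degrees of freedom into a p-value would still need a t-distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test on two lists of rates."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances (n - 1)
    se2 = va / na + vb / nb  # squared standard error of the difference in means
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical per-game first-choice match-up rates, invented for illustration.
human_rates = [0.58, 0.63, 0.60, 0.64, 0.59, 0.62]
engine_rates = [0.61, 0.57, 0.62, 0.58, 0.60, 0.62]

t, df = welch_t(human_rates, engine_rates)
print(f"t = {t:.3f}, df = {df:.1f}")
```

A small t here only says that the two group averages are close. It says nothing about how well one particular player matches one particular engine.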

Palynka

Upward Spiral

Halfway

Joined: 02 Aug 04
Moves: 8702

01 Dec 08 22:42

Originally posted by Kepler
So you need to know the engine the suspect is using before the suspect's games are analysed? If you know what engine he is using you don't need any further investigation, just ban him!

You don't need to know, genius, you find out after you get a high match-up rate with a particular engine.

Obviously people using BattleChess on a C64 or Chess Titans are less likely to be caught. And?

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08 22:47

Originally posted by Palynka
I have to be able to show that it is anomalous in some way. Just saying I did the wrong test will not work since I performed exactly the test I should be performing to do what I wanted to do.
It is only anomalous because you are still interpreting the result as if the test had been the correct one, i.e. that it is a good way to distinguish between the p ...[text shortened]... a standard? Really, your results are just all too obvious under a correct interpretation of them.

So no significant difference between match up rates from humans and engines does not strike you as odd? In that case why are you complaining because that is what I found?

Have you ever considered why there are human vs human tournaments? Why the winners are not random, but consistent? Could it be that both types of tournament are run for the exact same reason, to determine the strongest player, whether human or engine, among those that enter?

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08 22:49

Originally posted by Palynka
You don't need to know, genius, you find out after you get a high match-up rate with a particular engine.

Obviously people using BattleChess on a C64 or Chess Titans are less likely to be caught. And?

So those who use Fritz 8 as their sole analysis engine will only catch someone who is using Fritz 8? Have you told no1marauder this interesting idea?

More worrying, you have just suggested that if cheats want to evade detection they should use Battlechess. I sincerely hope that is not the case!

Palynka

Upward Spiral

Halfway

Joined: 02 Aug 04
Moves: 8702

01 Dec 08 22:50

Originally posted by Kepler
Go on then, what do you suggest I do to determine if there is a significant difference in mean match up rate between engines and humans?

Seriously, can you read? The match-up rate across a group of engines is not relevant. If I test 100 matches with Fritz 10 against C64's Battlechess, I would get a mean match-up rate with Fritz 10 somewhat higher than 50%. Would this surprise you? Really? Is it "anomalous"?
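
A rough sketch of the distinction being drawn here, between a match-up rate against one particular engine and an average taken across a group of engines. Everything in it is hypothetical: the engine names, the moves and the helper `match_rate` are invented for illustration and do not come from anyone's actual analysis.

```python
def match_rate(played, engine_choices):
    """Fraction of positions where the played move equals the engine's first choice."""
    hits = sum(1 for p, e in zip(played, engine_choices) if p == e)
    return hits / len(played)


# Hypothetical data: the moves one player made, and the first choice of
# three made-up engines in the same positions.
played = ["e4", "Nf3", "Bb5", "O-O", "Re1", "c3", "d4", "h3"]
first_choices = {
    "EngineA": ["e4", "Nf3", "Bb5", "O-O", "Re1", "c3", "d4", "Bg5"],
    "EngineB": ["d4", "Nf3", "Bc4", "O-O", "d3", "c3", "d4", "h3"],
    "EngineC": ["e4", "Nc3", "Bc4", "d3", "Re1", "a4", "Nbd2", "h3"],
}

# Rate against each engine separately, then the average across the group.
per_engine = {name: match_rate(played, c) for name, c in first_choices.items()}
pooled = sum(per_engine.values()) / len(per_engine)
best = max(per_engine, key=per_engine.get)

for name, rate in per_engine.items():
    print(f"{name}: {rate:.3f}")
print(f"average across engines: {pooled:.3f}, highest match: {best}")
```

In this toy data the average across engines looks unremarkable while the rate against one engine is much higher, which is the kind of signal an averaged figure would hide.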

Palynka

Upward Spiral

Halfway

Joined: 02 Aug 04
Moves: 8702

01 Dec 08 22:51

Originally posted by Kepler
So those who use Fritz 8 as their sole analysis engine will only catch someone who is using Fritz 8? Have you told no1marauder this interesting idea?

Did you test Fritz 8 against Fritz 10? Do you want to bet on the results being like the ones in your "test"?

Palynka

Upward Spiral

Halfway

Joined: 02 Aug 04
Moves: 8702

01 Dec 08 22:52

Originally posted by Kepler
More worrying, you have just suggested that if cheats want to evade detection they should use Battlechess. I sincerely hope that is not the case!

Of course that's the case. But at least they won't win next year's Championship.

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08 22:56

Originally posted by Palynka
Seriously, can you read? The match-up rate across a group of engines is not relevant. If I test 100 matches with Fritz 10 against C64's Battlechess, I would get a mean match-up rate with Fritz 10 somewhat higher than 50%. Would this surprise you? Really? Is it "anomalous"?

Obviously I can read. Just firing insults at me will not convince me of anything. Why is the match up rate across a group of engines not relevant? If it is not relevant why would the match up rate across a group of humans be relevant? If that also is not relevant why did Gatecrasher go to the trouble of obtaining exactly that data?

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08 23:01

Originally posted by Palynka
Did you test Fritz 8 against Fritz 10? Do you want to bet on the results being like the ones in your "test"?

No. I did not use either version of Fritz but at least one person does use Fritz 8 to analyse the games of suspected engine users. Should something else be used instead? I am serious here, the advice that is usually trotted out is to analyse the suspect games using 30 seconds per move and recording the top three choices. No mention is made of which particular engine should be used. If you have some reason to believe that a particular engine or engines should be used or not used maybe you should make it available to those who do this work.
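
As a minimal sketch of the bookkeeping behind that advice, assuming the engine's top three choices for each position have already been recorded by hand: the function below only tallies how often the played move was the first choice, within the top two, or within the top three. The name `top_n_rates` and the sample moves are hypothetical, and nothing here runs an engine or enforces the 30 seconds per move.

```python
def top_n_rates(played_moves, engine_top3):
    """Cumulative match-up rates: first choice, top two, top three."""
    total = len(played_moves)
    counts = [0, 0, 0]
    for move, choices in zip(played_moves, engine_top3):
        if move in choices[:3]:
            rank = choices.index(move)  # 0 = first choice, 1 = second, 2 = third
            for k in range(rank, 3):
                counts[k] += 1  # a first-choice hit also counts as top two and top three
    return [c / total for c in counts]


# Hypothetical record: the move played and the engine's three best moves
# in the same position, invented for illustration.
played = ["e4", "Nf3", "Bb5", "a4"]
top3 = [
    ["e4", "d4", "c4"],
    ["Nc3", "Nf3", "f4"],
    ["Bc4", "d4", "Bb5"],
    ["O-O", "Re1", "c3"],
]

first, top_two, top_three = top_n_rates(played, top3)
print(f"1st choice: {first:.0%}, top 2: {top_two:.0%}, top 3: {top_three:.0%}")
```

The same tally can be repeated with the recorded choices of a different engine, which is where the question of which engine to use starts to matter.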

Kepler

Demon Duck

of Doom!

Joined: 20 Aug 06
Moves: 20099

01 Dec 08 23:02

Originally posted by Palynka
Of course that's the case. But at least they won't win next year's Championship.

I thought the object was to stop cheats no matter where. Does cheating not matter now provided it isn't done in the championship?

Palynka

Upward Spiral

Halfway

Joined: 02 Aug 04
Moves: 8702

01 Dec 08 23:06

Originally posted by Kepler
No. I did not use either version of Fritz but at least one person does use Fritz 8 to analyse the games of suspected engine users. Should something else be used instead? I am serious here, the advice that is usually trotted out is to analyse the suspect games using 30 seconds per move and recording the top three choices. No mention is made of which particula ...[text shortened]... engines should be used or not used maybe you should make it available to those who do this work.

I'm pretty sure they have much more information than me to choose which engines they test, with what time controls and how many choices. I'm fairly sure they are not so one-dimensional as to believe there is one unique engine with one unique set of controls that is always best.

The reason they don't post those details here is clear, if you share my opinion: it would make it easier for cheaters to get around detection.