Originally posted by Yuga
The strongest human players can hit 60%+ 1st choice OTB according to Gatecrasher's statistics. So I think we can assume that it is possible for the strongest human correspondence players to do so as well.
I think I recall that only Rittner in that CC tournament in the 60's hit above 60% first choice, but CC and chess theory have greatly improved in the last 40 ...[text shortened]... my statement.
I don't know Kepler's statistics, only his general methodology and results.

I got a match up rate for first choice moves of approximately 61% for the humans playing in 1922 and approximately 60% for the engines in 2008. I would not want to put too much faith in those figures. Figures may not lie but some will not stand up either.
Originally posted by Kepler
It is indeed obvious that an engine will get a high match with itself. No debate there, that is exactly the reason I did not use HIARCS for my own analysis. Now, if we are saying that a human cheating by using an engine gets a high match up rate irrespective of the engine he is using and the engine used to analyse his games then surely the match up rate betwe ...[text shortened]... ll not work since I performed exactly the test I should be performing to do what I wanted to do.

"Now, if we are saying that a human cheating by using an engine gets a high match up rate irrespective of the engine he is using and the engine used to analyse his games then surely the match up rate between engines is also high?"
Wrong. Again. You see, nobody said it was irrespective of the engine he is using.
"I am testing for equality of means between two samples which is why I used a two sample t-test, it was designed to do just that."
I don't disagree that you're doing that. I'm telling you it is a poor test considering the question you wanted to answer. Which was (and I quote you here): if it is possible to distinguish between the play of engines and the play of humans.
"I have to be able to show that it is anomalous in some way. Just saying I did the wrong test will not work since I performed exactly the test I should be performing to do what I wanted to do."
It is only anomalous because you are still interpreting the result as if the test had been the correct one, i.e. that it is a good way to distinguish between the play of engines and the play of humans. Did you ever consider why there are even engine vs engine tournaments? Why the winners are not random, but consistent? What type of match-up rates would you get if you used Battlechess as a standard? Really, your results are just all too obvious under a correct interpretation of them.
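For anyone following the statistics, here is a minimal sketch of the kind of two-sample test for equality of means being argued about. It is not Kepler's actual calculation: the rates below are made-up numbers for illustration only, the function name `welch_t` is hypothetical, and it assumes the unequal-variance (Welch) form of the t-test, which the thread does not specify.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test on two lists of rates."""
    na, nb = len(sample_a), len(sample_b)
    # Sample variances (n - 1 in the denominator).
    va, vb = variance(sample_a), variance(sample_b)
    # Squared standard error of the difference in means.
    se2 = va / na + vb / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Made-up per-game first-choice match-up rates, NOT data from this thread.
human_rates = [0.58, 0.63, 0.60, 0.65, 0.59]
engine_rates = [0.57, 0.62, 0.61, 0.64, 0.56]

t, df = welch_t(human_rates, engine_rates)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The sketch stops at the t statistic and degrees of freedom; turning those into a p-value needs a t-distribution table or a statistics library. A small |t| only says the two group means are close. It does not say how any one player's games match one particular engine.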
Originally posted by Korch
Disproving methods (which were used to detect many obvious and not so obvious cheats) is advantageous for banned cheats, complaining that they were banned for nothing, defaming RHP.

I hope that what you were trying to say was not "disprove" but "discredit correct methods", because what you actually said was that it is better that miscarriages of justice continue than that a flawed system be brought to light. Which would be the case if Kepler were to disprove the methods used.
Potentially Kepler's work could improve the methods used to detect cheats, making it more likely that the guilty are correctly banned and reducing the chance of an incorrect banning. I really do not understand the problem you have with this.
Originally posted by Palynka
Now, if we are saying that a human cheating by using an engine gets a high match up rate irrespective of the engine he is using and the engine used to analyse his games then surely the match up rate between engines is also high?
Wrong. Again. You see, nobody said it was irrespective of the engine he is using.
I am testing for equality of means ...[text shortened]... tandard? Really, your results are just all too obvious under a correct interpretation of them.

So you need to know the engine the suspect is using before the suspect's games are analysed? If you know what engine he is using you don't need any further investigation, just ban him!
Originally posted by Palynka
I am testing for equality of means between two samples which is why I used a two sample t-test, it was designed to do just that.
I don't disagree that you're doing that. I'm telling you it is a poor test considering the question you wanted to answer. Which was (and I quote you here): if it is possible to distinguish between the play of engines and the play of humans.

Go on then, what do you suggest I do to determine if there is a significant difference in mean match up rate between engines and humans?
Originally posted by Kepler
So you need to know the engine the suspect is using before the suspect's games are analysed? If you know what engine he is using you don't need any further investigation, just ban him!

You don't need to know, genius, you find out after you get a high match-up rate with a particular engine.
Obviously people using BattleChess on a C64 or Chess Titans are less likely to be caught. And?
Originally posted by Palynka
I have to be able to show that it is anomalous in some way. Just saying I did the wrong test will not work since I performed exactly the test I should be performing to do what I wanted to do.
It is only anomalous because you are still interpreting the result as if the test had been the correct one, i.e. that it is a good way to distinguish between the p ...[text shortened]... a standard? Really, your results are just all too obvious under a correct interpretation of them.

So no significant difference between match up rates from humans and engines does not strike you as odd? In that case why are you complaining because that is what I found?
Have you ever considered why there are human vs human tournaments? Why the winners are not random, but consistent? Could it be that both types of tournament are run for the exact same reason, to determine the strongest player, whether human or engine, among those that enter?
Originally posted by Palynka
You don't need to know, genius, you find out after you get a high match-up rate with a particular engine.
Obviously people using BattleChess on a C64 or Chess Titans are less likely to be caught. And?

So those who use Fritz 8 as their sole analysis engine will only catch someone who is using Fritz 8? Have you told no1marauder this interesting idea?
More worrying, you have just suggested that if cheats want to evade detection they should use Battlechess. I sincerely hope that is not the case!
Originally posted by Kepler
Go on then, what do you suggest I do to determine if there is a significant difference in mean match up rate between engines and humans?

Seriously, can you read? The match-up rate across a group of engines is not relevant. If I test 100 matches with Fritz 10 against C64's Battlechess, I would get an average match-up rate with Fritz 10 somewhat higher than 50%. Would this surprise you? Really? Is it "anomalous"?
Originally posted by Palynka
Seriously, can you read? The match-up rate across a group of engines is not relevant. If I test 100 matches with Fritz 10 against C64's Battlechess, I would get an average match-up rate with Fritz 10 somewhat higher than 50%. Would this surprise you? Really? Is it "anomalous"?

Obviously I can read. Just firing insults at me will not convince me of anything. Why is the match up rate across a group of engines not relevant? If it is not relevant why would the match up rate across a group of humans be relevant? If that also is not relevant why did Gatecrasher go to the trouble of obtaining exactly that data?
Originally posted by Palynka
Did you test Fritz 8 against Fritz 10? Do you want to bet on the results being like the ones in your "test"?

No. I did not use either version of Fritz but at least one person does use Fritz 8 to analyse the games of suspected engine users. Should something else be used instead? I am serious here, the advice that is usually trotted out is to analyse the suspect games using 30 seconds per move and recording the top three choices. No mention is made of which particular engine should be used. If you have some reason to believe that a particular engine or engines should be used or not used maybe you should make it available to those who do this work.
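To make the "top three choices" bookkeeping concrete, here is a rough sketch of how match-up rates could be tallied once an engine's ranked choices have been recorded for each position. The function name `match_up_rates`, the move lists and the cumulative reporting (top 1, top 2, top 3) are all assumptions for illustration. The thread does not say exactly how the figures were compiled, and the moves below are invented.

```python
def match_up_rates(played_moves, engine_choices, depth=3):
    """Return cumulative match-up rates [top-1, top-2, ..., top-depth].

    played_moves: moves actually played, one per analysed position.
    engine_choices: for each position, the engine's moves ranked best first.
    """
    if not played_moves or len(played_moves) != len(engine_choices):
        raise ValueError("need one ranked choice list per played move")

    hits = [0] * depth  # hits[r] counts moves matching the (r + 1)th choice
    for move, choices in zip(played_moves, engine_choices):
        for rank, choice in enumerate(choices[:depth]):
            if move == choice:
                hits[rank] += 1
                break

    rates, running = [], 0
    for count in hits:
        running += count
        rates.append(running / len(played_moves))
    return rates


# Invented example: four positions, engine's top three choices for each.
played = ["e4", "Nf3", "Bb5", "a4"]
choices = [
    ["e4", "d4", "c4"],
    ["Nc3", "Nf3", "d4"],
    ["Bc4", "Bb5", "d4"],
    ["O-O", "d3", "c3"],
]
print(match_up_rates(played, choices))
```

Running the same tally against the output of several different engines gives one set of rates per engine, which is the per-engine comparison the argument above keeps coming back to.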
Originally posted by Kepler
No. I did not use either version of Fritz but at least one person does use Fritz 8 to analyse the games of suspected engine users. Should something else be used instead? I am serious here, the advice that is usually trotted out is to analyse the suspect games using 30 seconds per move and recording the top three choices. No mention is made of which particula ...[text shortened]... engines should be used or not used maybe you should make it available to those who do this work.

I'm pretty sure they have much more information than me to choose which engines they test, with what time controls and how many choices. I'm fairly sure they are not so one-dimensional as to believe there is one unique engine with one unique set of controls that is always best.
The reason they don't post those details here is clear, if you share my opinion: it would make it easier for cheaters to get around it.