Originally posted by Varenka
You're avoiding the question.
You try to claim that engine analysis varies greatly, but yet it is fundamental that the same analysis can be reproduced as part of the "top 3 matchup" method. i.e. if a cheat uses an engine to suggest Nf3, then I need to get my engine to do likewise in order to gather evidence (sure, not on every move, but on a high enough ...[text shortened]... o greatly for another person to also produce the same analysis. This contradicts the above.

I remember the figure of 60% (or thereabouts) for first choice match up being bandied about as some kind of threshold for engine use. Of course, if you are matching top two or three rather than first choice you already have some variation in the moves played. My engine plays 1. Nf3, yours wants to play 1. e4 but your engine's second or third choice is 1. Nf3, which gives you a match although our engines actually want to play completely different games.
I actually tried this as an experiment a while ago. I played a match between two engines without opening books. Although the games tended to start out with the same moves (both engines had a rather dull liking for the Petroff), no pair of games matched after relatively few moves. That 40% first choice non-match was quite sufficient to lead the games down differing paths.
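To make the first-choice versus top-three distinction concrete, here is a minimal sketch in Python. The move lists are made up for illustration and are not real engine output, and `match_rate` is a hypothetical helper, not part of any actual match-up tool.

```python
def match_rate(played, candidates, top_n):
    """Fraction of played moves found among the engine's first top_n choices."""
    hits = sum(
        1 for move, ranked in zip(played, candidates) if move in ranked[:top_n]
    )
    return hits / len(played)


# Hypothetical data: the moves actually played, and one engine's ranked
# choices (best first) for each of those positions.
played = ["Nf3", "d4", "c4", "g3"]
candidates = [
    ["e4", "d4", "Nf3"],   # Nf3 is only the third choice
    ["d4", "c4", "Nc3"],   # d4 is the first choice
    ["Nc3", "c4", "e3"],   # c4 is the second choice
    ["Bg5", "Nc3", "e3"],  # g3 is not in the top three at all
]

top1 = match_rate(played, candidates, 1)
top3 = match_rate(played, candidates, 3)
```

With this invented data only one move in four is a first-choice match, while three in four are top-three matches, so the same game looks very different depending on which count is used.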
Originally posted by Diophantus
My engine plays 1. Nf3, yours wants to play 1. e4 but your engine's second or third choice is 1. Nf3

Have you ever used the "Deep Position Analysis" feature of the Fritz GUI, or the similar feature in the Aquarium GUI? If you have, you'll know that the branching factor can be set to be more than 1, e.g. 3 or 5. I can also tell it to repeat the process using several different engines, but combining all the analysis. It can then be left to run automatically for days.
When I played with computer assistance at another site, how often do you think my opponents played a move that I didn't have in my analysis tree? Very seldom. Sure, they didn't always play the top choice, but it was often one of a small number of choices, especially in sharp openings with no slow maneuvering going on.
I've also seen in other forums players exchanging files of analysis, in which case you can build up weeks/months of analysis.
Originally posted by Varenka
The method is a crude method that I think is capable of identifying blatant cheaters, but no more than that. If someone is obtaining significantly high matchups over a big enough sample, then I do regard that as being very suspicious.
But borderline cases are indeed more debatable, especially when we consider how the borderline was derived. The bottom l ...[text shortened]... doesn't make the method completely redundant, but it does complicate where the borderline is.

I agree with your comments 100% and it has worded my concerns much more succinctly. It appears we are using a rather blunt instrument for a delicate task.
However, rather than say "I do regard that as being very suspicious" after an x% match-up, I would prefer we were saying something like "there is a y% (90-99%?) probability this person cheated in the games they won against opponents with a rating exceeding z (e.g. 1400)".
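A statement of the form "there is a y% probability this person cheated" needs more than a match-up rate: it also needs an assumed prior and assumed match rates for honest and assisted play. The sketch below shows one way such a figure could be produced with Bayes' rule. Every number and name in it (`p_honest`, `p_cheat`, `prior`, `posterior_cheat`) is a hypothetical placeholder, and it treats moves as independent, which real games are not.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k matches in n moves at per-move match rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_cheat(k, n, p_honest, p_cheat, prior):
    """Posterior probability of engine use given k matches in n moves.

    p_honest / p_cheat: assumed top-three match rates for honest and
    assisted players; prior: assumed share of assisted players.
    All three are illustrative assumptions, not measured values.
    """
    like_cheat = binom_pmf(k, n, p_cheat) * prior
    like_honest = binom_pmf(k, n, p_honest) * (1 - prior)
    return like_cheat / (like_cheat + like_honest)


# Hypothetical example: 90 top-three matches in 100 non-book moves.
y = posterior_cheat(90, 100, p_honest=0.6, p_cheat=0.9, prior=0.01)
```

The result is only as good as the three assumed inputs, which is one reason a borderline is hard to pin down.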
Originally posted by wormwood
that's all been done. to death, I might add. some of the issues you bring up (as dozens or hundreds have before) are non issues, most are testable with control groups (the current masters differ from old ones much less than you'd think, statistically), and the math you can always crunch through yourself when in doubt.
couple of practical points: firstly ...[text shortened]... bout how it all works. (in the case someone doesn't do it all over here. I won't bother.)

Thanks for the comments. Understood re the large match-up margins which removes the element of doubt.
When I refer to a statistical methodology I'm talking about statistical confidence levels, sample sizes, sample selection etc. Something that you might study in a statistics paper.

Thanks for the comments - although I think you misunderstood any inferences attributed to my post based on the 'eyebrow raising move' comment - to suggest a banning on one move ignores the rest of the post. Further, you failed to quote one of my final sentences which did not advocate discarding the entire system currently used, but never mind.
In every case I have seen posted by the likes of Squelchbelch or Zygalski on this forum, processor speed, database details and engine have been clearly stated.
Every time I read a post like yours I worry ...[text shortened]... can operate
[snip]
All you need is the ceiling for what match up rate is "impossible."
When I read comments like the one I have quoted from your post I hope the powers that be actually understand statistics, because I believe a number of posters on this subject don't. Why? First up, things like the processor speed have absolutely nothing to do with statistics - that and the engine of choice are just the tools being used. Secondly, by using statistics properly, instead of having an 'impossible ceiling', you could actually lower the bar. We might catch more cheaters if statistics were used correctly to set a bar at a certain confidence level, and we also understood the risk of a false positive (although I accept that if we set the bar very high, which is possibly the case at the moment, then the probability of a false positive is extremely low, but we aren't catching those with slightly lower match up rates).
Don't get me wrong : I am 100% for weeding out the cheats, but I would like to see it done properly (i.e. using a sound statistical base).
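As a rough illustration of setting a bar at a chosen confidence level rather than at an "impossible" ceiling, the sketch below uses a plain binomial model. It assumes a known honest top-three match rate `p` and independent moves; both are simplifying assumptions made here for illustration, and the function names are hypothetical.

```python
from math import comb


def tail_prob(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more matches by honest play."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def threshold(n, p, alpha):
    """Smallest match count whose honest-play probability is at most alpha.

    alpha is the false positive risk accepted for a single player.
    Returns None when even n matches out of n is not rare enough.
    """
    for k in range(n + 1):
        if tail_prob(k, n, p) <= alpha:
            return k
    return None


# Hypothetical inputs: 200 non-book moves, assumed honest match rate of 60%.
bar_1_in_100 = threshold(200, 0.6, 0.01)
bar_1_in_1000 = threshold(200, 0.6, 0.001)
```

Tightening `alpha` raises the bar and lowers the false positive risk; loosening it catches lower match-up rates at the cost of more false positives, which is the trade-off described above.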
Andrew
[Side issue : From another post it has been indicated that forced moves make little difference so this is probably a moot point - but on this subject I would have thought that only having one square for the King to move to from a check position (and no other legal moves) would be very easy to determine, as is the case of the one and only move to avoid a threatened checkmate. So there is no prejudging required at all, and I'm sure there are other examples. However, given the number of candidate moves in any one game would be very low that is why it probably has little impact]
The three match up system is not the only method in use.
This weeds out the schoolboys. With known good players or stealth users
there are two other methods that are brought into play.
(well there is a third actually, a PM to a non-profile non-poster asking
who they are, this often results in a player suddenly quitting the site).
The other two methods are more accurate though lengthy (especially
in the case of a good player) and not open for discussion.
In the three match up system it's end of opening database + 5 to cover
player's own research. It may be longer than that as you have to see if
that player has had that position before and then start a match up run from
the deviation + 5.
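One possible reading of the starting rule just described, sketched in Python. The function and parameter names are hypothetical, and the interpretation (take the later of the database exit and the deviation from the player's own earlier games, then add five moves) is an assumption about what is meant, not a description of the actual procedure used.

```python
def first_move_to_check(book_exit_move, prior_game_deviation_move=None, buffer=5):
    """Move number from which match-up counting starts.

    book_exit_move: last move still found in the opening database.
    prior_game_deviation_move: move at which this game leaves positions the
        player has had before, or None if there is no such earlier game.
    buffer: extra moves allowed for the player's own research (5 in the post).
    """
    start = book_exit_move
    if prior_game_deviation_move is not None:
        # The player's own earlier games may carry known territory further.
        start = max(start, prior_game_deviation_move)
    return start + buffer


# Hypothetical game: database ends at move 12, but the player had the same
# position up to move 18 in an earlier game.
start_move = first_move_to_check(12, prior_game_deviation_move=18)
```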
So just grabbing 20 odd games and running them through a box is
only half of it. You need access to everything he has played else you
could be matching up to something he played two years ago, went wrong,
boxed it, found the correct path and stored it for a later day.
There is nothing wrong in that.
Leave it to the guys who know what they are doing and have
all the tools and information at their disposal.
They have a good track record.
Originally posted by andrew93
I agree with your comments 100% and it has worded my concerns much more succinctly. It appears we are using a rather blunt instrument for a delicate task.
However, rather than say "I do regard that as being very suspicious" after an x% match-up, I would prefer we were saying something like "there is a y% (90-99%?) probability this person cheated in the games they won against opponents with a rating exceeding z (e.g. 1400)".

The top three (or whatever number, some use four) match up is used by those wishing to detect cheats. That doesn't mean they have the power to ban or even that they are 100% correct, although some act as if they are. Here one does the match up analysis then sends a fair play ticket. The team of game mods then use their own methods, probably including a more rigorous version of the top three match up analysis, to decide whether or not to recommend banning a player. Then the site admins get to decide whether or not they want to ban the offending player. A long, drawn out process with no absolutely certain outcome.
I take comfort in the fact that if you hang around long enough you will see complaints that "player x has not been banned even though I sent a fair play ticket n months ago". This site does not ban on the say so of amateur statisticians. Of course, the methods used by the game mods may well produce statements like "there is a y% (90-99%?) probability this person cheated in the games they won against opponents with a rating exceeding z (e.g. 1400)". Unfortunately we don't get to see that.
Originally posted by Varenka
Have you ever used the "Deep Position Analysis" feature of the Fritz GUI, or the similar feature in the Aquarium GUI? If you have, you'll know that the branching factor can be set to be more than 1, e.g. 3 or 5. I can also tell it to repeat the process using several different engines, but combining all the analysis. It can then be left to run automaticall ...[text shortened]... rs exchanging files of analysis, in which case you can build up weeks/months of analysis.

I've used the Aquarium thing, IDeA, but it doesn't work at all like the Deep Position Analysis in Fritz. You can't avoid getting the same analysis in that, as Aquarium stores all analysis in a tree somewhere and refuses to re-analyse any position it has looked at before. On the other hand, it is more flexible than the Fritz version because depth and branching are modified on the fly according to the way the engine scores the position currently being worked on and where the engine is currently looking in the tree. I definitely don't get the same results using the two different methods, even when using the same engine on the same machine!
You could run this sort of thing for days and likely get the same analysis time after time, especially using Aquarium for the reason mentioned above. However, your complaint was that people are not producing an exact match when doing top three match up. This is hardly the same thing, as one is not using either of the deep position analysis tools and not running the engine for days. That is likely why my engine match produced divergence: I did not give my two engines a day or more per move. I gave them a generous 40 minutes for 40 moves.
Originally posted by andrew93
[Side issue : From another post it has been indicated that forced moves make little difference so this is probably a moot point - but on this subject I would have thought that only having one square for the King to move to from a check position (and no other legal moves) would be very easy to determine, as is the case of the one and only move to avoid a thr ...[text shortened]... of candidate moves in any one game would be very low that is why it probably has little impact]

I think that forced moves would have an effect at the single game level but over several games it should average out. If forced moves were ignored when the thresholds were established then the effect should be to establish a slightly higher threshold for match ups than would have been the case if forced moves were weeded out. This gives greater leeway for the accused than would otherwise be the case. And one must always have in mind the simple fact that the armchair statisticians are not those who determine guilt or innocence here, they are just using this match up thing to decide whether or not to send the fair play ticket.
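The effect of forced moves on a match-up rate can be sketched as follows. A position with exactly one legal move matches every engine automatically, so counting it can only push the rate up. The data and the `top3_rate` helper are hypothetical, and "forced" is taken here in the narrow sense of a single legal move, which is the easy-to-detect case mentioned above.

```python
def top3_rate(moves, skip_forced=False):
    """Top-three match rate over a list of (legal_move_count, matched) pairs.

    With skip_forced=True, positions with exactly one legal move are left
    out of both the hits and the total.
    """
    counted = [m for m in moves if not (skip_forced and m[0] == 1)]
    if not counted:
        return 0.0
    return sum(1 for _, matched in counted if matched) / len(counted)


# Hypothetical game fragment: two forced moves (always a match),
# five genuine matches and three misses.
moves = (
    [(1, True)] * 2
    + [(30, True)] * 5
    + [(30, False)] * 3
)

with_forced = top3_rate(moves)
without_forced = top3_rate(moves, skip_forced=True)
```

In this invented fragment the rate is 7 of 10 with forced moves counted and 5 of 8 without them, so a threshold calibrated with forced moves included sits slightly higher than one calibrated without.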
Originally posted by Diophantus
I think that forced moves would have an effect at the single game level but over several games it should average out. If forced moves were ignored when the thresholds were established then the effect should be to establish a slightly higher threshold for match ups than would have been the case if forced moves were weeded out. This gives greater leeway for ...[text shortened]... , they are just using this match up thing to decide whether or not to send the fair play ticket.

what about if someone is fairly average and then becomes awesome? what if they
were simply rusty after some years out of chess and then regain their form, or could
this be construed as an indication of underhanded and fiendish play?
Originally posted by Diophantus
However, your complaint was that people are not producing an exact match when doing top three match up.

I believe that the top three matchup can produce high enough levels of matchups for blatant cheats. And this is without knowing the exact software/hardware that the player used.
Given this, I see no reason why two players with good intentions couldn't produce opening analysis, as "pre-existing research", but then play a game together with high matchups as a result. This opening analysis would be a broad range of lines, automatically generated over days, with maybe 30 to 60 seconds per move.
I'm in no way making excuses for cheaters - I hate them as much as anyone else. I just don't view things as being as simple as others make it out to be.
Originally posted by greenpawn34
The three match up system is not the only method in use.
This weeds out the schoolboys. With known good players or stealth users
there are two other methods that are brought into play.
(well there is a third actually, a PM to a non-profile non-poster asking
who they are, this often results in a player suddenly quitting the site).
The other tw ...[text shortened]... have
all the tools and information at their disposal.
They have a good track record.

Intriguing post, GP. I wish we could discuss the other 2 methods, but I understand the need for secrecy.
Yeah, it's easy to spot the schoolboys, but the stealth user seems to be a much tougher nut to crack. Someone dials in 2300 or so in the "limit strength" setting of his engine, and soon he's near the top of the player list. (Or even dial in a lower rating if your cheating goals are more modest.) Match-up rates with this "crippled" engine are now in the normal range for that rating and are useless for detection purposes. Other methods must be used to determine whether the moves are human or not. (Can this even be done?) I'm not even sure you could use blunder rates as a method, as these engines' blunder rates seem to be not all that different from human blunder rates.
If I were really concerned about my rating here, I think I'd be really depressed thinking about this stuff. Fortunately, that's not the case.
Originally posted by Mad Rook
If I were really concerned about my rating here, I think I'd be really depressed thinking about this stuff. Fortunately, that's not the case.

This. My philosophy is resign early and move on.
Rating means very little here; most of the players I respect here are rated about 1800, have an OTB background, and in every game they have new ideas that I haven't seen before.
"So just grabbing 20 odd games and running them through a box is
only half of it. You need access to everything he has played else you
could be matching up to something he played two years ago, went wrong, boxed it, found the correct path and stored it for a later day.
There is nothing wrong in that."
The italics were added by me, because that's the part that really caught my eye.
Nimzo5 and I had a PM conversation going, and I mentioned to him that I have a pretty substantial amount of "home prep" on Alekhine's Defense that I use OTB, but I have avoided using it here because I used a computer for some of it, and I didn't want to cheat (the Voronezh variation in particular, where I used the games of NZ IM John Russell Dive and a heapin' helpin' side dish of Fritz and Rybka).
As a result, I have only used Alekhine's Defense here sparingly, to avoid getting caught out having to use inferior choices to avoid cheating.
Nimzo5 replied to me that it is OK to use computer-generated analysis if it was generated prior to the start of the game, not during the game as work on the actual game in progress.
It sounds like your post confirms to me that this is OK.
Of course, anyone who is using an opening book published in the last 10 years runs the risk of using computer analysis purely by literary osmosis, and that has been a concern to me, as my library is huge, but now I see the consistency and I will worry less.
My thinking had been gravitating in this direction for some time (it would seem funny to not allow someone to use their computer-assisted analysis of their losses in future games), but you guys have really clarified the issue.
Paul