Why we cannot trust Statistics on book Openings

~~Shamash~~

Only Chess

08 Apr 09

~~Shamash~~

AsIn Chess,SoIn Life

Joined: 22 Jan 09
Moves: 1153

08 Apr 09

Can we truly rely on the statistics of results in choosing an opening variation?

Because it is raining statistics.

It is raining opening databases.

It is raining opening lines and their results.

Along with the availability of statistics on opening systems and their results everywhere, remarks like this one at RHP have become commonplace wisdom:

"You will find a whole database of games played on this site. . .
The database tells us this has appeared ( X ) Number of games, and after the most popular choice ( Y ) , White has won ( Z ) % of these games."

===> Now here is how Alexander Morozevich, ranked #8 in the world at 2751, challenges this conventional wisdom (in his book on openings) :

"In looking at the opening stage, I should like to draw the readers' attention to the fact that on no account should they fall under the spell of the final result!

"Very often players avoid repeating perfectly promising variations, precisely because of the result of a game -- and this is absolutely the wrong approach.

"On the basis of statistics alone it is impossible to draw conclusions about the quality of an opening system -- since it often happens that a particular system is more often employed with one colour by stronger players."

========================================================

Thoughts?

*

KazetNagorra

Germany

Joined: 27 Oct 08
Moves: 3118

08 Apr 09

I think the percentages are valid for large numbers of games. It's safe to conclude that the Sicilian is indeed a better response to 1. e4 than 1... e5. But of course, if you are looking at single or only a handful of games, the percentages don't mean that much.
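To put a rough number on "large numbers of games" versus "a handful", here is a minimal sketch using the Wilson score interval for a win percentage. It is only an illustration: it assumes each game is an independent win/not-win trial (so draws are lumped in with losses), and the function name and the figures in the usage note are made up, not taken from any database.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate.

    Assumption: games are independent win / not-win trials,
    so draws are counted as "not a win".
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    )
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical counts: the same 70% win rate at two sample sizes.
    for wins, games in [(7, 10), (7000, 10000)]:
        low, high = wilson_interval(wins, games)
        print(f"{wins}/{games}: {low:.1%} to {high:.1%}")
```

With 7 wins in 10 games the interval runs from roughly 40% to 89%, so it cannot even tell you whether White is favoured; with 7000 wins in 10000 games it narrows to roughly 69% to 71%. A narrow interval only says the percentage is stable, not why it is what it is.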

Meadows

Joined: 06 May 08
Moves: 1908

08 Apr 09

Good points. Another interesting (and slightly conflicting) reason is that a line could be played 99 times and thought favourable for white until one day someone well and truly refutes it. The statistics wouldn't reflect that either.

RECUVIC

YTM

Earth Milky Way

Joined: 23 Jan 06
Moves: 67934

08 Apr 09

Solid advice! Chess players who consistently avoid tried and tested book openings and play more or less random moves are clearly putting themselves at a disadvantage from the start of the game. This is not to say that a percentage of such 'out of book' games cannot be won, but over the longer term such a strategy is a poor one, producing poorer results than necessary. Diverging from all known book continuations part way through a book line clearly involves a degree of risk, particularly against an experienced player who is usually well aware of the most promising book moves to continue with. But even such experienced players will not have seen all possible moves out of a book line, and are often blind to slightly or even seriously faulty moves. This is why total reliance on completed book lines alone is usually unwise, as it tends to obscure other unseen and previously unknown continuations. 😉

Mahout

London

Joined: 04 Nov 05
Moves: 12606

08 Apr 09

Statistics have to be interpreted. For example, a line could show a high percentage of wins for White, but when you drill down you find that this held only until 1998, when people stopped playing it because a refutation was discovered.

Also, the refutation of a variation at GM level may not mean much for club players, where the system still has a lot of life.

Other factors include who played the games behind the results, and when.

But I still think you can infer something from the raw stats... it gives you some indication.

Most importantly, if a move is not shown in the early moves of a database line, then it's not shown for a reason. This is a real indication of a weak move. But the big question is whether the move is so weak that I can see why it's weak and take advantage of the weakness.
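The "drill down" idea can be sketched in a few lines: split the games at a cutoff year and compare White's score on each side. This is only a sketch under assumptions; the `(year, white_result)` record format, the function name and the sample games are all hypothetical.

```python
from collections import defaultdict


def score_by_period(games, cutoff_year):
    """Split games at cutoff_year and return White's score on each side.

    games: iterable of (year, white_result), where white_result is
    1.0 for a White win, 0.5 for a draw and 0.0 for a loss.
    Returns {"before": (count, score), "from_cutoff": (count, score)},
    omitting a period that has no games.
    """
    buckets = defaultdict(list)
    for year, result in games:
        key = "before" if year < cutoff_year else "from_cutoff"
        buckets[key].append(result)
    return {key: (len(rs), sum(rs) / len(rs)) for key, rs in buckets.items()}


if __name__ == "__main__":
    # Hypothetical line: strong for White until a refutation appears in 1998.
    games = [(1990 + i, 1.0) for i in range(7)] + [(1997, 0.0)]
    games += [(1998, 0.0), (1999, 0.0), (2001, 1.0), (2003, 0.0)]
    overall = sum(result for _, result in games) / len(games)
    print(f"overall: {overall:.0%}")
    for period, (count, score) in score_by_period(games, 1998).items():
        print(f"{period}: {count} games, White scores {score:.0%}")
```

In this made-up sample the overall figure still looks healthy for White, while the games from the cutoff onward tell a very different story.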

Grampy Bobby

Boston Lad

USA

Joined: 14 Jul 07
Moves: 43012

08 Apr 09

Originally posted by Shamash
Can we truly rely on the statistics of results in choosing an opening variation?

Because it is raining statistics.

It is raining opening databases.

It is raining opening lines and their results.

Along with the availability of statistics on opening systems and their results everywhere, remarks like this one at RHP have become commonplace w ...[text shortened]...
========================================================

Thoughts?

*

Insightful and provocative thread topic, Shamash. Few questions: (1) What thrill is there in being wed to the historical consensus of other players' moves? (2) What possible satisfaction derives from a favorable outcome if someone else's signature squats alongside your own? (3) Why choose not to recognize the values of 'getting out of book' early and often... to present your opponents with possible dimensions of unfamiliarity; to tempt them to become strategically impatient and tactically premature; to underestimate positional force not yet fully deployed? Last, allow me to personalize an older gent's viewpoint of the topic. Bottom line issues, now, are simply the joy of still playing conversational chess and doing so like the Vermont Farmer, who took his milk from many cows over many years but made his own butter.

-gb

exigentsky

Joined: 19 Nov 05
Moves: 3112

08 Apr 09

Originally posted by KazetNagorra
I think the percentages are valid for large numbers of games. It's safe to conclude that the Sicilian is indeed a better response to 1. e4 than 1... e5. But of course, if you are looking at single or only a handful of games, the percentages don't mean that much.

No, it absolutely isn't, and in fact most of the Super-GMs are now playing e5 much more frequently than c5. Unless the statistics are based on "perfect chess," a modest difference in performance is neither relevant nor revealing. After all, a single tiny error in a long series of moves will result in a worse position than theoretically possible. That one error may make c5 look better than e5 or vice versa. However, even if this were not the case, some positions that are actually equal may be harder to play than others, which would explain poor results. For instance, the Marshall Gambit may not be stronger than the traditional Ruy Lopez, but it makes White players sweat more. Statistics are no more than a rough guide and we are still far from chess truth.

BTW: Even when the "best" line is agreed upon, people have often been wrong. And even if they aren't, that doesn't always mean it is the best practical choice for you.

smrex13

Joined: 01 Aug 04
Moves: 3215

08 Apr 09

Originally posted by exigentsky
No, it absolutely isn't, and in fact most of the Super-GMs are now playing e5 much more frequently than c5. Unless the statistics are based on "perfect chess," a modest difference in performance is neither relevant nor revealing. After all, a single tiny error in a long series of moves will result in a worse position than theoretically possible. That one error ...[text shortened]... even if they aren't, that doesn't always mean it is the best practical choice for you.

Exigentsky,

You said most of what I was thinking. I was going to add that simply looking at "1...e5" and "the Sicilian" is misleading as well. Plenty of White players avoid the main line Sicilians and play theoretically inferior moves, thus making Black's score higher. Furthermore, the strength of the players must be taken into account in any statistical analysis. And finally, we must go further into the opening variations to really determine anything - the statistics of 1...e5 vs. 1...c5 don't tell us much. However, the statistics of strong GMs deep into the main line Lopez vs. the English Attack Najdorf might be interesting (and ultimately irrelevant for 99% of all chess players due to their inability to correctly handle the position).

Just my 2 cents,
Scott
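The point about player strength can be illustrated with the standard Elo expected-score formula: compare what White actually scored in a line with what the ratings alone would predict. This is a sketch under assumptions, not a full statistical analysis. The plain Elo formula ignores White's first-move advantage, and the record format, function names and sample ratings are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def score_vs_expectation(games):
    """Compare White's actual score with the rating-based expectation.

    games: list of (white_rating, black_rating, white_result), where
    white_result is 1.0 for a win, 0.5 for a draw and 0.0 for a loss.
    Returns (actual, expected, actual - expected).
    """
    if not games:
        raise ValueError("need at least one game")
    actual = sum(result for _, _, result in games) / len(games)
    expected = sum(expected_score(w, b) for w, b, _ in games) / len(games)
    return actual, expected, actual - expected


if __name__ == "__main__":
    # Hypothetical sample: White is always 200 points stronger
    # and scores 7/10 (6 wins, 2 draws, 2 losses).
    results = [1.0] * 6 + [0.5] * 2 + [0.0] * 2
    games = [(2000, 1800, result) for result in results]
    actual, expected, diff = score_vs_expectation(games)
    print(f"actual {actual:.1%}, expected {expected:.1%}, difference {diff:+.1%}")
```

In this made-up sample White's 70% looks impressive in a raw database view, yet it is slightly below the roughly 76% that a 200-point rating edge would predict on its own, which is the situation Morozevich's remark about stronger players favouring one colour describes.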

Traveling Again

I'm 1/4 Ninja

Joined: 02 Dec 08
Moves: 27516

08 Apr 09

Originally posted by Shamash
Can we truly rely on the statistics of results in choosing an opening variation?

Because it is raining statistics.

It is raining opening databases.

It is raining opening lines and their results.

Along with the availability of statistics on opening systems and their results everywhere, remarks like this one at RHP have become commonplace w ...[text shortened]...
========================================================

Thoughts?

*

Sure we can rely on and trust statistics. Absolutely. But we have to know what statistics are
telling us. We can only trust them and rely on them for what they are, not what they aren't.
Statistics are historical data -- not a crystal ball that will predict the future.

Chess opening statistics won't tell us if we'll win a game by making a certain move, but they can
tell us the trends of such a move made in previous games and we could use that in making a judgment about a future move.

I think that's what Morozevich was saying in his quote. Too often people
rely on statistics for information that statistics are unable to provide.

Black Star Uchess

Joined: 09 Mar 09
Moves: 27

08 Apr 09

I have noticed that stats often reflect preparation rather than the strength of a line... e.g. the Halloween Attack has a winning average according to chessgames but is definitely flawed. On chessbase the Jerome Gambit ?? has a winning average... this is taken from lower-ranking players....

I think stats have their merits, but there needs to be enough data on the particular line, and as someone else pointed out, you get very different results looking at GM games vs normal games.

Wulebgr

Angler

River City

Joined: 08 Dec 04
Moves: 16907

09 Apr 09

The Fried Liver Attack is unsound, yet White scores 70% in the Chessbase database.

Grampy Bobby

Boston Lad

USA

Joined: 14 Jul 07
Moves: 43012

09 Apr 09

Originally posted by Grampy Bobby
Insightful and provocative thread topic, Shamash. Few questions: (1) What thrill is there in being wed to the historical consensus of other players' moves? (2) What possible satisfaction derives from a favorable outcome if someone else's signature squats alongside your own? (3) Why choose not to recognize the values of 'getting out of book' early ...[text shortened]... Farmer, who took his milk from many cows over many years but made his own butter.

-gb

Postscript: Realize I'm way out of step with the popular statistical assistance drumbeat of this thread. However I've learned whenever in doubt to apply sound principle. Here goes. Big government, small people. Small government, enterprising folks making discoveries and fulfilling destinies. Omniscient statistical chess data bases, cookie cutter chess players moving in unison. Zero data bases, chess giants and heroes whose names we speak with reverence. It's a game. Why not discover it naked? Why suck the hind tit of inquiries?

Melanerpes

Joined: 08 Oct 08
Moves: 5542

10 Apr 09

A lot depends on how much knowledge and experience you and your opponent have with a particular opening.

If your opponent is a lot more experienced and knowledgeable in playing a certain line than you are, you will probably lose if you play that line, even if it's clearly the "best" or "soundest" approach. You may have a much better chance if you play a relatively "weaker" line that your opponent doesn't know much about.

The database stats don't tell you anything about the particular person you're playing. I'm sure that GMs do a lot of research into the games the other GMs play and know what openings and styles they've been most successful with or have the most experience with.

RECUVIC

YTM

Earth Milky Way

Joined: 23 Jan 06
Moves: 67934

10 Apr 09

Sadly, in chess as with all things in life, there is some human fascination with the word 'statistics' and its multiple meanings. Despite many differing opinions, 'statistics' is quite literally the interpretation and manipulation of numbers and other subject matter in order to produce desired or desirable results, in the mistaken belief that these 'statistics' should or can be correctly used as the basis of logical decision making. Nothing has yet been quite so far from the truth as this totally flawed image. Statistics reveal some undeniable facts; however, they also conceal some equally relevant facts, which so very often make a nonsense of the originator's statistical analysis. Correctly gathered information and the consequent properly considered analysis can never be complete on any subject unless all relevant facts are used as part of this analysis, and as this is not possible on any subject matter, it is not unreasonable that 'statistics' should be treated with the greatest caution and not believed to be the whole truth. The whole truth is the only truth worth obtaining, and this cannot be obtained by statistics alone. Statistics are therefore an unfinished painting, not representative of the whole picture, and are open to many personal interpretations. They are at best a rough guide on any subject. 😴

thesonofsaul

King of the Ashes

Trying to rise ....

Joined: 16 Jun 04
Moves: 63851

10 Apr 09

Originally posted by Wulebgr
The Fried Liver Attack is unsound, yet White scores 70% in the Chessbase database.

Don't know if it is unsound. Risky, certainly, but if played right White can achieve an acceptable position. Also, in timed OTB games it can create mass confusion, which is often what you want (tick tock tick tock tick tock ....)