Streaks

Nybes

Posers and Puzzles

20 Jan '10 03:10

Nybes
Joined
29 Oct '06
Moves
10033
20 Jan '10 03:10
I was looking through my stats on another online chess site this morning. I've played 136 games in total, of which I have won 81.

I was suprised by another stat on the page - my longest winning streak. 14 consequtive wins. This struck me as being well outside the norm, considering my overall win ratio is just ~60%.

Obviously, I'm human, and my wins are not distributed chronologically randomly. I have weeks in which I play great chess, and others in which I'm decidedly off. The length of my streaks - both winning and losing - is likely to longer than those that might be expected of a purely random distribution.

However, I think it raises an interesting problem.

Let's say Bob Jones plays x games of chess. Of these x games, he wins w games.

Assuming that his victories are distributed randomly throughout the chronological sequence of games, what is the most likely length of his longest winning streak?
TheMaster37
Kupikupopo!
Out of my mind
Joined
25 Oct '02
Moves
20443
20 Jan '10 16:51
Originally posted by Nybes
I was looking through my stats on another online chess site this morning. I've played 136 games in total, of which I have won 81.

I was suprised by another stat on the page - my longest winning streak. 14 consequtive wins. This struck me as being well outside the norm, considering my overall win ratio is just ~60%.

Obviously, I'm human, and my wins are ...[text shortened]... ronological sequence of games, what is the most likely length of his longest winning streak?
With win-chance 0,5 and 200 games you'd expect a series of 8 consecutive wins.

I've been googling for a formula but can only find crude boundaries.
Aetherael
Joined
02 Mar '06
Moves
17881
20 Jan '10 17:181 edit
Originally posted by TheMaster37
With win-chance 0,5 and 200 games you'd expect a series of 8 consecutive wins.

I've been googling for a formula but can only find crude boundaries.
if the probability of winning is 1/2, then one can figure out the expected series of wins (at least a very good approximation) by assuming a streak of n wins takes place within 2^n games. generally, for probability p, i think a streak of n wins should take place within (1/p)^n games.

e.g. since 200 lies between 128 and 256 (2^7 and 2^8), we expect a streak of 7 or 8 consecutive wins to occur... and since 200 is closer to 256 than 128, it's more like 7.6? in fact i think i just realized you could use logs to compute the actual expected streak, provided you know the probability of winning "p." in this particular case, the expected streak would actually be log(base 2) of 200 = approximately 7.643856

EDIT:
Let's say Bob Jones plays x games of chess. Of these x games, he wins w games.

Assuming that his victories are distributed randomly throughout the chronological sequence of games, what is the most likely length of his longest winning streak?

assuming these games and wins are indicative of Bob's probability of winning, we call it (w/x). but then the question is what is the relationship between (w/x) and x itself? i think for x games, he should win "log(base [x/w]) of x" times in a row. this may be entirely wrong, but notice as w->x, (i.e. his win percent approaches 100) his expected # of wins = x. and on the other hand as w->0, (x/w) becomes a bigger and bigger number, and thus the log of x decreases to become < 1
Palynka
Upward Spiral
Halfway
Joined
02 Aug '04
Moves
8702
20 Jan '10 17:545 edits
Originally posted by Aetherael
if the probability of winning is 1/2, then one can figure out the expected series of wins (at least a very good approximation) by assuming a streak of n wins takes place within 2^n games. generally, for probability p, i think a streak of n wins should take place within (1/p)^n games.

e.g. since 200 lies between 128 and 256 (2^7 and 2^8), we expect a s , (x/w) becomes a bigger and bigger number, and thus the log of x decreases to become < 1
I thought of this (using a log of base 1/p), but I thought of at least 2 important problems to calculate a precise answer:

1. We're asked to find the longest expected string so we need to calculate probabilities conditional on no longer one existing.

2. (This is related to your edit): There's a problem in fitting the wins outside the longest streak because the number of wins is fixed. We can't simply assume a constant win probability, because then we have no guaranteed that we have exactly w wins. Of course, the larger n, the least important is this issue but again, if we want an exact solution, I think better approach will be one more based on combinatorics: i.e. finding combinations for the placement of each possible longest streak and non-streak games. It's easy to write a computer program to find this out (mode longest streak=7 in the original example), but I'm having trouble finding a closed form solution.

One interesting issue is that we need to consider that the longest streak creates a "tile" of size l+2 (where l is the length of the streak and there is a loss on either side) when occurs in the middle of the sample, but it creates a "tile" of size l+1 if it's in the beginning or end of the sample (one loss after or before the streak, respectively).

Interesting problem, much harder than it looks. 🙂
randolph
the walrus
an English garden
Joined
15 Jan '08
Moves
32836
21 Jan '10 02:31
For me, at least, I subconsciously drag out my losing games and tend to have wins grouped together followed by losses grouped together.
Nybes
Joined
29 Oct '06
Moves
10033
21 Jan '10 03:32
I should have added to my original post - my own meagre attempts to solve the problem have come to naught. I suspect I simply don't possess the mathematical tools neccesary. I do think that an elegant solution will exist, however.

For what it's worth, here's a wee table I drew up of some trivial results for x, w < 7. w along the top, x down the side.

a/b denotes that a longest streak of length a or b are equally likely.

_________1________2________3________4________5________6
1 |..........1
2 |..........1............2
3 |..........1........... 2............3
4 |..........1...........1/2..........2/3..........4
5 |..........1............1.............2..........3/4...........5
6 |..........1...........1..............2..........2/3..........3/4/5.........6

The very process of using brute force to draw up the above table leads me to believe that combinatorics will be the most promising avenue to explore.

However, I'm afaid I'll have to leave the details to the more mathematically gifted.
Nybes
Joined
29 Oct '06
Moves
10033
21 Jan '10 06:56
I knocked up a program in QBasic - my apologies, my programming aspirations died a quick death back in the early 90's.

It generates a sequence of x games including w wins and then analyses the sequence to pick out the longest streak of wins. It runs 100,000 iterations and then tabulates the results.

Here are the results from running the program using the example in the original post - x=136, w=81.

Longest streak - number of occurences

4 wins - 112
5 wins - 3780
6 wins - 14856
7 wins - 22417
8 wins - 20786
9 wins - 15230
10 wins - 9541
11 wins - 5878
12 wins - 3280
13 wins - 1920
14 wins - 1007
15 wins - 540
16 wins - 312
17 wins - 144
18 wins - 69
19 wins - 51
20 wins - 26
21 wins - 30
22 wins - 11
23 wins - 4
24 wins - 3
25 wins - 0
26 wins - 0
27 wins - 0
28 wins - 3

This obviously doesn't get us any closer to general solution, but it does, I think establish that 7 is the mode for the longest winning streak for this specific example.

I'm suprised to see how long the 'tail' of the distribution is, extending upwards. I thought my streak of 14 consequtive wins would constitute a freak occurence, but the table above shows it's frequency as a little over 1%.
Palynka
Upward Spiral
Halfway
Joined
02 Aug '04
Moves
8702
22 Jan '10 15:361 edit
Ok, I can't find a simple way to constrain the combinations of non-streak elements so that they don't contain any bigger sequence. As a preliminary result (that might help somebody) I've found that it's easy enough to calculate the amount of possible combinations when the sequence is larger than half the total amount of wins.

The formula is:

2 C(n-k-1,w-k) + (n-k-1) C(n-k-2,w-k)

where C(x,y) is the number of y-combinations from a set of x elements.

The explanation is that if you have a streak of size k, then you have two possibilities:
1) The second term: Your streak is not at one of the extremes and so you have to append one loss on each side, leaving you with n-k-2 elements where you can distribute your remaining w-k wins. You have n-k-1 places where this streak can be located so you multiply by that number

2) The first term: Your sequence is in the beginning or end (multiply by 2) and you need to append one loss after or before the sequence (respectively). So you're left with w-k wins to distribute on n-k-1 places.

This formula seemed to work for the few simple examples I've tested it on. For example, with w=3 and n=5 the formula provides the following possibilites (9,6,3) for sequences of size 1,2 and 3 respectively. The true solution should be (1,6,3) because you can only fit 3 wins with a max sequence of size 1 by having WLWLW.

The problem is that it only seems to work for streaks where no other of larger or equal size can exist. So for larger n, they are only valid for a very unlikely set of streaks... For those that are smaller than w/2 we need to eliminate larger ones and I've found it to be hard to come up with a simple formula for it. Perhaps using recurrence but I give up...
😞
TheMaster37
Kupikupopo!
Out of my mind
Joined
25 Oct '02
Moves
20443
25 Jan '10 12:48
Originally posted by Palynka
Ok, I can't find a simple way to constrain the combinations of non-streak elements so that they don't contain any bigger sequence. As a preliminary result (that might help somebody) I've found that it's easy enough to calculate the amount of possible combinations when the sequence is larger than half the total amount of wins.

The formula is:

2 C(n-k-1 ...[text shortened]... d to come up with a simple formula for it. Perhaps using recurrence but I give up...
😞
Don't worry, the result I found on internet give very crude upper and lower bounds containing all kinds of variables and logarithms.

They appear to be from people thesis's or post-doc's.
Palynka
Upward Spiral
Halfway
Joined
02 Aug '04
Moves
8702
27 Jan '10 14:33
Originally posted by TheMaster37
Don't worry, the result I found on internet give very crude upper and lower bounds containing all kinds of variables and logarithms.

They appear to be from people thesis's or post-doc's.
Don't be defeatist. 😛

How can the formula I gave above be adapted to remove the possibilities with larger sequences when streak<w/2? I immediately think of making heavy-use of recursion, but there might be simpler ways...
luskin
Joined
14 Dec '05
Moves
5694
06 Feb '10 07:31
Originally posted by Nybes
[b]I was looking through my stats on another online chess site this morning. I've played 136 games in total, of which I have won 81.

I was suprised by another stat on the page - my longest winning streak. 14 consequtive wins. This struck me as being well outside the norm, considering my overall win ratio is just ~60%.
Here are two programs which both give ~3.5% as the answer to your original question...ie no. of games 136; streak length 14; prob 81/136

http://www.bumblebeagle.org/horsehide/hitstreaks.html

http://www.beatingbonuses.com/calc_streak.htm
luskin
Joined
14 Dec '05
Moves
5694
06 Feb '10 08:44
Apparently 3.5% is the probability of getting a run of 14 or more. For exactly 14 it would be somewhat less of course.
Aetherael
Joined
02 Mar '06
Moves
17881
06 Feb '10 08:543 edits
Originally posted by luskin
Here are two programs which both give ~3.5% as the answer to your original question...ie no. of games 136; streak length 14; prob 81/136

http://www.bumblebeagle.org/horsehide/hitstreaks.html

http://www.beatingbonuses.com/calc_streak.htm
"This java applet computes the chance that a player will achieve a certain hitting streak at some point during the season. The model assumes that the chance that the player gets a hit in any game is unchanging and independent of other games."

the issue with this program (and i assume the other program if it computes the same results, though its computation algorithm at first glance is hidden), as opposed to the problem palynka is attempting to solve, is that it assumes you have a static "percent chance to win." this is not really in the spirit of the problem as posed by the OP - if one thinks the current history of 81/136 is a reasonable measure of "chance to win" then we must also assume as the player wins more games, his game history and his "chance to win" will increase (first to 82/137, then 83/138, etc.)

to wit, palynka is assuming that as you win more and more often, your "win chance" as well as your "current win percentage" will increase, and thus change your probability of winning the next game. i.e. the more you win, the better you must be as a player, and thus your likelihood of winning increases (though not by much if you have a substantial history of games). so he's looking for an algorithm which adapts to your streak as the streak increases... as he said i think this is most likely going to be "neat" if tackled with a recursive algorithm, and finding an explicit algebraic form for an answer will be quite ugly (and perhaps even relatively unsatisfying)

at least i think... my math brain doesn't work at 4am after a night of drinking 🙂
KazetNagorra
Germany
Joined
27 Oct '08
Moves
3118
06 Feb '10 12:32
One thing you maybe ought to consider here, is that a losing streak makes a consecutive winning streak more probable; your rating drops and as a result you probably play lower rated opponents. So the assumption that the win percentage is independent of rating (assuming there is a chess rating used, of course) is not accurate.
Palynka
Upward Spiral
Halfway
Joined
02 Aug '04
Moves
8702
08 Feb '10 10:37
Originally posted by KazetNagorra
One thing you maybe ought to consider here, is that a losing streak makes a consecutive winning streak more probable; your rating drops and as a result you probably play lower rated opponents. So the assumption that the win percentage is independent of rating (assuming there is a chess rating used, of course) is not accurate.
Boring. There are a million reasons why it would not be independent of past performance.