12 Jul '15 10:47>4 edits
I have been trying to understand one of the equations in the Wikipedia article on the German tank problem, linked below, but I am having trouble with it:
https://en.wikipedia.org/wiki/German_tank_problem
Near the very top of that link just under “Examples”, it says
“Suppose an intelligence officer has spotted k = 4 tanks with serial numbers, 2, 6, 7, and 14, with the maximum observed serial number, m = 14. The unknown total number of tanks is called N.
...
...the Bayesian analysis below yields (primarily) a probability mass function for the number of tanks ...”
And it then gives the equation for the probability for that starting with:
“ Pr(N = n) = ...”
Now, where it gives the condition “if n < m”, it makes perfect sense to me: it says the probability is 0, because the actual total number of tanks cannot be less than the largest serial number observed!
But where it gives the condition “if n >= m”, I get confused. The equation appears to me to be the product of two fractions, the first being “(k – 1)/k”. The problem I have is that k is supposed to be the number of observed tanks, so if only one tank was observed, i.e. if k = 1, the “(k – 1)” part equals 0 and we have:
(k – 1)/k = 0/1 = 0
and that would mean that, whatever the value of the second fraction, the product of the two fractions is always 0 (because anything multiplied by 0 is 0). So, if I have read the equation right, the probability for any n (when k = 1) will always be 0, which is total nonsense! The probabilities of an exhaustive list of mutually exclusive possibilities cannot add up to zero!
So I assume I have misread that equation? If so, how so? If not, how can it make any sense?
OK, let's ignore that problem for a moment, consider only k values greater than 1, and look at the second fraction, which I read as one binomial coefficient divided by another. Have I read that right?
If so, each of the two binomial coefficients is written as a single vertical column, which I cannot reproduce here, so let's write the numerator (the top half of the fraction) as:
C(m – 1, k – 1)
then, according to the equation shown on https://en.wikipedia.org/wiki/Binomial_coefficient , and if we let g = m – 1 and h = k – 1, that numerator is equal to:
C(m – 1, k – 1) = C(g, h) = g! / ( h! (g – h)! )
and the denominator of that second fraction is:
C(n, k) = n! / ( k! (n – k)! )
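To make the factorial form concrete, here is a minimal Java sketch of a binomial coefficient. It is only an illustration: the class name `Binomial` and the method `choose` are made up for this example, and building the product term by term (instead of computing three separate factorials) is an implementation choice to keep the intermediate values small.

```java
import java.math.BigInteger;

public class Binomial {
    // C(n, k) = n! / (k! (n - k)!), built up one factor at a time.
    static BigInteger choose(int n, int k) {
        if (k < 0 || k > n) {
            return BigInteger.ZERO;
        }
        k = Math.min(k, n - k); // C(n, k) = C(n, n - k), so use the smaller one
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            // After step i, result equals C(n - k + i, i), so the division is exact.
            result = result.multiply(BigInteger.valueOf(n - k + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Values for the quoted example with m = 14 and k = 4, taking n = 14.
        System.out.println(choose(13, 3)); // C(m - 1, k - 1), prints 286
        System.out.println(choose(14, 4)); // C(n, k), prints 1001
    }
}
```

With m = 14 and k = 4 the numerator C(m – 1, k – 1) is 286, and for n = 14 the denominator C(n, k) is 1001.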
so the whole equation for Pr(N = n) for when n >= m is:
Pr(N = n) = ( (k – 1)/k ) * ( ( g! / ( h! (g – h)! ) ) / ( n! / ( k! (n – k)! ) ) )
(if n >= m, where g = m – 1 and h = k – 1)
Have I got that right? I ask because I wrote a Java program for this, and when I sum the probabilities over all possible n values for some k > 1, the total appears to converge not to 1 but to a value very close to, and just under, 1, which doesn't quite make sense. That said, because the equation doesn't appear to make sense for k = 1 anyway (always giving a sum of probabilities of 0), I assume I have gone wrong somewhere before that, right?
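A minimal sketch of such a summation, for anyone who wants to reproduce it. This is not the original program: the class name `TankPmfSum`, the method `partialSum`, and the cut-off `nMax` are made up for illustration, and it assumes k >= 2 and m >= k. It uses the product form ((k – 1)/k) * C(m – 1, k – 1) / C(n, k) and steps from one n to the next with a ratio instead of recomputing factorials, which avoids overflow.

```java
public class TankPmfSum {
    // Sums Pr(N = n) = ((k - 1)/k) * C(m - 1, k - 1) / C(n, k) for n = m .. nMax.
    // Assumes k >= 2 and m >= k.
    static double partialSum(int k, int m, int nMax) {
        // At n = m the formula simplifies to (k - 1)/m,
        // because C(m - 1, k - 1) / C(m, k) = k/m.
        double p = (double) (k - 1) / m;
        double sum = 0.0;
        for (int n = m; n <= nMax; n++) {
            sum += p;
            // C(n, k) / C(n + 1, k) = (n + 1 - k)/(n + 1),
            // so this moves p from Pr(N = n) to Pr(N = n + 1).
            p *= (double) (n + 1 - k) / (n + 1);
        }
        return sum;
    }

    public static void main(String[] args) {
        int k = 4, m = 14; // the values from the quoted example
        for (int nMax : new int[] {14, 100, 1000, 100000}) {
            System.out.printf("nMax = %d, partial sum = %.12f%n",
                              nMax, partialSum(k, m, nMax));
        }
    }
}
```

Since every term is positive and the loop has to stop at some finite `nMax`, any partial sum computed this way stays below the full infinite sum, so how close the printed value gets depends on where the loop stops.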