- 15 Mar '17 15:49 With age, somatic cells are thought to accumulate genomic scars as a result of the inaccurate repair of double-stranded breaks by NHEJ. Estimates based on the frequency of breaks in primary human fibroblasts suggest that by the age of 70 each human somatic cell carries some 2000 NHEJ-induced mutations due to inaccurate repair. If these mutations were distributed randomly around the genome, how many genes would you expect to be affected?

(assume 2.5% of the genome is crucial information provided by genes) - 15 Mar '17 17:10 / 2 edits

*Originally posted by wormer*
**If these mutations were distributed randomly around the genome, how many genes would you expect to be affected?**

**(assume 2.5% of the genome is crucial information provided by genes)**

If one mutation occurs, it has a 2.5% chance of being in a 'crucial region'.

Next you need to figure out the probability of at least one of the total number of mutations being in the crucial region. I am afraid I don't know the formula, but this site might help:

http://www.mathgoodies.com/lessons/vol6/independent_events.html

An equivalent would be picking a single ball from a bag multiple times and always putting it back. If 2.5% of the balls are black, what is the probability of picking at least one black ball in 2000 attempts?
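For the bag-of-balls version of the question, the usual route is the complement rule for independent draws: the chance of at least one black ball is one minus the chance of none. The sketch below assumes independent draws with replacement; the function name `prob_at_least_one` is made up for illustration.

```python
def prob_at_least_one(p: float, draws: int) -> float:
    """Chance of at least one 'black ball' in `draws` independent draws with replacement."""
    # Complement rule: P(at least one) = 1 - P(none) = 1 - (1 - p)^draws
    return 1.0 - (1.0 - p) ** draws


if __name__ == "__main__":
    # 2.5% black balls, 2000 draws (the numbers used in this thread)
    print(prob_at_least_one(0.025, 2000))
```

With 2000 draws the chance of never hitting the 2.5% region is vanishingly small, so this probability is 1 for all practical purposes.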

[edit] Actually you asked how many black balls one would expect to pick. So rather more complicated. - 15 Mar '17 17:37

*Originally posted by wormer*
**With age, somatic cells are thought to accumulate genomic scars as a result of the inaccurate repair of double-stranded breaks by NHEJ. Estimates based on the frequency of breaks in primary human fibroblasts suggest that by the age of 70 each human somatic cell carries some 2000 NHEJ-induced mutations due to inaccurate repair. If these mutations were distributed ra ...[text shortened]... you expect to be affected?**

**(assume 2.5% of the genome is crucial information provided by genes)**

Correction: 2% of the genome, 1.5% coding and 0.5% regulatory. - 16 Mar '17 03:06

*Originally posted by wormer*
**With age, somatic cells are thought to accumulate genomic scars as a result of the inaccurate repair of double-stranded breaks by NHEJ. Estimates based on the frequency of breaks in primary human fibroblasts suggest that by the age of 70 each human somatic cell carries some 2000 NHEJ-induced mutations due to inaccurate repair. If these mutations were distributed ra ...[text shortened]... you expect to be affected?**

**(assume 2.5% of the genome is crucial information provided by genes)**

Suppose the probability that a given mutation is a coding mutation is x (which you seem to be saying is 2%, so 0.02). There are N mutations in total (N = 2000). Then the average number of coding mutations is:

<n> = 0 * (probability that all mutations are non-coding) + 1 * (probability of exactly 1 coding mutation) + 2 * (probability of exactly 2 coding mutations) + ... + N * (probability that all mutations are in coding DNA).

Let's look at the typical term. We need to know the probability of exactly n coding mutations. The probability of getting n coding mutations in a row is x^n (x to the power of n). The probability of then getting (N - n) non-coding mutations is (1 - x)^(N - n). We have to take into account that we can get our n coding mutations and (N - n) non-coding mutations in any order. The number of possible orders is given by the binomial coefficient (which I'll write C(N, n)). So the typical term in the above sum is:

n * C(N, n) * x^n * (1 - x)^(N - n)

To sum this we need a new variable y = x / (1-x), and we can rewrite the typical term as:

n * C(N, n) * y^n * (1 - x)^N

So the average number of coding mutations is now:

<n> = (1 - x)^N * sum(n = 0 ... N) n * C(N, n) * y^n

We can use the fact that d/dy y^n = n * y^(n - 1), so that n * y^n = y * d/dy y^n, to do the sum:

<n> = y*(1 - x)^N * d/dy sum(n = 0 ... N) C(N, n) * y^n

The sum is now straightforward:

<n> = y * (1 - x)^N * d/dy (1 + y)^N = y * (1 - x)^N * [N * (1+y)^(N - 1)]

1 + y = 1/(1 - x) so that:

<n> = [x/(1 - x)] * [(1 - x)^N] * N * [1/(1 - x)]^(N - 1) = Nx = 2000 * 0.02 = 40

So twhitehead got the right answer.
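As a numerical sanity check on the derivation above, one can add up the terms n * C(N, n) * x^n * (1 - x)^(N - n) directly and compare the result with N * x. This is only a sketch: the function name `mean_coding_mutations` is invented here, and each term is built from the previous one using the same y = x / (1 - x) substitution, purely so that the huge binomial coefficients never have to be computed.

```python
def mean_coding_mutations(N: int, x: float) -> float:
    """Sum n * C(N, n) * x^n * (1 - x)^(N - n) over n = 0..N, term by term."""
    y = x / (1.0 - x)        # same substitution as in the derivation
    term = (1.0 - x) ** N    # probability of n = 0 coding mutations
    mean = 0.0
    for n in range(N + 1):
        mean += n * term
        # P(n + 1) = P(n) * (N - n) / (n + 1) * y, from the ratio of binomial terms
        term *= (N - n) / (n + 1) * y
    return mean


if __name__ == "__main__":
    N, x = 2000, 0.02
    print(mean_coding_mutations(N, x), N * x)
```

Up to floating-point rounding, the two printed numbers should agree, matching the closed form <n> = Nx.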

The only catch is if we have to take into account the possibility that a coding mutation is in a critical gene which produces a highly conserved protein and the mutation kills the cell. Some of these mutations might kill the organism, for example if it is on the PrP gene causing CJD before age 70. So we need to factor out mutations that kill cells or the entire organism. If there are m coding bases in total of which p are critical coding bases and the genome is length l, then where x was m/l we'd need to replace it with (m - p)/(l - p). If p is small compared with m then don't worry about it. - 16 Mar '17 16:50

*Originally posted by DeepThought*
**The only catch is if we have to take into account the possibility that a coding mutation is in a critical gene which produces a highly conserved protein and the mutation kills the cell. Some of these mutations might kill the organism, for example if it is on the PrP gene causing CJD before age 70. So we need to factor out mutations that kill cells or t ...[text shortened]... ed to replace it with (m - p)/(l - p). If p is small compared with m then don't worry about it.**

A good point about evolution. It's not quite clear to me, though, what you are calculating.

The question states that there are 2000 mutations at age 70 - which means the cells involved (and the organism) survived to age 70, so the reality is that there were likely more mutations, some of which occurred in super critical regions but were weeded out by evolution (cell or organism death). - 16 Mar '17 17:50

*Originally posted by twhitehead*
**A good point about evolution. It's not quite clear to me, though, what you are calculating.**

**The question states that there are 2000 mutations at age 70 - which means the cells involved (and the organism) survived to age 70, so the reality is that there were likely more mutations, some of which occurred in super critical regions but were weeded out by evolution (cell or organism death).**

I assume you mean the (m - p)/(l - p) bit. That's just a way of excluding mutations that kill the cell (or the entire organism) from the calculation. As an analogy, imagine shuffling a pack of cards and turning over the top card: drawing a joker corresponds to a coding mutation, and if the card turned over is the bridge scoring card then that ends the game. If one does this twelve times (say), then since we've specified that the bridge scoring card has not been drawn, *I think* that one gets the right probability if one just does the calculation for a normal pack with two jokers and with the bridge scoring card absent. - 16 Mar '17 18:40

*Originally posted by DeepThought*
**If one does this twelve times (say) then since we've specified that the bridge scoring card has not been drawn then *I think* that one gets the right probability if one just does the calculation for a normal pack with two jokers and with the bridge scoring card absent.**

I agree.
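The card analogy and the (m - p)/(l - p) replacement discussed in this thread can be checked with exact fractions. The sketch below rests on assumptions that are not stated in the thread: a 52-card pack plus two jokers plus one bridge scoring card (55 cards), a fresh shuffle before every draw, and the scoring card counted among the "coding" cards in the same way that critical bases are counted among the m coding bases. The name `surviving_fraction` is hypothetical.

```python
from fractions import Fraction


def surviving_fraction(m: int, p: int, l: int) -> Fraction:
    """P(site is coding | site is not lethal) for one uniformly chosen site.

    m: coding sites (including the lethal ones), p: lethal sites, l: all sites.
    """
    p_coding_nonlethal = Fraction(m - p, l)  # coding and survivable
    p_nonlethal = Fraction(l - p, l)         # any survivable site
    return p_coding_nonlethal / p_nonlethal  # equals (m - p) / (l - p)


if __name__ == "__main__":
    # Assumed pack: 52 cards + 2 jokers + 1 scoring card; jokers and scoring card are "coding"
    with_score_card = surviving_fraction(m=3, p=1, l=55)
    # Same question asked of a pack with the scoring card simply removed
    without_score_card = Fraction(2, 54)
    print(with_score_card, without_score_card)
    # Expected number of jokers in twelve draws, given the scoring card never appears
    print(12 * with_score_card)
```

Under these assumptions, conditioning on the scoring card never being drawn gives the same per-draw probability as removing it from the pack, which is the point both posters agree on.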