Microbiology question

wormer

Science

15 Mar 17

wormer

Joined: 08 Sep 06
Moves: 26697

15 Mar 17

With age, somatic cells are thought to accumulate genomic scars as a result of the inaccurate repair of double-stranded breaks by NHEJ. Estimates based on the frequency of breaks in primary human fibroblasts suggest that by the age of 70 each human somatic cell carries some 2000 NHEJ-induced mutations due to inaccurate repair. If these mutations were distributed randomly around the genome, how many genes would you expect to be affected?
(assume 2.5% of the genome is crucial information provided by genes)

wormer

Joined: 08 Sep 06
Moves: 26697

15 Mar 17

I'm not quite sure how I'm supposed to go about answering this question. I need help forming a plan for the calculation.

twhitehead

Cape Town

Joined: 14 Apr 05
Moves: 52945

15 Mar 17

2 edits

Originally posted by wormer
If these mutations were distributed randomly around the genome, how many genes would you expect to be affected?
(assume 2.5% of the genome is crucial information provided by genes)

If one mutation occurs, it has a 2.5% chance of being in a 'crucial region'.
Next you need to figure out the probability of at least one of the total number of mutations being in the crucial region. I am afraid I don't know the formula, but this site might help:

http://www.mathgoodies.com/lessons/vol6/independent_events.html

An equivalent would be picking a single ball from a bag multiple times and always putting it back. If 2.5% of the balls are black, what is the probability of picking at least one black ball in 2000 attempts?

[edit] Actually you asked how many black balls one would expect to pick. So rather more complicated.
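For the "at least one black ball" version of the question, a minimal sketch in Python, assuming the draws are independent and each has the same 2.5% chance of a hit. It uses the complement rule: the chance of at least one hit is one minus the chance that every draw misses. The function name is made up for illustration.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent draws is a hit.

    p is the chance of a hit on a single draw (assumed identical
    for every draw, i.e. drawing with replacement).
    """
    # All n draws miss with probability (1 - p)^n; take the complement.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Figures from the post: 2.5% black balls, 2000 attempts.
    print(prob_at_least_one(0.025, 2000))
```

With 2000 draws the chance of missing every time is vanishingly small, so this version of the question is not very informative here; the expected count is the more useful quantity.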

twhitehead

Cape Town

Joined: 14 Apr 05
Moves: 52945

15 Mar 17

Could it really be this simple?
The probability that each mutation is in a critical region is 2.5%. Therefore, on average, 2.5% of the mutations will be in critical regions.
Therefore the answer is 2.5% of 2000 or 4.5

wormer

Joined: 08 Sep 06
Moves: 26697

15 Mar 17

Originally posted by wormer
With age, somatic cells are thought to accumulate genomic scars as a result of the inaccurate repair of double-stranded breaks by NHEJ. Estimates based on the frequency of breaks in primary human fibroblasts suggest that by the age of 70 each human somatic cell carries some 2000 NHEJ-induced mutations due to inaccurate repair. If these mutations were distributed ra ...[text shortened]... you expect to be affected?
(assume 2.5% of the genome is crucial information provided by genes)

Correction: 2% of the genome (1.5% coding and 0.5% regulatory).

wormer

Joined: 08 Sep 06
Moves: 26697

15 Mar 17

Originally posted by twhitehead
Could it really be this simple?
The probability that each mutation is in a critical region is 2.5%. Therefore, on average, 2.5% of the mutations will be in critical regions.
Therefore the answer is 2.5% of 2000 or 4.5

these numbers make no sense

twhitehead

Cape Town

Joined: 14 Apr 05
Moves: 52945

15 Mar 17

Originally posted by wormer
these numbers make no sense

Follow the logic, not the numbers. Yes, I got the numbers wrong.
So it's 2% of the mutations that hit 'critical regions'.
2% of 2000 = 40
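A quick way to sanity-check the 2% of 2000 figure is to simulate it. The sketch below is only an illustration, assuming each mutation independently lands in a critical region with probability 0.02; the function name, the number of simulated cells, and the seed are arbitrary choices, not part of the original question.

```python
import random


def simulate_mean_hits(n_mutations: int, p_hit: float,
                       n_cells: int, seed: int = 0) -> float:
    """Average number of mutations per cell that land in a critical region.

    Each mutation is treated as an independent draw that hits a
    critical region with probability p_hit (an assumption).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_hits = 0
    for _ in range(n_cells):
        # Count the hits among this cell's mutations.
        total_hits += sum(1 for _ in range(n_mutations)
                          if rng.random() < p_hit)
    return total_hits / n_cells


if __name__ == "__main__":
    # 2000 mutations per cell, 2% critical fraction, 500 simulated cells.
    print(simulate_mean_hits(2000, 0.02, 500))
```

The printed average should come out close to 40, with individual cells scattering by a handful of hits either side.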

DeepThought

Losing the Thread

Quarantined World

Joined: 27 Oct 04
Moves: 87415

16 Mar 17

Originally posted by wormer
With age, somatic cells are thought to accumulate genomic scars as a result of the inaccurate repair of double-stranded breaks by NHEJ. Estimates based on the frequency of breaks in primary human fibroblasts suggest that by the age of 70 each human somatic cell carries some 2000 NHEJ-induced mutations due to inaccurate repair. If these mutations were distributed ra ...[text shortened]... you expect to be affected?
(assume 2.5% of the genome is crucial information provided by genes)

Suppose the probability of a coding mutation is x (which you seem to be saying is 2%, so 0.02). There are N mutations in total (N = 2000). Then the average number of coding mutations is:

<n> = 0 * probability that all mutations are non-coding + 1 * probability of exactly 1 coding mutation + 2 * probability of exactly 2 coding mutations + ... + N * probability that all mutations are in coding DNA.

Let's look at the typical term: we need to know the probability of n coding mutations. The probability of getting n coding mutations in a row is x^n (x to the power of n). The probability of then getting (N - n) non-coding mutations is (1 - x)^(N - n). We also have to take into account that the n coding mutations and (N - n) non-coding mutations can occur in any order. The number of orderings is given by the binomial coefficient (which I'll write C(N, n)). So the typical term in the above sum is:

n * C(N, n) * x^n * (1 - x)^(N - n)

To sum this we need a new variable y = x / (1-x), and we can rewrite the typical term as:

n * C(N, n) * y^n * (1 - x)^N

So the average number of coding mutations is now:

<n> = (1 - x)^N * sum(n = 0 ... N) n * C(N, n) * y^n

Since d/dy y^n = n * y^(n - 1), we have n * y^n = y * d/dy y^n, which lets us do the sum:

<n> = y*(1 - x)^N * d/dy sum(n = 0 ... N) C(N, n) * y^n

The sum is now just the binomial expansion of (1 + y)^N:

<n> = y * (1 - x)^N * d/dy (1 + y)^N = y * (1 - x)^N * [N * (1+y)^(N - 1)]

1 + y = 1/(1 - x) so that:

<n> = [x/(1 - x)] * [(1 - x)^N] * N * [1/(1 - x)]^(N - 1) = Nx = 2000 * 0.02 = 40

So twhitehead got the right answer.
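The result <n> = Nx can also be checked by brute force, adding up every term n * C(N, n) * x^n * (1 - x)^(N - n) exactly. This is a sketch using Python's Fraction type so there is no rounding; the function name is invented for illustration, and x is written as the exact fraction 1/50 rather than 0.02.

```python
from fractions import Fraction
from math import comb


def mean_by_direct_sum(N: int, x: Fraction) -> Fraction:
    """Sum n * C(N, n) * x^n * (1 - x)^(N - n) over n = 0 ... N, exactly.

    Writing x = a/b, every term shares the denominator b^N, so the
    numerators are summed as plain integers and divided once at the end.
    """
    a, b = x.numerator, x.denominator
    numerator = sum(n * comb(N, n) * a**n * (b - a)**(N - n)
                    for n in range(N + 1))
    return Fraction(numerator, b**N)


if __name__ == "__main__":
    N = 2000
    x = Fraction(1, 50)  # 2% as an exact fraction
    print(mean_by_direct_sum(N, x))  # compare with N * x
    print(N * x)
```

Both lines print the same value, which is what the derivation above says should happen for any N and x.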

The only catch is if we have to take into account the possibility that a coding mutation is in a critical gene which produces a highly conserved protein and the mutation kills the cell. Some of these mutations might kill the organism, for example if one is in the PrP gene, causing CJD before age 70. So we need to factor out mutations that kill cells or the entire organism. If there are m coding bases in total, of which p are critical coding bases, and the genome is of length l, then where x was m/l we'd need to replace it with (m - p)/(l - p). If p is small compared with m then don't worry about it.
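To see how little the correction matters when p is small, here is a rough sketch of the (m - p)/(l - p) adjustment. All of the numbers in it are assumptions made up for illustration: a genome length l of about 3.2 billion bases, m set to 2% of that, and a purely hypothetical p of one million lethal positions.

```python
def adjusted_fraction(m: float, p: float, l: float) -> float:
    """Chance that a surviving mutation is in coding DNA.

    m: coding bases, p: coding bases where a mutation is lethal,
    l: genome length. Lethal positions are removed from both the
    coding count and the genome, giving (m - p) / (l - p).
    """
    if not (0 <= p <= m <= l) or p == l:
        raise ValueError("need 0 <= p <= m <= l and p < l")
    return (m - p) / (l - p)


if __name__ == "__main__":
    l = 3.2e9        # assumed genome length in bases
    m = 0.02 * l     # 2% coding plus regulatory, as in the thread
    p = 1.0e6        # hypothetical number of lethal positions
    N = 2000         # mutations per cell at age 70
    print(N * m / l)                        # uncorrected expectation
    print(N * adjusted_fraction(m, p, l))   # corrected expectation
```

With p this small compared with m the corrected expectation drops only slightly below the uncorrected one.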

twhitehead

Cape Town

Joined: 14 Apr 05
Moves: 52945

16 Mar 17

Originally posted by DeepThought
The only catch is if we have to take into account the possibility that a coding mutation is in a critical gene which produces a highly conserved protein and the mutation kills the cell. Some of these mutations might kill the organism, for example if one is in the PrP gene, causing CJD before age 70. So we need to factor out mutations that kill cells or t ...[text shortened]... ed to replace it with (m - p)/(l - p). If p is small compared with m then don't worry about it.

A good point about evolution. It's not quite clear to me, though, what you are calculating.
The question states that there are 2000 mutations at age 70 - which means the cells involved (and the organism) survived to age 70, so the reality is that there were likely more mutations, some of which occurred in super critical regions but were weeded out by evolution (cell or organism death).

DeepThought

Losing the Thread

Quarantined World

Joined: 27 Oct 04
Moves: 87415

16 Mar 17

Originally posted by twhitehead
A good point about evolution. It's not quite clear to me, though, what you are calculating.
The question states that there are 2000 mutations at age 70 - which means the cells involved (and the organism) survived to age 70, so the reality is that there were likely more mutations, some of which occurred in super critical regions but were weeded out by evolution (cell or organism death).

I assume you mean the (m - p)/(l - p) bit. That's just a way of excluding mutations that kill the cell (or the entire organism) from the calculation. As an analogy, imagine shuffling a pack of cards and turning over the top card. Drawing a joker corresponds to a coding mutation; if the card turned over is the bridge scoring card, that ends the game. If one does this twelve times (say) then since we've specified that the bridge scoring card has not been drawn then I think that one gets the right probability if one just does the calculation for a normal pack with two jokers and with the bridge scoring card absent.
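The card analogy can be tested with a small simulation. This is a sketch under stated assumptions: a pack of 52 ordinary cards, two jokers and one scoring card, reshuffled before every turn so that the draws are independent. Games in which the scoring card turns up are thrown away, and the average joker count in the remaining games is compared with the simple calculation for a 54-card pack. The names are invented for illustration.

```python
import random

JOKER, SCORE_CARD, ORDINARY = "joker", "score", "ordinary"
# Assumed pack: 52 ordinary cards, 2 jokers, 1 bridge scoring card.
PACK = [JOKER] * 2 + [SCORE_CARD] + [ORDINARY] * 52


def mean_jokers_given_no_score_card(draws: int, games: int,
                                    seed: int = 0) -> float:
    """Average jokers per game, keeping only games with no scoring card."""
    rng = random.Random(seed)
    kept_games = 0
    jokers = 0
    for _ in range(games):
        # Reshuffle before each turn: every draw is from the full pack.
        hand = [rng.choice(PACK) for _ in range(draws)]
        if SCORE_CARD in hand:
            continue  # the game ended early, so it is excluded
        kept_games += 1
        jokers += hand.count(JOKER)
    if kept_games == 0:
        raise ValueError("no game survived; increase the number of games")
    return jokers / kept_games


def mean_jokers_reduced_pack(draws: int) -> float:
    """Expected jokers when the scoring card is simply left out (54 cards)."""
    return draws * 2 / 54


if __name__ == "__main__":
    print(mean_jokers_given_no_score_card(12, 50000))
    print(mean_jokers_reduced_pack(12))
```

The two printed values should agree to within sampling noise, which supports doing the calculation with the scoring card absent.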

twhitehead

Cape Town

Joined: 14 Apr 05
Moves: 52945

16 Mar 17

Originally posted by DeepThought
If one does this twelve times (say) then since we've specified that the bridge scoring card has not been drawn then I think that one gets the right probability if one just does the calculation for a normal pack with two jokers and with the bridge scoring card absent.

I agree.