     
Statistics  
     
  Using genetic markers, we can estimate how closely people are related to each other. We do this by counting the genetic differences (mutations) between two individuals.

The particular DNA markers we analyse are called ‘short tandem repeats’, where small sections of the DNA code are repeated several times. At any one of these short tandem repeats, the number of repeats can increase or decrease, usually one at a time. Thus 9 repeats of the code, GTCA, may suddenly be copied incorrectly within the body and change to 10 repeats.
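To make the idea of a repeat count concrete, here is a minimal sketch that counts back-to-back copies of a motif in a DNA string. The function name and the flanking bases are made up for illustration; real marker typing is done in the laboratory, not by string matching.

```python
def count_tandem_repeats(sequence, motif):
    """Return the longest run of back-to-back copies of motif in sequence."""
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Walk forward one motif at a time while the copies keep coming.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# Made-up flanking bases around the repeat region (illustration only)
flank_left, flank_right = "AACT", "TTGC"
before = flank_left + "GTCA" * 9 + flank_right   # 9 repeats
after = flank_left + "GTCA" * 10 + flank_right   # one repeat gained

print(count_tandem_repeats(before, "GTCA"), count_tandem_repeats(after, "GTCA"))  # 9 10
```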

 
 
 
     
Diagram: Microsatellite mutation example. The ancestor and cousin 1 each show 9 repeats; cousin 2 shows 10.
     
  If we compare two known cousins and their Most Recent Common Ancestor (MRCA), we might see that the code found in the ancestor is also the same as in cousin 1 but slightly different in cousin 2. The diagram above shows this example.

This type of ‘mutation’ occurs randomly, at a predicted rate of roughly once every 500 generations for any single marker. These mutations are caused by an enzyme in the body miscopying the DNA code and inserting or deleting repeat units. They are fairly infrequent; however, the more markers looked at, the higher the likelihood of observing a mutation. It is worth noting that the ‘1 in 500’ figure is an average: some markers mutate more slowly, some faster.

Since the Y-DNA test uses 21 markers, a mutation can be expected somewhere among them once every 24 generations or so (500 ÷ 21 ≈ 24).
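The arithmetic behind this figure can be sketched as follows, assuming every marker mutates independently at the same average rate (a simplification, since some markers are slower and some faster). The function name is illustrative.

```python
def generations_per_mutation(num_markers, rate_per_marker):
    """Expected number of generations between mutations across all markers.

    Assumes each marker mutates independently at the same average rate.
    """
    return 1.0 / (num_markers * rate_per_marker)


# 21 markers, each mutating about once every 500 generations
print(round(generations_per_mutation(21, 1 / 500), 1))  # 23.8
```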

To estimate when the MRCA lived, we have to use certain statistical methods. The model most representative of the actual biochemical process is the ‘step-wise mutation model’. A simple form of this model assumes that mutations at any particular marker are either a ‘one-step increase’, or a ‘one-step decrease’ – and that there is an equal chance of both occurring.

However, the calculations for this step-wise mutation model are complicated, especially when two-step increases or decreases are possible (roughly 1 in 30-50), so we revert to the far simpler ‘infinite alleles model’, which is based on the following rules:

 
     
 
1. Each new mutation gives rise to a new allele never seen in the population before.

2. Every time a mutation occurs, it creates a new allele.
   
 
 

Simply put, when comparing two people, you only count each marker as match or no match. Thus if you have 21 markers and 20 of them match, it doesn’t matter whether the other marker is off by one or two repeats; it is still only counted as one mismatch.
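The match/no match counting can be sketched in a few lines. The repeat counts below are made-up values for five hypothetical markers, chosen so that one marker differs by one repeat and another by two.

```python
def count_mismatches(haplotype_a, haplotype_b):
    """Count markers that differ, ignoring the size of each difference."""
    if len(haplotype_a) != len(haplotype_b):
        raise ValueError("haplotypes must cover the same markers")
    return sum(1 for a, b in zip(haplotype_a, haplotype_b) if a != b)


# Hypothetical repeat counts at 5 markers (illustration only)
cousin_1 = [9, 14, 11, 23, 12]
cousin_2 = [10, 14, 11, 25, 12]

# The first marker is off by one repeat and the fourth by two,
# but each is counted as a single mismatch.
print(count_mismatches(cousin_1, cousin_2))  # 2
```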

The infinite alleles model closely fits the step-wise mutation model as long as the number of matching markers is high. This over-simplification will likely underestimate the time since the MRCA lived, but only slightly when most of the markers match. This is acceptable for genealogists comparing two people who are thought to be fairly closely related and so are unlikely to have picked up many mutations since their MRCA.

 
     
  Using this infinite alleles model, we can estimate how long ago the MRCA lived. We essentially need to know three things:  
     
 
1. How many markers are used: 21 markers.

2. How frequently mutations occur: once every 500 generations (on each marker) = 0.002 (0.2%) per generation.

3. How many mutations/mismatches are observed: in this case, just one.
   
 
 
 
  For the diagram above, we observe 1 mutation/difference between the two cousins. Using the above 3 variables, we can use the infinite alleles model to calculate the Time to the MRCA (often termed TMRCA and given as a number of transmission events, i.e. generations).  
     
 
For 21 STR markers

Number of mismatches    Average time to the MRCA (in generations)    95% confidence interval
0                       8.3                                          0.3 to 43.9
1                       20.5                                         3.0 to 68.0
2                       33.2                                         7.7 to 90.5
 
     
  In other words, with one mutation, the average time to the MRCA is 20.5 generations. If each generation is roughly 25 years, then this is approximately 500 years. It is impossible to pin this down to an exact year as we are dealing with random events; the lower and upper boundaries (given by the 95% confidence interval of 3.0 to 68.0 generations) correspond to roughly 75 and 1700 years.
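The conversion from generations to years is simple multiplication. A minimal sketch, using the text's rough figure of 25 years per generation (the names are illustrative):

```python
YEARS_PER_GENERATION = 25  # rough figure used in the text


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a number of generations into approximate years."""
    return generations * years_per_generation


# One mismatch on 21 markers: lower bound, central value, upper bound
for label, generations in [("lower", 3.0), ("central", 20.5), ("upper", 68.0)]:
    print(label, generations_to_years(generations))
# lower 75.0
# central 512.5
# upper 1700.0
```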

However, the equations used assume that the individuals are picked randomly. This isn’t the case; they are usually picked due to their presumed relatedness (e.g. they share a surname). This is an unquantifiable factor but would likely reduce the time to the MRCA much further.

Consider the case where the cousins had an exact match. This would bring the average time to the MRCA down to just 8.3 generations (just over 200 years) and narrow the 95% confidence interval to between 0.3 and 43.9 generations.

The graph below ('No. of generations to the MRCA vs. No. of markers') compares the use of 10, 12, 21 and 25 markers in estimating the time to the MRCA. This shows that once you reach the bottom of the curve (i.e. after about 20 markers), adding more markers no longer gives the same improvement in the precision of the MRCA estimate.
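The diminishing return can be illustrated for an exact match. The sketch below is not a reproduction of the graph; it assumes the infinite alleles model with a match probability of exp(-2μt) per marker, a flat prior on time, and a rate of 0.002 per marker per generation, in which case the median and the upper 95% bound have simple closed forms. The function name is illustrative.

```python
from math import log


def exact_match_tmrca(n_markers, mu=0.002):
    """Median and upper 95% bound (in generations) for an exact match.

    Under the stated assumptions the time to the MRCA follows an
    exponential distribution with rate 2 * mu * n_markers.
    """
    rate = 2 * mu * n_markers
    median = log(2) / rate
    upper = -log(0.025) / rate
    return median, upper


for n in (10, 12, 21, 25):
    median, upper = exact_match_tmrca(n)
    print(n, round(median, 1), round(upper, 1))
```

Because both figures shrink in proportion to 1 / (number of markers), each extra marker narrows the estimate by less than the one before it.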

 
     
 
Graph: No. of generations to the MRCA vs. No. of markers
 
     
 

Many people who share a surname will also share their haplotypes (i.e. have a 21/21 match). The graph below ('Matches against 21 markers') shows that, mathematically, the most likely person to have your haplotype is zero generations away, i.e. you (look at the line 21*). This of course makes perfect sense. It also means that as you increase the number of generations, the probability of matching someone else becomes lower, which also makes sense: there is a higher chance that mutations have occurred.

If you match someone at 20 out of 21 markers, you'll get a slightly different probability curve. The most likely MRCA is now not at zero generations, but further away.
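The shift of the peak away from zero can be sketched numerically. This is an illustration under the infinite alleles model, assuming each marker still matches after a given number of generations with probability exp(-2μt) and a rate of 0.002 per marker per generation; it is not a reproduction of the graph, and the function names are illustrative.

```python
from math import comb, exp


def match_probability(generations, n_markers, mismatches, mu=0.002):
    """Probability of seeing exactly this many mismatches after the
    given number of generations back to the MRCA."""
    q = exp(-2 * mu * generations)  # chance that a single marker still matches
    matches = n_markers - mismatches
    return comb(n_markers, mismatches) * q ** matches * (1 - q) ** mismatches


def most_likely_generation(n_markers, mismatches, max_generations=200):
    """Whole number of generations at which the probability peaks."""
    return max(
        range(max_generations + 1),
        key=lambda g: match_probability(g, n_markers, mismatches),
    )


print(most_likely_generation(21, 0))  # 0
print(most_likely_generation(21, 1))  # 12
```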

Using 21 markers, it is usual for related individuals to share an exact haplotype, i.e. a 21/21 match, although 20/21 and 19/21 matches should also be considered. With any more mismatches than this, the times to the MRCA are just too long for a connection to be considered, as most surnames began much more recently.

 
     
Graph: Matches against 21 markers
     
     
     
DNA Heritage® © 2002-2009     email:info@dnaheritage.com
North American office: P.O. Box 1028, Richmond, TX 77406-1028 USA tel/fax: Toll free 866-7-DNA-DNA
European office: 40 Preston Road, Weymouth, Dorset, DT3 6PZ, UK tel:+44 (0) 1305 834936 fax:+44 (0) 1305 835925