Recovery of mutations of different sizes from a population sample of DNA sequences under variable mutation rates across sites

Mol Biol Evol. 1999 Aug;16(8):1098-104. doi: 10.1093/oxfordjournals.molbev.a026199.

Abstract

Mutations may be classified according to their positions of occurrence in the genealogy of the sampled DNA sequences from a population. A mutation is said to be of size i if it has i descendants in the sample. Such a classification may yield detailed insights into the evolutionary history and properties of the population. Statistical methods based on this classification have been developed and shown to be efficient and powerful. However, the utility of these methods depends critically on reliable and robust recovery of mutations of different sizes. We investigated how genealogy reconstruction using the unweighted pair-group method with arithmetic mean (UPGMA) changes the distribution of mutations of different sizes, and how well the maximum-parsimony method infers mutations of different sizes on a given topology. Genealogy reconstruction by UPGMA was found to change the distribution of mutations of different sizes on the constructed topologies. Multiple hits at some nucleotide sites made it difficult to infer mutations of different sizes with the maximum-parsimony method, even when the true topology was given. These results suggest that while the newly developed statistical methods employing information on mutations of different sizes are powerful, they also pose significant new challenges for developing methods that accurately recover mutations of different sizes from population DNA sequence data.
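To make the definition of mutation size concrete, the following minimal Python sketch counts mutations by size in the simplest possible setting. It is not the procedure studied in the paper, which recovers sizes from a reconstructed genealogy (UPGMA) or by maximum parsimony. The sketch assumes that the ancestral sequence is known and that each site has mutated at most once (no multiple hits), so the size of a mutation equals the number of sampled sequences that differ from the ancestral state at that site. The function name `mutation_size_spectrum` and the toy data are hypothetical.

```python
from collections import Counter


def mutation_size_spectrum(sample, ancestral):
    """Count segregating sites by mutation size.

    Assumptions (not from the paper): `ancestral` is the known ancestral
    sequence, all sequences are aligned and of equal length, and each site
    carries at most one mutation (no multiple hits).
    Returns a dict mapping size i to the number of sites of that size.
    """
    n = len(sample)
    spectrum = Counter()
    for pos, anc_base in enumerate(ancestral):
        # Size = number of sampled sequences carrying a non-ancestral base.
        size = sum(1 for seq in sample if seq[pos] != anc_base)
        # Keep only sites that vary within the sample (sizes 1 .. n-1).
        if 0 < size < n:
            spectrum[size] += 1
    return dict(spectrum)


# Toy sample of four aligned sequences (hypothetical data).
sample = ["ACGT", "ACGA", "TCGA", "ACGA"]
ancestral = "ACGT"
spectrum = mutation_size_spectrum(sample, ancestral)
print(spectrum)
```

In the toy sample, the first site carries a mutation found in one sequence (size 1) and the last site carries a mutation found in three sequences (size 3). When multiple hits occur at a site, this direct count no longer matches the true sizes, which is the difficulty the abstract describes for parsimony-based inference.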

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence / genetics*
  • Computer Simulation
  • DNA Mutational Analysis / methods*
  • Models, Molecular*
  • Models, Statistical*
  • Nucleotides / genetics
  • Phylogeny

Substances

  • Nucleotides