Issues in predicting protein function from sequence

Brief Bioinform. 2001 Mar;2(1):19-29. doi: 10.1093/bib/2.1.19.

Abstract

Identifying homologues, defined as genes that arose from a common evolutionary ancestor, is often a relatively straightforward task, thanks to recent advances in estimating the statistical significance of sequence similarities found in database searches. The extent to which homologues share similarities in function, however, is less amenable to statistical analysis. Consequently, predicting function by homology is a qualitative, rather than quantitative, process and requires particular care. This review focuses on the various approaches that have been developed to predict function at scales ranging from the atom to the organism. Similarities in homologues' functions differ considerably at each of these scales and also vary between domain families. It is argued that due attention should be paid to all available clues to function, including orthologue identification, conservation of particular residue types, and the co-occurrence of domains in proteins. Pitfalls in database searching methods arising from amino acid compositional bias and database size effects are also discussed.
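One way to see the database size effect is through the Karlin-Altschul statistics used by BLAST-style searches, where the expected number of chance hits is E = K·m·n·e^(−λS) for a query of length m, a database of n residues and a raw score S. The sketch below is illustrative only and is not taken from the review: the values of `LAMBDA` and `K` are assumed, real values depend on the scoring matrix, gap penalties and residue composition, and the edge-length corrections applied by real search tools are ignored.

```python
import math

# Illustrative Karlin-Altschul parameters (assumed values, not from the review;
# real values depend on the scoring matrix, gap costs and residue composition).
LAMBDA = 0.267
K = 0.041


def evalue(raw_score: float, query_len: int, db_len: int,
           lam: float = LAMBDA, k: float = K) -> float:
    """Expected number of chance hits scoring >= raw_score: E = K*m*n*exp(-lambda*S).

    No edge-length correction is applied (a simplifying assumption).
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float = LAMBDA, k: float = K) -> float:
    """Normalised score S' = (lambda*S - ln K) / ln 2, so that E = m*n*2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    query_len = 300  # hypothetical query length in residues
    raw = 100        # hypothetical raw score, held fixed across searches
    # The same alignment score is scored against ever larger databases.
    for db_len in (10**6, 10**8, 10**10):
        e = evalue(raw, query_len, db_len)
        print(f"db size = {db_len:.0e} residues   E = {e:.2e}")
```

Because E grows linearly with n, a hit that looks significant against a small database can become unremarkable once the database is a hundredfold or ten-thousandfold larger, even though the alignment itself is unchanged.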

Publication types

  • Review

MeSH terms

  • Amino Acids / analysis
  • Animals
  • Computational Biology*
  • Conserved Sequence
  • Databases, Factual
  • Evolution, Molecular
  • Gene Transfer, Horizontal
  • Humans
  • Protein Structure, Tertiary
  • Proteins / chemistry
  • Proteins / genetics*
  • Proteins / physiology*
  • Sequence Alignment
  • Sequence Analysis, Protein

Substances

  • Amino Acids
  • Proteins