Representing and reasoning about protein families using generative and discriminative methods.
This work addresses two issues in the design of learning systems for reasoning about protein families: how to represent the data and how to incorporate domain knowledge. Because any single method has limited expressive capacity, a mixture of protein annotation and fold recognition experts, each implementing a different underlying representation, should provide a robust way to assign sequences to families. These ideas are illustrated using two data-driven learning methods that use different prior information and employ independent, yet complementary, projections of a family: hidden Markov models (HMMs) built from a multiple sequence alignment, and neural networks (NNs) based on global sequence descriptors of proteins. Examination of seven protein families indicates that combining a generative method (HMM) with a discriminative one (NN) performs better than either method alone. Biologically, human 4-hydroxyphenylpyruvic acid dioxygenase, which is involved in tyrosinemia type 3, is predicted to be structurally and functionally related to the glyoxalase I family.[1]
References
- Representing and reasoning about protein families using generative and discriminative methods. Mian, I.S., Dubchak, I. J. Comput. Biol. (2000)
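The abstract does not state which descriptors the NN uses or how the HMM and NN outputs are merged, so the following is only an illustrative sketch of the general idea of pooling a generative and a discriminative expert. It assumes (1) amino-acid composition as one simple example of a global sequence descriptor, (2) HMM scores given as natural-log log-odds against a null model, converted to a probability under equal prior odds, and (3) a fixed weighted average as the combination rule. All function names, the `weight` parameter, and the example scores are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

# The 20 standard amino acids, in a fixed order for the descriptor vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_descriptor(seq: str) -> List[float]:
    """Fraction of each standard amino acid in seq: one simple global descriptor.

    Non-standard residue letters (e.g. 'X') are ignored. This is an assumed
    example input for an NN expert, not necessarily the paper's descriptor set.
    """
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def hmm_probability(log_odds: float) -> float:
    """Map an HMM log-odds score (natural log, family model vs. null model)
    to a posterior probability, assuming equal prior odds.

    Uses a numerically stable logistic function to avoid overflow.
    """
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def combine(hmm_log_odds: float, nn_prob: float, weight: float = 0.5) -> float:
    """Hypothetical combination rule: weighted average of the two experts."""
    return weight * hmm_probability(hmm_log_odds) + (1.0 - weight) * nn_prob


def assign_family(scores: Dict[str, Tuple[float, float]], weight: float = 0.5) -> str:
    """Return the family with the highest combined score.

    scores maps family name -> (HMM log-odds, NN probability).
    """
    combined = {fam: combine(lo, p, weight) for fam, (lo, p) in scores.items()}
    return max(combined, key=combined.get)


if __name__ == "__main__":
    # Made-up scores for two hypothetical families, for illustration only.
    example = {"family_A": (3.0, 0.6), "family_B": (-1.0, 0.7)}
    print(assign_family(example))  # prints "family_A"
```

In the example, the NN alone slightly prefers `family_B` (0.7 vs. 0.6), but the strong HMM score for `family_A` outweighs it once the two experts are averaged. A fixed average is the simplest pooling rule; a trained gating function or a stacked classifier would be natural alternatives.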
Annotations and hyperlinks in this abstract are from individual authors of WikiGenes or automatically generated by the WikiGenes Data Mining Engine. The abstract is from MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine.