Secondary structure prediction
Secondary structure prediction is a set of techniques in bioinformatics that aim to predict the local secondary structures of proteins based only on knowledge of their amino acid sequences. It is used to help identify appropriate templates in some homology modeling methods for protein structure prediction, as well as to classify structural motifs and refine sequence alignments. The first methods for secondary structure prediction were introduced in the 1970s and were based on probabilistic assessments of the propensities of individual amino acids to form alpha helices, beta sheets, turns or loops, and random coil conformations. Modern methods are largely based on machine learning algorithms, particularly neural networks, trained on sample proteins of known structure from the Protein Data Bank. Specialized algorithms have been developed for the detection of specific well-defined patterns such as transmembrane helices and coiled coils.
Chou-Fasman method
The Chou-Fasman method was among the first secondary structure prediction algorithms developed. It relies predominantly on probability parameters determined from the relative frequency with which each amino acid appears in each type of secondary structure. The original Chou-Fasman parameters, determined from the small sample of structures solved in the mid-1970s, produce poor results compared to modern methods, though the parameterization has been updated since it was first published. The Chou-Fasman method is roughly 50-60% accurate in predicting secondary structures.
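The idea of frequency-derived parameters can be illustrated with a minimal Python sketch. It is not the published Chou-Fasman procedure: the propensity formula (the fraction of a residue found in a state divided by the overall fraction of that state), the window-averaging rule, the function names and the toy training data are all assumptions made for illustration, and the published method's helix and sheet nucleation and extension rules are omitted.

```python
from collections import Counter

def propensities(sequences, structures):
    """Return {(residue, state): propensity} from paired sequence/structure strings.

    Propensity = (fraction of this residue found in the state)
                 / (fraction of all residues found in the state).
    """
    pair_counts = Counter()
    residue_counts = Counter()
    state_counts = Counter()
    for seq, ss in zip(sequences, structures):
        for aa, state in zip(seq, ss):
            pair_counts[(aa, state)] += 1
            residue_counts[aa] += 1
            state_counts[state] += 1
    total = sum(state_counts.values())
    return {
        (aa, state): (pair_counts[(aa, state)] / residue_counts[aa])
        / (state_counts[state] / total)
        for aa in residue_counts
        for state in state_counts
    }

def predict(seq, table, states="HEC", window=5):
    """Assign each residue the state with the highest mean propensity in its window.

    Simplified rule (an assumption of this sketch); residues absent from the
    table are given a neutral propensity of 1.0.
    """
    half = window // 2
    out = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]

        def mean_propensity(state):
            return sum(table.get((aa, state), 1.0) for aa in segment) / len(segment)

        out.append(max(states, key=mean_propensity))
    return "".join(out)

# Hypothetical toy data: H = helix, E = strand, C = coil.
train_seqs = ["AAAAVVVVGGGG"]
train_ss = ["HHHHEEEECCCC"]
table = propensities(train_seqs, train_ss)
prediction = predict("AAAAVVVV", table)
```

A propensity above 1 means the residue occurs in that state more often than an average residue does. With a realistic training set the table would cover all 20 amino acids, and most values would lie close to 1.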
GOR method
The GOR method, named for the three scientists who developed it (Garnier, Osguthorpe, and Robson), is an information theory-based method developed not long after Chou-Fasman that uses the more powerful probabilistic techniques of Bayesian inference. The GOR method takes into account not only the probability of each amino acid having a particular secondary structure, but also the conditional probability of the amino acid assuming each structure given the contributions of its neighbors in the sequence; it does not assume that the neighbors adopt that same structure. This approach is both more sensitive and more accurate than relying on single-residue propensities, because those propensities are strong for only a small number of amino acids such as proline and glycine, whereas weak contributions from many neighbors can add up to a strong overall signal. The original GOR method is roughly 65% accurate and is dramatically more successful in predicting alpha helices than beta sheets, which it frequently mispredicts as loops or disordered regions.
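The way neighbor contributions accumulate can be sketched in Python as a sum of log-odds terms over a sequence window. This is a simplified illustration rather than the published GOR parameterization: the scoring term log(P(state | residue at offset m) / P(state)), the pseudocount, the window size, the function names and the toy training data are all assumptions of this sketch.

```python
import math
from collections import defaultdict

def train_gor(sequences, structures, half_window=8, pseudocount=1.0):
    """Estimate info[(offset, residue, state)] = log(P(state | residue at offset) / P(state))."""
    states = sorted({s for ss in structures for s in ss})
    state_totals = defaultdict(float)
    counts = defaultdict(lambda: defaultdict(float))  # counts[(offset, residue)][state]
    for seq, ss in zip(sequences, structures):
        for i, state in enumerate(ss):
            state_totals[state] += 1
            for m in range(-half_window, half_window + 1):
                j = i + m
                if 0 <= j < len(seq):
                    counts[(m, seq[j])][state] += 1
    total = sum(state_totals.values())
    info = {}
    for (m, aa), by_state in counts.items():
        n = sum(by_state.values()) + pseudocount * len(states)
        for state in states:
            p_cond = (by_state[state] + pseudocount) / n   # smoothed P(state | residue at offset m)
            p_prior = state_totals[state] / total          # P(state)
            info[(m, aa, state)] = math.log(p_cond / p_prior)
    return info, states

def predict_gor(seq, info, states, half_window=8):
    """Give each position the state with the largest summed information over its window."""
    out = []
    for i in range(len(seq)):
        scores = {}
        for state in states:
            scores[state] = sum(
                info.get((m, seq[i + m], state), 0.0)  # unseen combinations contribute nothing
                for m in range(-half_window, half_window + 1)
                if 0 <= i + m < len(seq)
            )
        out.append(max(states, key=scores.get))
    return "".join(out)

# Hypothetical toy data: H = helix, E = strand, C = coil.
train_seqs = ["AAAAAAVVVVVVGGGGGG"]
train_ss = ["HHHHHHEEEEEECCCCCC"]
info, states = train_gor(train_seqs, train_ss, half_window=1)
prediction = predict_gor("AAAAVVVV", info, states, half_window=1)
```

The toy example uses a half-window of 1 only to keep the counts easy to follow. The default of 8 gives a 17-residue window, as commonly described for GOR. Each neighbor adds a small positive or negative term, so a residue with no strong preference of its own can still be assigned confidently when its neighbors agree.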
Neural networks
Neural network methods use training sets of solved structures to identify common sequence motifs associated with particular arrangements of secondary structures. These methods are over 70% accurate in their predictions, although beta strands are still often underpredicted. The likely cause is the lack of three-dimensional structural information: without it, the hydrogen bonding patterns that promote formation of the extended conformation required for a complete beta sheet cannot be assessed.
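As a rough illustration of learning from solved structures, the Python sketch below encodes a sliding window of residues as one-hot features and trains a single softmax layer by stochastic gradient descent. It is a minimal stand-in and not any published predictor: real methods use deeper networks, far larger training sets and often evolutionary profiles. The encoding, hyperparameters, function names and toy data are assumptions of this sketch.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = "HEC"  # helix, strand, coil

def encode_window(seq, i, half_window=2):
    """One-hot encode residues i-half_window..i+half_window; off-end positions stay all-zero."""
    features = []
    for j in range(i - half_window, i + half_window + 1):
        one_hot = [0.0] * len(AMINO_ACIDS)
        if 0 <= j < len(seq):
            one_hot[AMINO_ACIDS.index(seq[j])] = 1.0
        features.extend(one_hot)
    return features

def softmax(zs):
    peak = max(zs)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def train(seqs, structs, half_window=2, epochs=200, lr=0.5, seed=0):
    """Fit one weight row per state with per-example cross-entropy gradient steps."""
    rng = random.Random(seed)
    n_in = (2 * half_window + 1) * len(AMINO_ACIDS)
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_in)] for _ in STATES]
    examples = [
        (encode_window(s, i, half_window), STATES.index(ss[i]))
        for s, ss in zip(seqs, structs)
        for i in range(len(s))
    ]
    for _ in range(epochs):
        for x, target in examples:
            probs = softmax([sum(w * v for w, v in zip(row, x)) for row in weights])
            for k, row in enumerate(weights):
                grad = probs[k] - (1.0 if k == target else 0.0)
                for idx, v in enumerate(x):
                    if v:  # only active one-hot inputs receive an update
                        row[idx] -= lr * grad * v
    return weights

def predict(seq, weights, half_window=2):
    out = []
    for i in range(len(seq)):
        x = encode_window(seq, i, half_window)
        scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
        out.append(STATES[scores.index(max(scores))])
    return "".join(out)

# Hypothetical toy data, far too small for any real use.
weights = train(["AAAAAAVVVVVVGGGGGG"], ["HHHHHHEEEEEECCCCCC"])
prediction = predict("AAAAA", weights)
```

Because each position is classified only from a short local window, a model of this kind has no view of residues that are distant in sequence but adjacent in the folded protein, which is consistent with the difficulty in predicting beta strands noted above.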
References
- Mount DW (2004). Bioinformatics: Sequence and Genome Analysis, 2nd ed. Cold Spring Harbor Laboratory Press. ISBN 0879697121.