Alternative target functions for protein structure prediction with neural networks
Department of Biology, Georgia State University, Atlanta, GA 30303, United States
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS.
The prediction and modeling of protein structure is a central problem in bioinformatics. Neural networks have been used extensively to predict the secondary structure of proteins. While significant progress has been made by using multiple sequence data, the ability to predict secondary structure from a single sequence with a single prediction network has stagnated at an accuracy of about 75%. This implies that there is some limit to the accuracy of the prediction. To understand this behavior, we asked what happens when the target function for the prediction is changed. Instead of predicting a derived quantity, such as whether a given chain is a helix, sheet or turn, we tested whether a more directly observed quantity, such as the distance between a pair of α-carbon atoms, could be predicted with reasonable accuracy. The α-carbon atom position is central to each residue in the protein, and the distances between successive α-carbon atoms along the sequence define the backbone of the protein. Knowledge of the distances between the α-carbon atoms is sufficient to determine the three-dimensional structure of the protein. We trained a multi-layered perceptron (MLP) feedforward neural network with backpropagation on distance data derived from the complete protein structure database (PDB). With orthogonal coding of the protein primary sequence, the root-mean-square error was 4.4 Å, while the mean of the actual output was 11.5 Å. Other coding schemes, including BLOSUM62 coding and linear coding, were tested with two further target functions: cutoff accuracy and correlation coefficient. The best correlation coefficient was achieved with the BLOSUM62 coding scheme, and the cutoff accuracy reached about 60%.
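The quantities named in the abstract can be illustrated with a minimal sketch. It shows one common reading of orthogonal coding (a 20-element one-hot vector per residue), the Euclidean distance between two α-carbon positions, and the root-mean-square error between predicted and actual distances. The function names, the residue alphabet ordering, and the sample coordinates are assumptions made for illustration only; the record does not describe the authors' actual implementation or network configuration.

```python
import math

# Assumed alphabet: the 20 standard amino acids in one-letter code.
# The ordering is arbitrary and chosen only for this illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def orthogonal_coding(sequence):
    """Encode each residue as a 20-element vector with a single 1.0 entry."""
    vectors = []
    for residue in sequence:
        vec = [0.0] * len(AMINO_ACIDS)
        vec[AMINO_ACIDS.index(residue)] = 1.0
        vectors.append(vec)
    return vectors


def ca_distance(p, q):
    """Euclidean distance between two alpha-carbon positions (x, y, z)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual distances."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from any real structure.
    coded = orthogonal_coding("ACD")
    print(len(coded), len(coded[0]))
    print(ca_distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
    print(rmse([10.0, 12.0], [10.0, 14.0]))
```

In a setup like the one described, vectors from a window of residues would form the network input and the α-carbon distance would be the regression target; the window size and pairing scheme are not stated in the record and are left out of the sketch.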