by Michael Fernández, Julio Caballero, Leyden Fernández, José Abreu, Gianco Acosta

Abstract:

Euclidean distance counts derived from protein 2D graphs were used to encode protein structural information. A total of 35 amino acid 2D distance count (AA2DC) descriptors were calculated from the Euclidean distance matrices (EDMs) of the 2D graphs at distances ranging from 0.05 to 1.8 units with a lag of 0.05 units. AA2DC descriptors were tested for building a predictive classification model of the sign of the change in the Gibbs free energy of thermal unfolding upon mutation (ΔΔG) for a large data set of 2048 single point mutations in 64 proteins. A support vector machine (SVM) classifier with a radial basis function (RBF) kernel was implemented to classify the conformational stability of protein mutants. In addition to the calculated AA2DC descriptors, the temperature and pH of the experimental ΔΔG measurements were also used for SVM training. The optimum SVM model correctly predicted about 72% of ΔΔG signs in cross-validation, both for the whole data set and for stable and unstable mutants separately. To the best of our knowledge, this level of accuracy for stable mutant recognition is the highest ever reported for a predictor using sequence information. Furthermore, the classifier adequately recognized disease-associated unstable mutants of human prion protein and human transthyretin.
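To make the descriptor and kernel steps above concrete, the following Python sketch computes a 35-bin distance-count vector from the EDM of a set of 2D graph vertices, appends temperature and pH, and evaluates the RBF kernel on such vectors. It is a minimal sketch under stated assumptions, not the authors' code: the 2D graph construction is not described here, so vertex coordinates are simply taken as input; the 35 descriptors are read as vertex-pair counts in the 35 intervals between consecutive distances 0.05, 0.10, ..., 1.80; and how a mutation is mapped to a graph is left open. The names `aa2dc_descriptors`, `feature_vector` and `rbf_kernel` are hypothetical. Training the SVM itself would normally be delegated to a library, for example scikit-learn's `SVC(kernel="rbf")`.

```python
import math
from bisect import bisect_right
from itertools import combinations

# ASSUMPTION: distances 0.05, 0.10, ..., 1.80 (lag 0.05) are treated as bin
# edges, giving the 35 intervals that yield 35 AA2DC-style counts.
LAG = 0.05
N_BINS = 35
EDGES = [round(LAG * (k + 1), 2) for k in range(N_BINS + 1)]  # 0.05 .. 1.80


def euclidean_distance_matrix(points):
    """Symmetric Euclidean distance matrix (EDM) for (x, y) graph vertices."""
    return [[math.dist(p, q) for q in points] for p in points]


def aa2dc_descriptors(points):
    """Hypothetical AA2DC-style vector: vertex-pair counts per distance bin.

    Bin k covers [EDGES[k], EDGES[k + 1]); pairs outside [0.05, 1.80) are ignored.
    """
    edm = euclidean_distance_matrix(points)
    counts = [0] * N_BINS
    for i, j in combinations(range(len(points)), 2):
        k = bisect_right(EDGES, edm[i][j]) - 1  # index of the last edge <= d
        if 0 <= k < N_BINS:
            counts[k] += 1
    return counts


def feature_vector(points, temperature, ph):
    """35 distance counts plus the temperature and pH of the measurement."""
    return aa2dc_descriptors(points) + [temperature, ph]


def rbf_kernel(u, v, gamma):
    """Radial basis function kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Toy vertex coordinates and conditions (hypothetical, not real protein data).
    pts = [(0.0, 0.0), (0.12, 0.0), (0.12, 0.33)]
    x = feature_vector(pts, temperature=298.0, ph=7.0)
    print(len(x), [k for k, c in enumerate(x[:N_BINS]) if c])  # 37 [1, 5, 6]
    print(rbf_kernel(x, x, gamma=0.1))  # identical vectors give 1.0
```

Under this reading, the toy coordinates place one vertex pair each in the 0.10-0.15, 0.30-0.35 and 0.35-0.40 bins, while pairs closer than 0.05 units or 1.8 units apart and beyond are not counted. In practice the descriptors and the temperature/pH values would likely need scaling before the RBF kernel is applied, since the kernel depends on raw Euclidean distances between feature vectors.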

Reference:

Classification of conformational stability of protein mutants from 2D graph representation of protein sequences using support vector machines (Michael Fernández, Julio Caballero, Leyden Fernández, José Abreu, Gianco Acosta), in Molecular Simulation, volume 33, number 11, pages 889-896, 2007. (http://www.scopus.com/inward/record.url?eid=2-s2.0-35448935893&partnerID=40&md5=a1d0f8c57a68e456bb5f5669bb712d03) (cited by 3 since 1996, according to Scopus)

BibTeX Entry:

@Article{Fernandez2007a,
Title = {Classification of conformational stability of protein mutants from {2D} graph representation of protein sequences using support vector machines},
Author = {Michael Fernández and Julio Caballero and Leyden Fernández and José Abreu and Gianco Acosta},
Journal = {Molecular Simulation},
Year = {2007},
Note = {cited By (since 1996) 3},
Number = {11},
Pages = {889--896},
Volume = {33},
Abstract = {Euclidean distance counts derived from protein 2D graphs were used to encode protein structural information. A total of 35 amino acid 2D distance count (AA2DC) descriptors were calculated from the Euclidean distance matrices (EDMs) of the 2D graphs at distances ranging from 0.05 to 1.8 units with a lag of 0.05 units. AA2DC descriptors were tested for building a predictive classification model of the sign of the change in the Gibbs free energy of thermal unfolding upon mutation (ΔΔG) for a large data set of 2048 single point mutations in 64 proteins. A support vector machine (SVM) classifier with a radial basis function (RBF) kernel was implemented to classify the conformational stability of protein mutants. In addition to the calculated AA2DC descriptors, the temperature and pH of the experimental ΔΔG measurements were also used for SVM training. The optimum SVM model correctly predicted about 72% of ΔΔG signs in cross-validation, both for the whole data set and for stable and unstable mutants separately. To the best of our knowledge, this level of accuracy for stable mutant recognition is the highest ever reported for a predictor using sequence information. Furthermore, the classifier adequately recognized disease-associated unstable mutants of human prion protein and human transthyretin.},
Affiliation = {Faculty of Agronomy, Molecular Modelling Group, University of Matanzas, 44740 Matanzas, Cuba; Centro de Bioinformática y Simulación Molecular, Universidad de Talca, 2 Norte 685, Casilla 721, Talca, Chile; Artificial Intelligence Lab., Faculty of Informatics, University of Matanzas, 44740 Matanzas, Cuba; National Bioinformatics Center, 10200 Havana, Cuba},
Author_keywords = {Graph similarity; Kernel-based methods; Point mutations; Protein stability prediction},
Comment = {http://www.scopus.com/inward/record.url?eid=2-s2.0-35448935893&partnerID=40&md5=a1d0f8c57a68e456bb5f5669bb712d03},
Document_type = {Article},
Doi = {10.1080/08927020701377070},
Owner = {2007_Mol_Simulat_33_889},
Source = {Scopus},
Url = {http://www.tandfonline.com/doi/abs/10.1080/08927020701377070}
}