Abstract
Hexoses are simple sugars that play a key role in many cellular pathways and in the regulation of development and disease mechanisms. Current computational models of protein-sugar binding are based, at least partially, on prior biochemical findings and knowledge, and they incorporate different parts of these findings into predictive black-box models. We investigate the empirical support for these biochemical findings by comparing rules induced by Inductive Logic Programming (ILP) with actual biochemical results. We mine the Protein Data Bank for a representative data set of hexose-binding sites, non-hexose binding sites, and surface grooves. We build an ILP model of hexose-binding sites and evaluate our results against several baseline machine learning classifiers. Our method achieves an accuracy similar to that of the black-box classifiers while providing insight into the discriminating process. In addition, it confirms wet-lab findings and reveals a previously unreported dependency between the amino acids Trp and Glu.
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nassif, H., Al-Ali, H., Khuri, S., Keirouz, W., Page, D. (2010). An Inductive Logic Programming Approach to Validate Hexose Binding Biochemical Knowledge. In: De Raedt, L. (eds) Inductive Logic Programming. ILP 2009. Lecture Notes in Computer Science, vol 5989. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13840-9_14
DOI: https://doi.org/10.1007/978-3-642-13840-9_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13839-3
Online ISBN: 978-3-642-13840-9
eBook Packages: Computer Science (R0)