Abstract
Hot spots are key to understanding the mechanism of protein-protein interactions and can serve as targets for drug design. Because experimental methods are costly and time-consuming, computational methods are widely used to predict hot spots from sequence or structure information. Here, we propose a new sequence-based model that combines physicochemical features with the relative accessible surface area (RASA) of the amino acid sequence. The model consists of 83 classifiers built with the IBk algorithm; the instances for each classifier are encoded by one property selected from the 544 properties in the AAindex1 database. The top-performing classifiers with respect to F1 score are then combined into an ensemble by majority voting. The model outperforms other state-of-the-art computational methods, yielding an F1 score of 0.80 on the BID test set.
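The pipeline described above (one instance-based classifier per property, ranking by F1, majority-vote ensemble) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: Weka's IBk is a k-nearest-neighbour learner, so it is approximated here by a hypothetical `KNN` class with Euclidean distance; the sliding-window encoding (`encode`, half-width `w`, zero padding), the `top_n` cutoff, and all function names are hypothetical choices; and the property tables in the usage block are toy values, not AAindex1 entries.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) for the positive (hot spot) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


class KNN:
    """Plain k-nearest-neighbour learner standing in for Weka's IBk:
    it memorises training instances and predicts the majority label
    among the k closest ones (Euclidean distance)."""

    def __init__(self, k=1):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        order = sorted(range(len(self.X)), key=lambda i: dist(self.X[i], x))
        return Counter(self.y[i] for i in order[: self.k]).most_common(1)[0][0]

    def predict(self, X):
        return [self.predict_one(x) for x in X]


def encode(seq, rasa, prop, i, w=1):
    """Hypothetical encoding of residue i: for each residue in a window of
    half-width w, its value under one property plus its RASA (0.0 padding)."""
    feats = []
    for j in range(i - w, i + w + 1):
        inside = 0 <= j < len(seq)
        feats.append(prop.get(seq[j], 0.0) if inside else 0.0)
        feats.append(rasa[j] if inside else 0.0)
    return feats


def build_ensemble(props, train, valid, top_n=3, k=1, w=1):
    """Train one KNN per property table and keep the top_n by validation F1.
    train/valid items are (seq, rasa, position, label) tuples."""
    scored = []
    for name, prop in props.items():
        clf = KNN(k).fit([encode(s, r, prop, i, w) for s, r, i, _ in train],
                         [lab for _, _, _, lab in train])
        pred = clf.predict([encode(s, r, prop, i, w) for s, r, i, _ in valid])
        score = f1_score([lab for _, _, _, lab in valid], pred)
        scored.append((score, name, prop, clf))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_n]


def vote(ensemble, seq, rasa, i, w=1):
    """Majority vote of the selected classifiers for residue i of seq."""
    votes = [clf.predict_one(encode(seq, rasa, prop, i, w))
             for _, _, prop, clf in ensemble]
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy property tables (illustrative values, not AAindex1 entries).
    props = {"toy_hydro": {"W": 1.0, "A": 0.0}, "toy_flat": {"W": 0.0, "A": 0.0}}
    flat = [0.5] * 4  # uninformative RASA, so only the property matters
    train = [("WAWA", flat, i, lab) for i, lab in enumerate([1, 0, 1, 0])]
    valid = [("AWWA", flat, i, lab) for i, lab in enumerate([0, 1, 1, 0])]
    ens = build_ensemble(props, train, valid, top_n=1, w=0)
    print(ens[0][1], vote(ens, "WA", [0.5, 0.5], 0, w=0))  # prints: toy_hydro 1
```

In the toy run, the informative table separates hot spots perfectly (validation F1 of 1.0), while the flat table predicts every residue as a hot spot (F1 of about 0.67), so ranking by F1 keeps the informative classifier. Selecting by F1 rather than accuracy rewards classifiers that find the minority hot-spot class, which seems to be the motivation for the metric here.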
Acknowledgement
This work was supported by the National Natural Science Foundation of China (Nos. 61300058, 61472282, 61271098 and 61374181).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Hu, S., Chen, P., Zhang, J., Wang, B. (2016). Prediction of Hot Spots Based on Physicochemical Features and Relative Accessible Surface Area of Amino Acid Sequence. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2016. Lecture Notes in Computer Science, vol 9771. Springer, Cham. https://doi.org/10.1007/978-3-319-42291-6_42
DOI: https://doi.org/10.1007/978-3-319-42291-6_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42290-9
Online ISBN: 978-3-319-42291-6
eBook Packages: Computer Science, Computer Science (R0)