Sequence-Based Random Projection Ensemble Approach to Identify Hotspot Residues from Whole Protein Sequence

Chen, Peng; Hu, ShanShan; Wang, Bing; Zhang, Jun

doi:10.1007/978-3-319-22186-1_37

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9226)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Hot spot residues of proteins are key to performing specific functions in many biological processes. However, identifying hot spots by experimental methods is costly and time-consuming. Computational methods offer an alternative, identifying hot spots from sequence and structural information. However, structural information is not always available for a protein. In this paper, hot spots are identified using only statistical physicochemical properties of amino acids. First, 34 relatively independent physicochemical properties are extracted from the 544 properties in AAindex1. Because the hot spot data set is extremely imbalanced (the ratio of hot spots to non-hot spots is about 1.4 %), the hot spot set and a non-hot spot subset of roughly the same size form an initial input matrix. A random projection of this matrix serves as the input to a REPTree classifier. Several random projections and different subsets of non-hot spots are combined to build an ensemble REPTree system. Experimental results on commonly used hot spot benchmark sets showed that, although our method performed worse, it complements experimental hot spot determination.
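
The pipeline outlined in the abstract (balanced undersampling of non-hot spots, random projection, and an ensemble of base classifiers) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made for illustration only: a nearest-centroid classifier stands in for REPTree (a Weka decision tree learner), the members are combined by majority vote (the abstract does not state the combination rule), and the function names, the projection dimension `n_out`, and the ensemble size `n_members` are hypothetical.

```python
import random
from collections import Counter


def random_projection_matrix(n_in, n_out, rng):
    """Gaussian random matrix of shape n_in x n_out, scaled by 1/sqrt(n_out)."""
    scale = 1.0 / n_out ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_out)]
            for _ in range(n_in)]


def project(x, R):
    """Project one feature vector x (length n_in) to n_out dimensions."""
    return [sum(xi * row[j] for xi, row in zip(x, R))
            for j in range(len(R[0]))]


class CentroidClassifier:
    """Stand-in base learner (NOT REPTree): predicts the nearest class centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: dist2(self.centroids[lab]))


def train_ensemble(hot, non_hot, n_members=11, n_out=5, seed=0):
    """Each member gets its own balanced subset and its own random projection."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        # Undersample non-hot spots to roughly the number of hot spots.
        subset = rng.sample(non_hot, min(len(hot), len(non_hot)))
        X = hot + subset
        y = [1] * len(hot) + [0] * len(subset)
        R = random_projection_matrix(len(X[0]), n_out, rng)
        clf = CentroidClassifier().fit([project(x, R) for x in X], y)
        members.append((R, clf))
    return members


def predict(members, x):
    """Majority vote over all members (assumed combination rule)."""
    votes = Counter(clf.predict_one(project(x, R)) for R, clf in members)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic toy data: 10 "hot spots" vs 700 "non-hot spots" (about 1.4 %),
    # each described by 34 made-up feature values.
    rng = random.Random(1)
    hot = [[2.0 + rng.gauss(0, 0.1) for _ in range(34)] for _ in range(10)]
    non_hot = [[-2.0 + rng.gauss(0, 0.1) for _ in range(34)] for _ in range(700)]
    ensemble = train_ensemble(hot, non_hot)
    print(predict(ensemble, [2.0] * 34), predict(ensemble, [-2.0] * 34))
```

Drawing a fresh non-hot spot subset and a fresh projection for every member gives the ensemble its diversity: each base learner sees a balanced training set, yet the ensemble as a whole uses far more of the negative examples than any single member.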

This work was supported by the National Natural Science Foundation of China (Nos. 61300058, 61271098 and 61472282).

Author information

Authors and Affiliations

Institute of Health Sciences, Anhui University, Hefei, 230601, Anhui, China
Peng Chen & ShanShan Hu
School of Electronics and Information Engineering, Tongji University, Shanghai, 804201, China
Bing Wang
College of Electrical Engineering and Automation, Anhui University, Hefei, 230601, Anhui, China
Jun Zhang

Corresponding author

Correspondence to Peng Chen.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Liverpool John Moores University, Liverpool, United Kingdom
Abir Hussain

About this paper

Cite this paper

Chen, P., Hu, S., Wang, B., Zhang, J. (2015). Sequence-Based Random Projection Ensemble Approach to Identify Hotspot Residues from Whole Protein Sequence. In: Huang, DS., Jo, KH., Hussain, A. (eds) Intelligent Computing Theories and Methodologies. ICIC 2015. Lecture Notes in Computer Science, vol 9226. Springer, Cham. https://doi.org/10.1007/978-3-319-22186-1_37

DOI: https://doi.org/10.1007/978-3-319-22186-1_37
Published: 11 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22185-4
Online ISBN: 978-3-319-22186-1
eBook Packages: Computer Science, Computer Science (R0)
