A Novel Algorithm for Prediction of Hub Proteins from Primary Structure in Eukaryotic Proteome Using Dipeptide Compositional Skew Information and Amino Acid Sequence Likeness

Aswathi, B. L.; Goli, Baharak; Govindarajan, Renganayaki; Nair, Achuthsankar S.

doi:10.1007/978-81-322-0810-5_4

Abstract

We propose a novel hub-finding algorithm that relies on dipeptide composition and amino acid sequence likeness. To extract the most prominent features for hub identification, we applied two feature selection techniques that are widely used in data preprocessing for machine learning: fast correlation-based feature selection (FCBFS) and correlation-based feature selection (CFS). The performance of two classifiers, a random forest classifier (RFC) and a radial basis function (RBF) network, was evaluated with these filter approaches. The proposed model predicted hub proteins from primary structure with 92.52% and 91.28% accuracy for the RFC and the RBF network, respectively, with FCBFS, and with 90.92% and 93.76% accuracy for the RFC and the RBF network, respectively, with CFS.
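
As an illustration of the kind of sequence feature the abstract refers to, the following minimal Python sketch computes a plain dipeptide composition vector (the relative frequency of each of the 400 ordered amino acid pairs) for a protein sequence. The abstract does not define the compositional skew or the sequence likeness measure, so this sketch covers neither. The function name `dipeptide_composition`, the 20-letter standard alphabet, and the choice to skip pairs containing non-standard residues are assumptions made for illustration, not the authors' implementation.

```python
from itertools import product

# The 20 standard amino acids in one-letter code (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered dipeptides, in a fixed order that defines the feature layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of relative dipeptide frequencies.

    Overlapping pairs are counted along the sequence. Pairs that contain
    a character outside AMINO_ACIDS (for example 'X') are skipped, which
    is an illustrative choice rather than something stated in the chapter.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or made only of non-standard residues.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    features = dipeptide_composition("ACAC")
    print(len(features))
    print(features[DIPEPTIDES.index("AC")])
```

A vector of this form could serve as input to a feature selection filter and a classifier such as those named in the abstract. How the authors combined it with the other features is not specified here.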

Author information

Authors and Affiliations

Department of Computational Biology and Bioinformatics, University of Kerala, Thiruvananthapuram, 695581, India
B. L. Aswathi, Baharak Goli, Renganayaki Govindarajan & Achuthsankar S. Nair

Corresponding author

Correspondence to B. L. Aswathi.

Editor information

Editors and Affiliations

Dept. of Biotechnology and Microbiology, Kannur University, Palayad P.O, Thalassery Campus, Kannur, 670661, Kerala, India
Abdulhameed Sabu
Dept. of Biotechnology and Microbiology, Kannur University, Palayad P.O, Thalassery Campus, Kannur, 670661, Kerala, India
Anu Augustine

Cite this paper

Aswathi, B.L., Goli, B., Govindarajan, R., Nair, A.S. (2012). A Novel Algorithm for Prediction of Hub Proteins from Primary Structure in Eukaryotic Proteome Using Dipeptide Compositional Skew Information and Amino Acid Sequence Likeness. In: Sabu, A., Augustine, A. (eds) Prospects in Bioscience: Addressing the Issues. Springer, India. https://doi.org/10.1007/978-81-322-0810-5_4

DOI: https://doi.org/10.1007/978-81-322-0810-5_4
Published: 12 December 2012
Publisher Name: Springer, India
Print ISBN: 978-81-322-0809-9
Online ISBN: 978-81-322-0810-5
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)
