A Novel Algorithm for Prediction of Hub Proteins from Primary Structure in Eukaryotic Proteome Using Dipeptide Compositional Skew Information and Amino Acid Sequence Likeness

  • Conference paper
Prospects in Bioscience: Addressing the Issues

Abstract

We propose a novel hub-finding algorithm that relies on dipeptide composition and amino acid sequence likeness. To extract the most prominent features for hub identification, we applied two feature selection techniques that are widely used in data preprocessing for machine learning problems: fast correlation-based feature selection (FCBFS) and correlation-based feature selection (CFS). The performance of two classifiers, a random forest classifier (RFC) and an RBF network, was evaluated with each of these filter approaches. The proposed model predicted hub proteins from primary structure with 92.52% and 91.28% accuracy for the RFC and the RBF network, respectively, with FCBFS, and with 90.92% and 93.76% accuracy for the RFC and the RBF network, respectively, with CFS.
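
The abstract does not spell out how the dipeptide features are computed, so the following is only a minimal sketch of the conventional dipeptide composition: the relative frequency of each of the 400 ordered amino acid pairs in a sequence. The function name `dipeptide_composition`, the 20-letter alphabet constant, and the choice to skip pairs containing non-standard residues are assumptions made for illustration; the paper's compositional skew measure and its sequence likeness feature may be defined differently.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes); an assumption of this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered dipeptides, in a fixed order so feature vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide relative frequencies.

    Overlapping adjacent residue pairs are counted; pairs that contain a
    non-standard residue (e.g. 'X') are skipped. This is the conventional
    definition, not necessarily the one used in the paper.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or without valid pairs: return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    features = dipeptide_composition("MKTAYIAKQR")
    print(len(features))
```

A vector like this, computed per protein, is the kind of input that a filter such as FCBFS or CFS could prune before a classifier is trained on it.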

Corresponding author

Correspondence to B. L. Aswathi.

Copyright information

© 2012 Springer India

About this paper

Cite this paper

Aswathi, B.L., Goli, B., Govindarajan, R., Nair, A.S. (2012). A Novel Algorithm for Prediction of Hub Proteins from Primary Structure in Eukaryotic Proteome Using Dipeptide Compositional Skew Information and Amino Acid Sequence Likeness. In: Sabu, A., Augustine, A. (eds) Prospects in Bioscience: Addressing the Issues. Springer, India. https://doi.org/10.1007/978-81-322-0810-5_4
