A Novel Algorithm for Hub Protein Identification in Prokaryotic Proteome Using Di-Peptide Composition and Hydrophobicity Ratio

B.L., Aswathi; Goli, Baharak; Govindarajan, Renganayaki; Nair, Achuthsankar S.

doi:10.1007/978-3-642-32112-2_25

Part of the book series: Communications in Computer and Information Science (CCIS, volume 305)

Included in the following conference series:

International Conference on Eco-friendly Computing and Communication Systems

Abstract

It is widely hypothesized that the information determining protein hubness resides in amino acid sequence patterns and features, which motivated us to revisit the problem. In this study, we propose a novel algorithm for identifying hub proteins that relies on dipeptide composition and hydrophobicity ratio. To select the most informative and prominent features, we applied two feature selection techniques that are widely used in data preprocessing for machine learning: CFS (Correlation-based Feature Selection) and ReliefF. Overall accuracy and model processing time were compared using a neural network classifier (RBF Network) and an ensemble classifier (Bagging). The proposed models predicted hub proteins from amino acid sequence information with 92.94% and 92.10% accuracy for RBF Network and Bagging, respectively, with CFS, and with 94.15% and 90.89% accuracy for RBF Network and Bagging, respectively, with ReliefF.
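The two feature families named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the hydrophobicity ratio, so the sketch assumes it is the fraction of residues with a positive Kyte-Doolittle hydropathy value, and the function names (`dipeptide_composition`, `hydrophobicity_ratio`, `feature_vector`) are hypothetical.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

# Kyte-Doolittle hydropathy scale (assumed here as the hydrophobicity scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def dipeptide_composition(seq):
    """Return 400 dipeptide frequencies (fractions of all valid overlapping pairs)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def hydrophobicity_ratio(seq):
    """Assumed definition: share of residues with a positive hydropathy value."""
    residues = [r for r in seq.upper() if r in KYTE_DOOLITTLE]
    if not residues:
        return 0.0
    hydrophobic = sum(1 for r in residues if KYTE_DOOLITTLE[r] > 0)
    return hydrophobic / len(residues)


def feature_vector(seq):
    """Concatenate the 400 dipeptide frequencies and the hydrophobicity ratio."""
    return dipeptide_composition(seq) + [hydrophobicity_ratio(seq)]
```

Under these assumptions each protein sequence maps to a 401-dimensional vector, which could then be passed to a feature selection step such as CFS or ReliefF before classification.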

Author information

Authors and Affiliations

Department of Computational Biology and Bioinformatics, University of Kerala, Trivandrum, 695581, India
Aswathi B.L., Baharak Goli, Renganayaki Govindarajan & Achuthsankar S. Nair

Editor information

Editors and Affiliations

Department of Computer Science, University of Bristol, BS8 1UB, Bristol, UK
Jimson Mathew & Dhiraj K. Pradhan
Intel Corporation, 211, Northeast 25th Ave., 97124, Hillsboro, Oregon, USA
Priyadarshan Patra
Department of Information Technology, Rajagiri School of Engineering and Technology, Kochi, Kerala, India
A. J. Kuttyamma

About this paper

Cite this paper

B.L., A., Goli, B., Govindarajan, R., Nair, A.S. (2012). A Novel Algorithm for Hub Protein Identification in Prokaryotic Proteome Using Di-Peptide Composition and Hydrophobicity Ratio. In: Mathew, J., Patra, P., Pradhan, D.K., Kuttyamma, A.J. (eds) Eco-friendly Computing and Communication Systems. ICECCS 2012. Communications in Computer and Information Science, vol 305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32112-2_25

DOI: https://doi.org/10.1007/978-3-642-32112-2_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32111-5
Online ISBN: 978-3-642-32112-2
eBook Packages: Computer Science, Computer Science (R0)
