A Multi-Instance Multi-Label Learning Approach for Protein Domain Annotation

  • Conference paper
Intelligent Computing in Bioinformatics (ICIC 2014)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8590)

Abstract

Domains act as structural and functional units of proteins and play an essential role in functional genomics. Investigating the annotation of finite protein domains is of great importance because the functions of a protein can be inferred directly once the functions of its component domains are determined. In this paper, we propose PDAMIML, which is based on a novel multi-instance multi-label learning framework combined with auto-cross covariance transformation and SVM, and which can effectively annotate functions for protein domains. We evaluate the performance of PDAMIML using a benchmark of 100 protein domains and 10 high-cycle functional labels. The experimental results show that PDAMIML yields significant performance gains compared with state-of-the-art approaches. Furthermore, we combine PDAMIML with two other existing methods by majority voting and obtain encouraging results.
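
The majority-voting combination mentioned in the abstract can be illustrated with a minimal sketch. It assumes each of the three methods outputs a binary vector per domain, one entry per functional label, and that a label is kept when more than half of the methods assign it. The function name `majority_vote`, the variable names, and the example vectors are hypothetical and are not taken from the paper, which may combine predictions differently.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine binary label vectors from several predictors, label by label."""
    if not predictions:
        raise ValueError("need at least one predictor")
    n_labels = len(predictions[0])
    if any(len(p) != n_labels for p in predictions):
        raise ValueError("all predictors must score the same labels")

    n_predictors = len(predictions)
    combined = []
    for label_idx in range(n_labels):
        votes = sum(p[label_idx] for p in predictions)
        # Keep a label only when strictly more than half of the predictors assign it.
        combined.append(1 if 2 * votes > n_predictors else 0)
    return combined


# Hypothetical outputs for one domain over four functional labels.
pdamiml = [1, 0, 1, 0]
method_a = [1, 1, 0, 0]
method_b = [0, 1, 1, 0]

combined = majority_vote([pdamiml, method_a, method_b])
print(combined)
```

With three voters a tie is impossible, so the strict-majority rule needs no tie-breaking in this setting.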

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Meng, Y. et al. (2014). A Multi-Instance Multi-Label Learning Approach for Protein Domain Annotation. In: Huang, DS., Han, K., Gromiha, M. (eds) Intelligent Computing in Bioinformatics. ICIC 2014. Lecture Notes in Computer Science, vol 8590. Springer, Cham. https://doi.org/10.1007/978-3-319-09330-7_13

  • DOI: https://doi.org/10.1007/978-3-319-09330-7_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09329-1

  • Online ISBN: 978-3-319-09330-7

  • eBook Packages: Computer Science, Computer Science (R0)
