Application of Semantic Kernels to Literature-Based Gene Function Annotation

  • Conference paper
Discovery Science (DS 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6926)

Abstract

In recent years, a number of machine learning approaches to literature-based gene function annotation have been proposed. However, due to issues such as lack of labeled data, class imbalance and computational cost, they have usually been unable to surpass simpler approaches based on string-matching. In this paper, we investigate the use of semantic kernels as a way to address the task’s inherent data scarcity and we propose a simple yet effective solution to deal with class imbalance. From experiments on the TREC Genomics Track data, our approach achieves better F1-score than two state-of-the-art approaches based on string-matching and cross-species information.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Blondel, M., Seki, K., Uehara, K. (2011). Application of Semantic Kernels to Literature-Based Gene Function Annotation. In: Elomaa, T., Hollmén, J., Mannila, H. (eds) Discovery Science. DS 2011. Lecture Notes in Computer Science, vol 6926. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24477-3_8

  • DOI: https://doi.org/10.1007/978-3-642-24477-3_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24476-6

  • Online ISBN: 978-3-642-24477-3

  • eBook Packages: Computer Science, Computer Science (R0)
