
Analysis of Protein/Protein Interactions Through Biomedical Literature: Text Mining of Abstracts vs. Text Mining of Full Text Articles

Conference paper
Knowledge Exploration in Life Science Informatics (KELSI 2004)

Abstract

The challenge of knowledge management in the pharmaceutical industry is twofold. First, it must integrate sequence data and the vast, growing body of data from the functional analysis of genes with the information held in huge historical archival databases. Second, as the number of biomedical publications grows exponentially (Medline now contains more than 13 million records), researchers need assistance to broaden their vision and comprehension of scientific domains. Text mining is analogous to data mining in that it uncovers relationships in information: it uncovers relationships in a text collection and leverages the creativity of the knowledge worker in exploring these relationships and discovering new knowledge. We describe herein a text mining method that automatically detects protein interactions described across a large number of scientific publications. The method relies on natural language processing to identify protein names, their synonyms, and the various interactions they can have with other proteins. We then compared text mining analysis of abstracts with the same analysis of full text articles to assess how much information is lost when only abstracts are processed. Our results show that: 1) LexiQuest Mine is a very versatile and accurate tool for mining biomedical literature to analyze interactions between proteins; 2) mining only abstracts can be sufficient and time saving for applications that do not require a high level of detail on a large scale, whereas mining full text articles is preferable for more exhaustive applications designed to address a specific issue.

Availability: LexiQuest Mine is available for commercial licensing from SPSS, Inc.
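The abstract does not describe how LexiQuest Mine works internally, so the following is only a generic, much simpler stand-in, not the authors' method. It is a minimal sketch assuming a hand-made synonym dictionary (`SYNONYMS`), a short list of interaction cue words (`INTERACTION_TERMS`), and a sentence-level co-occurrence rule; all of these names and entries are hypothetical. It also shows one way to quantify what is lost when only abstracts are mined: the share of interactions found in the full text that are also found in the abstract.

```python
import re
from itertools import combinations

# Hypothetical synonym dictionary mapping lower-cased surface forms to a
# canonical protein name. Entries are illustrative only.
SYNONYMS = {
    "p53": "TP53",
    "tp53": "TP53",
    "mdm2": "MDM2",
    "hdm2": "MDM2",
    "bax": "BAX",
}

# Hypothetical interaction cues, matched as lower-case substrings so that
# "binds", "binding", "phosphorylates", etc. are all caught.
INTERACTION_TERMS = ("bind", "interact", "phosphorylat", "inhibit", "activat")


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_proteins(sentence):
    """Return canonical protein names in order of first mention, without duplicates."""
    found = []
    for token in re.findall(r"[A-Za-z0-9-]+", sentence):
        canonical = SYNONYMS.get(token.lower())
        if canonical and canonical not in found:
            found.append(canonical)
    return found


def extract_interactions(text):
    """Collect unordered protein pairs co-mentioned in a sentence with an interaction cue."""
    pairs = set()
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        if not any(term in lowered for term in INTERACTION_TERMS):
            continue
        # Sorting makes each pair order-independent, e.g. ("BAX", "MDM2").
        for a, b in combinations(sorted(find_proteins(sentence)), 2):
            pairs.add((a, b))
    return pairs


def abstract_coverage(abstract, full_text):
    """Share of full-text interactions that are also found in the abstract."""
    from_full = extract_interactions(full_text)
    if not from_full:
        return 1.0  # nothing found in the full text, so nothing is lost
    from_abstract = extract_interactions(abstract)
    return len(from_abstract & from_full) / len(from_full)


if __name__ == "__main__":
    abstract = "MDM2 binds p53. BAX expression was measured."
    full_text = abstract + (
        " Hdm2 was also shown to inhibit BAX in vitro."
        " p53 activates BAX transcription."
    )
    print(sorted(extract_interactions(full_text)))
    print(f"abstract coverage: {abstract_coverage(abstract, full_text):.2f}")
```

With these toy sentences, the abstract yields only the MDM2/TP53 pair while the full text yields three pairs, so the abstract covers one third of them. The co-occurrence rule is deliberately crude and will report pairs that are merely mentioned together; a parser-based extractor would be needed to avoid such false positives.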


Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Martin, E.P.G., Bremer, E.G., Guerin, MC., DeSesa, C., Jouve, O. (2004). Analysis of Protein/Protein Interactions Through Biomedical Literature: Text Mining of Abstracts vs. Text Mining of Full Text Articles. In: López, J.A., Benfenati, E., Dubitzky, W. (eds) Knowledge Exploration in Life Science Informatics. KELSI 2004. Lecture Notes in Computer Science, vol 3303. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30478-4_9

  • DOI: https://doi.org/10.1007/978-3-540-30478-4_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23927-7

  • Online ISBN: 978-3-540-30478-4

  • eBook Packages: Springer Book Archive
