Skip to main content

Text Mining on PubMed

  • Chapter
  • First Online:
Approaches in Integrative Bioinformatics

Abstract

A technology of linguistic analysis with the use of computer methods is called a text mining.

Computer tools based on this technology can provide a wide range of tasks, including:

  1. 1.

    The task of finding a relevant literature with the user-specified criteria and determination of the correspondence between single article or manually specified picks of articles and researching area of knowledge or a set of predesignated areas

  2. 2.

    The task of identification and extraction of names of biological objects that can be found in the raw text (e.g., genes, proteins, metabolites) with extra information on them, such as the type of object and names of its synonyms

  3. 3.

    The task of establishment of relationships between objects that had been automatically recognized in text with the representation of the obtained data in a form convenient for the further analysis, for example, in the form of associative networks

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

eBook
USD 16.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
USD 109.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Shatkay H, Wilbur WJ (2000) Finding themes in medline documents: probabilistic similarity search. In: Hoppenbro J, Souza Lima T, Papazoglou M, Sheth A (eds) Proceedings IEEE advances in digital libraries 2000, Washington DC, May 2000, pp 183–192

    Chapter  Google Scholar 

  2. Joyce T, Needham RM (1997) The thesaurus approach to information retrieval. American documentation (1958) 9:192–197. In: Sparck Jones K, Willet P (eds) Readings in information retrieval. Morgan Kaufmann Publishers Inc, California (1997), pp 15–20

    Google Scholar 

  3. Salton G (1968) Automatic information organization and retrieval. McGraw Hill, New York

    Google Scholar 

  4. Sebastiani F (1999) Machine learning in automated text categorization. Technical report IEI-B4-31-1999, Istituto di Elaborazione dell’Informazione. CNR, Pisa

    Google Scholar 

  5. Кириченко КМ, Герасимов МБ (2001) Обзор методов кластеризации текстовых документов. Материалы международной конференции Диалог, т 2, Аксаково, 2001

    Google Scholar 

  6. Гаврилова ТА, Хорошевский ВФ (2000) Базы знаний интеллектуальных систем. Учебник, Питер, Санкт-Петербург, 2000

    Google Scholar 

  7. Ильин Н, Киселëв С, Танков С, Рябышкин В (2006) Технологии извлечения знаний из текста, Открытые системы, 6, 2006

    Google Scholar 

  8. Schuler G, Epstein J, Ohkawa H, Kans J (1996) Entrez: molecular biology database and retrieval system. Methods Enzymol 266:141–162

    Google Scholar 

  9. Muller HM, Kenny EE, Sternberg PW (2004) Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol 2:309

    Article  Google Scholar 

  10. Becker K et al (2003) PubMatrix: a tool for multiplex literature mining. BMC Bioinforma 4:61

    Article  Google Scholar 

  11. Krauthammer M, Nenadic G (2004) Term identification in the biomedical literature. J Biomed Inform 37:512–526

    Article  Google Scholar 

  12. Krallinger M, Morgan A, Smith L, Leitner F, Tanabe L, Wilbur J, Hirschman L, Valencia A (2008) Evaluation of text mining systems for biology: overview of the Second BioCreative community challenge. Genome Biol 9(2):1

    Article  Google Scholar 

  13. Ananiadou S, McNaught J (eds) (2006) Text mining for biology and biomedicine. Artech House, Norwood

    Google Scholar 

  14. Collier N, Nobata C, Tsujii J (2000) Extracting the names of genes and gene products with a hidden Markov model. In: Proceedings of COLING 2000, Saarbruecken, pp 201–207

    Google Scholar 

  15. Morgan A, Yeh A, Hirschman L, Colosimo M (2003) Gene name extraction using FlyBase resources. In: Proceedings of NLP in biomedicine. ACL 2003, Sapporo, pp 1–8

    Google Scholar 

  16. Kazama J, Makino T, Ohta Y, Tsujii J (2002) Tuning support vector machines for biomedical named entity recognition. In: ACL-02 workshop on natural language processing in biomedical applications, Pennsylvania, July 2002

    Google Scholar 

  17. Kim JD, Ohta T, Tateisi Y, Tsujii J (2003) GENIA corpus – a semantically annotated corpus for bio-textmining. Bioinformatics 19(1):180–182

    Article  Google Scholar 

  18. Cohen KB, Hunter L (2005) Natural language processing and systems biology. In: Dubitzky W, Azuaje F (eds) Artificial intelligence and systems biology. Springer, Dordrecht

    Google Scholar 

  19. Liu H, Hu ZZ, Zhang J, Wu C (2006) BioThesaurus: a web-based thesaurus of protein and gene names. Bioinformatics 22:103–105

    Article  Google Scholar 

  20. Bairoch A, Apweiler R, Wu CH et al (2007) The Universal Protein Resource (UniProt). Nucleic Acids Res 35:193–197

    Google Scholar 

  21. Wheeler D, Church D, Federhen S et al (2003) Database resources of the National Center for Biotechnology. Nucleic Acids Res 31:28–33

    Article  Google Scholar 

  22. Eppig JT et al (2005) The Mouse Genome Database (MGD): from genes to mice — a community resource for mouse biology. Nucleic Acids Res 33:471–475

    Article  Google Scholar 

  23. Christie KR et al (2004) Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms. Nucleic Acids Res 32:311–314

    Article  Google Scholar 

  24. De la Cruz N et al (2005) The Rat Genome Database (RGD): developments towards a phenome database. Nucleic Acids Res 33:485–491

    Article  Google Scholar 

  25. Drysdale RA, Crosby MA (2005) FlyBase: genes and gene models. Nucleic Acids Res 33:390–395

    Article  Google Scholar 

  26. Chen N et al (2005) WormBase: a comprehensive data resource for Caenorhabditis biology and genomics. Nucleic Acids Res 33:383–389

    Article  Google Scholar 

  27. Tsuruoka Y, Tsujii J (2003) Boosting precision and recall of dictionary-based protein name recognition. In: Ananiadou S, Tsujii J (eds) Proceedings of the ACL 2003 workshop on natural language processing in biomedicine, Stroudsburg, July 2003, vol 13. Association for Computational Linguistics, Stroudsburg, pp 41–48

    Chapter  Google Scholar 

  28. Ohta T, Tateishi Y, Mima H, Tsujii J (2002) Genia corpus: an annotated research abstract corpus in molecular biology domain. In: Proceedings of the human language technology conference, San Diego, March 2002

    Google Scholar 

  29. Hakenberg J et al (2008) Inter-species normalization of gene mentions with Gnat. Bioinformatics 24:126–132

    Article  Google Scholar 

  30. Tsuruoka Y, Tsujii J, Ananiadou S (2008) FACTA: a text search engine for finding associated biomedical concepts. Oxford J 24(21):2559–2560

    Google Scholar 

  31. Scheer M, Grote A, Chang A et al (2011) BRENDA, the enzyme information system in 2011. Nucleic Acids Res 39:670–676

    Article  Google Scholar 

  32. Blaschke C, Valencia A (2001) The potential use of SUISEKI as a protein interaction discovery tool. Genome Inform 12:123–134

    Google Scholar 

  33. Chen H, Sharp BM (2004) Content-rich biological network constructed by mining PubMed abstracts. BMC Bioinforma 5:147

    Article  Google Scholar 

  34. Nikitin A, Egorov S, Daraselia N, Mazo I (2003) Pathway studio – the analysis and navigation of molecular networks. Bioinformatics 19(16):2155–2157

    Article  Google Scholar 

  35. Demenkov PS, Ivanisenko TV, Kolchanov NA, Ivanisenko VA (2012) ANDVisio: a new tool for graphic visualization and analysis of literature mined associative gene networks in the ANDSystem. Silico Biol 11(3):149–161

    Google Scholar 

  36. Demenkov PS, Aman EE, Ivanisenko VA (2008) Associative network discovery (AND) – the computer system for automated reconstruction networks of associative knowledge about molecular-genetic interactions. Comput Technol 13(2):15–19

    Google Scholar 

  37. Podkolodnaya OA, Yarkova EE, Demenkov PS, Konovalova OS, Ivanisenko VA, Kolchanov NA (2011) Application of the ANDCell computer system to reconstruction and analysis of associative networks describing potential relationships between myopia and glaucoma. Russ J Genet 1(1):21–28

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Timofey V. Ivanisenko .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ivanisenko, T.V., Demenkov, P.S., Ivanisenko, V.A. (2014). Text Mining on PubMed. In: Chen, M., Hofestädt, R. (eds) Approaches in Integrative Bioinformatics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41281-3_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-41281-3_6

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41280-6

  • Online ISBN: 978-3-642-41281-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics