Mapping of Biomedical Text to Concepts of Lexicons, Terminologies, and Ontologies

Biomedical Literature Mining

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1159))

Abstract

Concept mapping is a fundamental task in biomedical text mining in which textual mentions of concepts of interest are annotated with specific entries of lexicons, terminologies, ontologies, or databases representing these concepts. Despite a significant amount of research, only a limited number of practical, publicly available tools perform concept mapping of user-specified biomedical text as an independent task. This chapter first presents several tools that can automatically map biomedical text to concepts from a wide range of terminological resources, then tools that map to more restricted sets of these resources. The presentation is intended to guide researchers without a background in biomedical concept mapping in selecting an appropriate tool based on usability, scalability, configurability, balance between precision and recall, and the desired set of terminological resources with which to annotate the text. Only with effective automatic concept-mapping tools will systems be able to analyze the biomedical literature and other large document sets at scale as a fundamental part of more complex text-mining tasks such as information extraction and hypothesis evaluation and generation.

Author information

Corresponding author

Correspondence to Michael Bada.

Copyright information

© 2014 Springer Science+Business Media New York

About this protocol

Cite this protocol

Bada, M. (2014). Mapping of Biomedical Text to Concepts of Lexicons, Terminologies, and Ontologies. In: Kumar, V., Tipney, H. (eds) Biomedical Literature Mining. Methods in Molecular Biology, vol 1159. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-0709-0_3

  • DOI: https://doi.org/10.1007/978-1-4939-0709-0_3

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-0708-3

  • Online ISBN: 978-1-4939-0709-0

  • eBook Packages: Springer Protocols
