Improving Phenotype Name Recognition

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6657)

Abstract

Because the volume of biomedical literature is growing rapidly, automatic processing of biomedical papers is increasingly important. Named Entity Recognition (NER) in this type of writing poses several difficulties. In this paper we present a system that finds phenotype names in biomedical literature. The system is based on MetaMap and makes use of the UMLS Metathesaurus and the Human Phenotype Ontology. Starting from a basic system that uses only these preexisting tools, we propose five rules that capture stylistic and linguistic properties of this type of literature to enhance the performance of our NER tool. The tool is tested on a small corpus, and the results (precision 97.6% and recall 88.3%) demonstrate its performance.
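
The abstract quotes precision and recall separately. As a reading aid, the two figures can be combined into a balanced F-measure (the harmonic mean of precision and recall). The sketch below is an illustration only: the function name `f1_score` is hypothetical, and the resulting value is derived here from the two quoted numbers, not reported in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract, expressed as fractions.
precision = 0.976
recall = 0.883

# Derived value (an assumption of this sketch, not a figure from the paper).
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.927"
```

Because the harmonic mean is pulled toward the smaller of its inputs, the combined score sits closer to the recall figure than a plain average would.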


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Khordad, M., Mercer, R.E., Rogan, P. (2011). Improving Phenotype Name Recognition. In: Butz, C., Lingras, P. (eds) Advances in Artificial Intelligence. Canadian AI 2011. Lecture Notes in Computer Science, vol 6657. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21043-3_30

  • DOI: https://doi.org/10.1007/978-3-642-21043-3_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21042-6

  • Online ISBN: 978-3-642-21043-3

  • eBook Packages: Computer Science, Computer Science (R0)
