Improving Phenotype Name Recognition

Khordad, Maryam; Mercer, Robert E.; Rogan, Peter

doi:10.1007/978-3-642-21043-3_30

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6657)

Included in the following conference series:

Canadian Conference on Artificial Intelligence

Abstract

Due to the rapidly increasing amount of biomedical literature, automatic processing of biomedical papers is extremely important. Named Entity Recognition (NER) in this type of writing has several difficulties. In this paper we present a system to find phenotype names in biomedical literature. The system is based on MetaMap and makes use of the UMLS Metathesaurus and the Human Phenotype Ontology. Starting from a basic system that uses only these preexisting tools, we propose five rules that capture stylistic and linguistic properties of this type of literature to enhance the performance of our NER tool. The tool is tested on a small corpus, where it achieves a precision of 97.6% and a recall of 88.3%.
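
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how precision and recall are commonly computed for an NER tool, assuming exact matching between predicted and gold-standard mentions. The function names and the example mentions are hypothetical and are not taken from the paper; the abstract does not state which matching criterion the authors used, and it does not report an F-measure, so the final value is only the harmonic mean implied by the two published figures.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of annotated mentions.

    Assumes exact-match scoring: a predicted mention counts as correct
    only if the identical mention appears in the gold standard.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (document id, mention text) pairs, invented for illustration.
gold = {("doc1", "short stature"), ("doc1", "cleft palate"), ("doc2", "microcephaly")}
predicted = {("doc1", "short stature"), ("doc2", "microcephaly")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f}")

# F1 implied by the precision and recall reported in the abstract.
print(f"F1={f1_score(0.976, 0.883):.3f}")
```

In the toy data both predictions are correct but one gold mention is missed, which mirrors the pattern in the reported results: precision higher than recall.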

Author information

Authors and Affiliations

Department of Computer Science, The University of Western Ontario, London, ON, Canada
Maryam Khordad, Robert E. Mercer & Peter Rogan
Department of Biochemistry, The University of Western Ontario, London, ON, Canada
Peter Rogan

Editor information

Editors and Affiliations

Department of Computer Science, University of Regina, 3737 Wascana Parkway, Regina, S4S 0A2, Saskatchewan, Canada
Cory Butz
Department of Mathematics and Computing Science, Saint Mary’s University, B3H 3C3, Halifax, Nova Scotia, Canada
Pawan Lingras

About this paper

Cite this paper

Khordad, M., Mercer, R.E., Rogan, P. (2011). Improving Phenotype Name Recognition. In: Butz, C., Lingras, P. (eds) Advances in Artificial Intelligence. Canadian AI 2011. Lecture Notes in Computer Science, vol 6657. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21043-3_30

DOI: https://doi.org/10.1007/978-3-642-21043-3_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21042-6
Online ISBN: 978-3-642-21043-3
eBook Packages: Computer Science, Computer Science (R0)