Automatic Extraction of Genomic Glossary Triggered by Query

  • Conference paper
Data Mining for Biomedical Applications (BioDM 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3916)

Abstract

In genomic research, understanding a specific gene name is the gateway to most Information Retrieval (IR) and Information Extraction (IE) systems. In this paper we present an automatic method for extracting a genomic glossary triggered by the initial gene name in a query. Our system uses LocusLink gene names as query triggers and MEDLINE abstracts as the genomic corpus. The extracted glossary is evaluated through query expansion in the TREC 2003 Genomics Track ad hoc retrieval task, and the experimental results indicate that 90.15% recall can be achieved.
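
The abstract does not describe the extraction or retrieval procedure in detail, so the following is only a minimal sketch of the evaluation idea it names: expand a gene-name query with glossary terms, retrieve documents, and measure recall. Everything in it is an assumption made for illustration: the function names (expand_query, retrieve, recall), the substring-matching retrieval, and the toy gene name, glossary, and documents are all invented and are not taken from the paper, LocusLink, or MEDLINE.

```python
from typing import Dict, List, Set


def expand_query(query: str, glossary: Dict[str, List[str]]) -> List[str]:
    """Return the query followed by glossary terms it triggers (case-insensitive lookup)."""
    terms = [query]
    seen = {query.lower()}
    for term in glossary.get(query.lower(), []):
        if term.lower() not in seen:
            seen.add(term.lower())
            terms.append(term)
    return terms


def retrieve(terms: List[str], corpus: Dict[str, str]) -> Set[str]:
    """Naive retrieval (an assumption): a document matches if it contains any term."""
    lowered = [t.lower() for t in terms]
    return {doc_id for doc_id, text in corpus.items()
            if any(t in text.lower() for t in lowered)}


def recall(retrieved: Set[str], relevant: Set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Toy data, invented for illustration only (hypothetical gene name and texts).
toy_glossary = {"abc1": ["alpha binding component 1", "ABC-1"]}
toy_corpus = {
    "d1": "ABC1 is expressed in the toy tissue.",
    "d2": "The alpha binding component 1 protein was studied.",
    "d3": "An unrelated abstract about something else.",
}
toy_relevant = {"d1", "d2"}

baseline = recall(retrieve(["ABC1"], toy_corpus), toy_relevant)
expanded = recall(retrieve(expand_query("ABC1", toy_glossary), toy_corpus), toy_relevant)
print(f"recall without expansion: {baseline:.2f}")
print(f"recall with expansion:    {expanded:.2f}")
```

On this toy data the expanded query also finds the document that uses only the long form of the name, which is the effect query expansion is meant to have. The numbers here say nothing about the 90.15% recall reported in the abstract.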

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, J., Zhu, X. (2006). Automatic Extraction of Genomic Glossary Triggered by Query. In: Li, J., Yang, Q., Tan, AH. (eds) Data Mining for Biomedical Applications. BioDM 2006. Lecture Notes in Computer Science, vol 3916. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11691730_4

  • DOI: https://doi.org/10.1007/11691730_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33104-9

  • Online ISBN: 978-3-540-33105-6

  • eBook Packages: Computer Science, Computer Science (R0)
