Bootstrapping a Verb Lexicon for Biomedical Information Extraction

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2009)

Abstract

The extraction of information from texts requires resources that contain both syntactic and semantic properties of lexical units. As the use of language in specialized domains, such as biology, can differ markedly from its use in the general domain, there is a need for domain-specific resources to ensure that the information extracted is as accurate as possible. We are building a large-scale lexical resource for the biology domain, providing information about predicate-argument structure that has been bootstrapped from a biomedical corpus on the subject of E. coli. The lexicon is currently focussed on verbs, and includes both automatically extracted syntactic subcategorization frames and semantic event frames that are based on annotation by domain experts. In addition, the lexicon contains manually added explicit links between semantic and syntactic slots in corresponding frames. To our knowledge, this lexicon currently represents a unique resource within the biomedical domain.
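As a rough illustration of the kind of entry the abstract describes, the following sketch pairs a syntactic subcategorization frame with a semantic event frame and records explicit slot-to-role links. It is only a minimal sketch under stated assumptions: the class names, the slot labels (`SUBJ`, `OBJ`), the role labels (`AGENT`, `THEME`) and the example verb are hypothetical and are not taken from the lexicon itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SyntacticFrame:
    # Subcategorization frame: the grammatical slots a verb takes.
    slots: List[str]


@dataclass
class SemanticFrame:
    # Event frame: the semantic roles of the event (hypothetical labels).
    roles: List[str]


@dataclass
class VerbEntry:
    lemma: str
    syntactic: SyntacticFrame
    semantic: SemanticFrame
    # Explicit links from a syntactic slot to a semantic role.
    links: Dict[str, str] = field(default_factory=dict)

    def role_for(self, slot: str) -> Optional[str]:
        """Return the semantic role linked to a syntactic slot, if any."""
        return self.links.get(slot)


def dangling_links(entry: VerbEntry) -> List[Tuple[str, str]]:
    """Return links whose slot or role is missing from the entry's frames."""
    return [
        (slot, role)
        for slot, role in entry.links.items()
        if slot not in entry.syntactic.slots or role not in entry.semantic.roles
    ]


# Hypothetical entry; the verb and labels are chosen only for illustration.
entry = VerbEntry(
    lemma="activate",
    syntactic=SyntacticFrame(slots=["SUBJ", "OBJ"]),
    semantic=SemanticFrame(roles=["AGENT", "THEME"]),
    links={"SUBJ": "AGENT", "OBJ": "THEME"},
)
```

Keeping the links as a separate mapping, rather than folding roles into the syntactic frame, mirrors the abstract's distinction between the two frame types and lets a check such as `dangling_links` flag links that refer to a slot or role absent from either frame.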

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Venturi, G. et al. (2009). Bootstrapping a Verb Lexicon for Biomedical Information Extraction. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_11

  • DOI: https://doi.org/10.1007/978-3-642-00382-0_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00381-3

  • Online ISBN: 978-3-642-00382-0

  • eBook Packages: Computer Science, Computer Science (R0)
