Supervised Machine Learning Approach for Bio-molecular Event Extraction

  • Asif Ekbal
  • Amit Majumder
  • Mohammad Hasanuzzaman
  • Sriparna Saha
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7077)


The main goal of biomedical text mining is to capture biomedical phenomena from textual data by extracting relevant entities, information and relations between biomedical entities such as proteins and genes. Most research in related areas has focused on extracting only binary relations. Recently, the focus has shifted towards extracting more complex relations in the form of bio-molecular events that may include several entities or other relations. In this paper we propose a supervised approach for the extraction, i.e. identification and classification, of relatively complex bio-molecular events. We treat this as a supervised machine learning problem and use a well-known statistical algorithm, the Conditional Random Field (CRF), with statistical and linguistic features that represent various morphological, syntactic and contextual information about the candidate bio-molecular trigger words. First, we treat event identification and classification as a two-step process: the first step identifies events, and the second step classifies each identified event into one of nine predefined classes. Thereafter, we perform event identification and classification together in a single step. Three-fold cross-validation experiments on the Biomedical Natural Language Processing (BioNLP) 2009 shared task datasets yield overall average recall, precision and F-measure values of 58.88%, 74.53% and 65.79%, respectively, for event identification. For the classification step, we observed an overall accuracy of 59.34%. When identification and classification are performed together, the proposed approach achieves overall recall, precision and F-measure values of 59.92%, 54.25% and 56.94%, respectively.
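The reported F-measure values can be checked against the reported recall and precision. The sketch below assumes the F-measure is the balanced harmonic mean of recall and precision (the usual F1 definition, which the abstract does not state explicitly); the function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(recall: float, precision: float) -> float:
    """Balanced F-measure: harmonic mean of recall and precision.

    Assumption: equal weighting of recall and precision (F1).
    Inputs and output share the same unit (here, percentages).
    """
    if recall + precision == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * recall * precision / (recall + precision)


# Recall and precision as reported in the abstract (percentages).
identification = f_measure(58.88, 74.53)  # event identification only
joint = f_measure(59.92, 54.25)           # identification and classification together

print(f"identification F-measure: {identification:.2f}")
print(f"joint F-measure: {joint:.2f}")
```

Under this assumption the computed values round to 65.79 and 56.94, matching the figures quoted above. Because the reported scores are averages over three folds, such agreement is not guaranteed in general: the F-measure of averaged recall and precision can differ from the average of per-fold F-measures.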


Keywords: Event Trigger, Conditional Random Field, Shared Task, Event Extraction, Name Entity
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Asif Ekbal (1)
  • Amit Majumder (2)
  • Mohammad Hasanuzzaman (3)
  • Sriparna Saha (1)
  1. Department of Computer Science and Engineering, Indian Institute of Technology Patna, India
  2. Academy of Technology, Kolkata, India
  3. WBIDCL, Kolkata, India
