Section Identification to Improve Information Extraction from Chinese Medical Literature
The Chinese medical literature contains a large amount of knowledge. Reducing the effort medical scholars need to extract this knowledge requires literature analysis that identifies the key information in each paper. We argue that identifying the sections of a paper would help filter noise from the paper and increase the accuracy of extracting experimental findings. In this research in progress, we treat paper section identification as a sentence classification task and apply Conditional Random Fields (CRFs) to it. Our model combines lexical and structural features to facilitate section identification. Experiments on a human-curated asthma dataset show that our approach achieves a 10%–20% performance improvement over Support Vector Machines (SVMs), and that both bag-of-words features and domain lexicons benefit the task.
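As a rough illustration of how lexical and structural features might be combined for sentence-level sequence labeling, the sketch below builds one feature dictionary per sentence. It is not the paper's implementation: the lexicon entries, the feature names, the specific structural features (relative position, sentence length, boundary flags), and the functions `sentence_features` and `paper_to_sequence` are all assumptions made for illustration, and English tokens stand in for segmented Chinese words.

```python
from typing import Dict, List

# Hypothetical domain lexicon mapping cue words to a category.
# These entries are illustrative assumptions, not the paper's lexicons.
DOMAIN_LEXICON = {
    "randomized": "method_cue",
    "patients": "subject_cue",
    "significantly": "result_cue",
    "conclude": "conclusion_cue",
}


def sentence_features(sentences: List[List[str]], i: int) -> Dict[str, float]:
    """Build a feature dict for sentence i of one tokenized paper."""
    tokens = sentences[i]
    feats: Dict[str, float] = {"bias": 1.0}

    # Lexical features: bag-of-words indicators and domain-lexicon hits.
    for tok in tokens:
        word = tok.lower()
        feats[f"bow={word}"] = 1.0
        if word in DOMAIN_LEXICON:
            feats[f"lex={DOMAIN_LEXICON[word]}"] = 1.0

    # Structural features (assumed): where the sentence sits in the paper
    # and how long it is.
    feats["rel_pos"] = i / max(len(sentences) - 1, 1)
    feats["length"] = float(len(tokens))
    if i == 0:
        feats["first"] = 1.0
    if i == len(sentences) - 1:
        feats["last"] = 1.0
    return feats


def paper_to_sequence(sentences: List[List[str]]) -> List[Dict[str, float]]:
    """Turn one paper into a feature sequence, one dict per sentence."""
    return [sentence_features(sentences, i) for i in range(len(sentences))]


# Usage: a toy two-sentence "paper".
paper = [
    ["We", "recruited", "patients"],
    ["Symptoms", "improved", "significantly"],
]
sequence = paper_to_sequence(paper)
```

A sequence of feature dictionaries like this, paired with one section label per sentence, is the kind of input a linear-chain CRF toolkit typically expects. Labeling the whole sequence jointly lets the model exploit the fact that neighboring sentences usually belong to the same section, which a per-sentence classifier such as an SVM does not capture.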
Keywords: Section identification, Sentence classification, Chinese medicine
This research is partially supported by the Digital Innovation Lab at City University of Hong Kong, GuangDong Science and Technology Project 2014A020221090, and the City University of Hong Kong Shenzhen Research Institute.