Section Identification to Improve Information Extraction from Chinese Medical Literature

  • Sijia Zhou
  • Xin Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10983)


The Chinese medical literature contains a large amount of knowledge. Reducing the effort medical scholars need to extract this knowledge requires literature analysis that identifies the key information in each paper. We argue that identifying the sections of a paper helps filter noise from the paper and increases the accuracy of extracting experimental findings. In this research in progress, we treat paper section identification as a sentence classification task and apply Conditional Random Fields (CRFs) to the problem. Our model combines lexical and structural features to facilitate section identification. Experiments on a human-curated asthma dataset show that our approach achieves a 10%–20% performance improvement over Support Vector Machines (SVMs), and that using both bag-of-words features and domain lexicons benefits the task.
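
As a rough illustration of how lexical and structural features might be combined per sentence before being passed to a CRF, the following Python sketch builds one feature dictionary for each sentence of a paper. It is not the authors' implementation: the lexicon entries, the feature names, the choice of relative position as the structural feature, and the helper names `sentence_features` and `paper_features` are all assumptions made for this example, and sentences are assumed to be already word-segmented.

```python
from typing import Dict, List, Set

# Hypothetical domain lexicon; the lexicons used in the paper are not listed here.
DOMAIN_LEXICON: Set[str] = {"哮喘", "治疗", "对照组", "疗效"}


def sentence_features(tokens: List[str], index: int, total: int) -> Dict[str, float]:
    """Build a feature dict for one pre-segmented sentence of a paper."""
    feats: Dict[str, float] = {}
    # Lexical features: bag-of-words indicators, one per distinct token.
    for tok in tokens:
        feats["bow=" + tok] = 1.0
    # Lexical feature: number of tokens found in the domain lexicon.
    feats["lexicon_hits"] = float(sum(tok in DOMAIN_LEXICON for tok in tokens))
    # Structural features (assumed): where the sentence sits in the paper.
    feats["rel_pos"] = index / max(total - 1, 1)
    feats["is_first"] = 1.0 if index == 0 else 0.0
    feats["is_last"] = 1.0 if index == total - 1 else 0.0
    return feats


def paper_features(sentences: List[List[str]]) -> List[Dict[str, float]]:
    """Turn one paper (a sequence of sentences) into a sequence of feature dicts."""
    total = len(sentences)
    return [sentence_features(toks, i, total) for i, toks in enumerate(sentences)]


# Toy paper with three pre-segmented sentences (illustrative data only).
paper = [
    ["哮喘", "是", "常见", "疾病"],
    ["选取", "患者", "分为", "治疗组", "对照组"],
    ["治疗组", "疗效", "优于", "对照组"],
]
features = paper_features(paper)
```

Each paper becomes one sequence of feature dictionaries, which is the input shape that a linear-chain CRF toolkit typically expects, with one section label per sentence as the target sequence. Treating the paper as a sequence is what lets the model use the labels of neighboring sentences, which a per-sentence classifier such as an SVM does not do.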


Keywords: Section identification; Sentence classification; Chinese medicine



This research is partially supported by the Digital Innovation Lab at City University of Hong Kong, Guangdong Science and Technology Project 2014A020221090, and the City University of Hong Kong Shenzhen Research Institute.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Information Systems, City University of Hong Kong, Kowloon, Hong Kong
  2. Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China
