Exploring Domain-Sensitive Features for Extractive Summarization in the Medical Domain

  • Dat Tien Nguyen
  • Johannes Leveling
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7934)


This paper describes experiments to adapt document summarization to the medical domain. Our summarizer combines linguistic features of text fragments (typically sentences) and applies a machine learning approach to extract the most important fragments from a document to form a summary. The generic features comprise features used in previous research on summarization. We propose to adapt the summarizer to the medical domain by adding domain-specific features, and we explore two types of additional features: medical domain features and semantic features. The evaluation of the summarizer is based on medical articles and targets three aspects: i) the classification of text fragments into ones that are important and ones that are unimportant for a summary; ii) the effect of each feature on performance; and iii) the improvement over our baseline summarizer when features for domain adaptation are added. Evaluation metrics include classification accuracy for the trained sentence extractor and ROUGE scores computed against reference summaries. We achieve an accuracy of 84.16% on balanced medical training data using an IB1 classifier. Training on unbalanced data achieves higher accuracy than training on balanced data. Domain adaptation using all domain-specific features outperforms the baseline summarizer with respect to ROUGE scores, which shows that domain adaptation can succeed with simple means.
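The pipeline outlined in the abstract (per-sentence features, an instance-based classifier that labels sentences as important or unimportant, and ROUGE-style scoring against a reference) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the four features, the tiny `MEDICAL_TERMS` lexicon, the plain Euclidean distance, and all function names are hypothetical. IB1 is approximated as a 1-nearest-neighbour lookup over stored training instances, and `rouge1_recall` is a simplified unigram-recall variant, not the official ROUGE package.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicon standing in for a real medical terminology resource.
MEDICAL_TERMS = {"patient", "patients", "therapy", "dose", "clinical", "treatment"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def features(sentence, position, n_sentences, doc_counts):
    """Map one sentence to [position, length, avg. term frequency, medical ratio]."""
    tokens = tokenize(sentence)
    if not tokens:
        return [0.0, 0.0, 0.0, 0.0]
    # Generic features (assumed): earlier sentences score higher, length is capped at 1.
    rel_position = 1.0 - position / max(n_sentences - 1, 1)
    length = min(len(tokens) / 30.0, 1.0)
    avg_tf = sum(doc_counts[t] for t in tokens) / (len(tokens) * max(doc_counts.values()))
    # Domain-specific feature (assumed): share of tokens found in the medical lexicon.
    med_ratio = sum(t in MEDICAL_TERMS for t in tokens) / len(tokens)
    return [rel_position, length, avg_tf, med_ratio]


def document_features(sentences):
    """Compute one feature vector per sentence of a document."""
    doc_counts = Counter(t for s in sentences for t in tokenize(s))
    return [features(s, i, len(sentences), doc_counts) for i, s in enumerate(sentences)]


def predict_1nn(train_x, train_y, x):
    """IB1-style prediction: return the label of the nearest stored instance."""
    nearest = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[nearest]


def summarize(sentences, train_x, train_y):
    """Keep the sentences whose nearest training instance is labelled important (1)."""
    feats = document_features(sentences)
    return [s for s, f in zip(sentences, feats) if predict_1nn(train_x, train_y, f) == 1]


def rouge1_recall(candidate, reference):
    """Simplified ROUGE-1 recall: clipped unigram overlap divided by reference length."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum(min(cand[t], count) for t, count in ref.items())
    return overlap / max(sum(ref.values()), 1)
```

Given labelled training vectors `train_x` with labels `train_y` (1 for important, 0 for unimportant), `summarize(sentences, train_x, train_y)` returns the extracted sentences, and `rouge1_recall(" ".join(summary), reference)` scores them against a reference summary. Removing `med_ratio` from the vector gives a rough analogue of a generic baseline to compare against.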


Keywords: Automatic Summarization, Sentence Extraction, Machine Learning, ROUGE





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Dat Tien Nguyen (1)
  • Johannes Leveling (2)
  1. University of Engineering and Technology (UET), Vietnam National University, Hanoi, Vietnam
  2. Centre for Next Generation Localisation (CNGL), School of Computing, Dublin City University, Dublin 9, Ireland
