Word Embedding-Based Biomedical Text Summarization

  • Oussama Rouane
  • Hacene Belhadef
  • Mustapha Bouakkaz
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1073)

Abstract

In this paper, we propose a novel word embedding-based biomedical text summarizer. Biomedical words are represented by dense real-valued vectors, and each sentence is represented by the sum of the vectors of the words it contains. The PageRank algorithm is applied to rank sentences, using cosine similarity as the similarity measure between sentence vectors. The top N ranked sentences are selected to build the summary. For the evaluation, we created a corpus of 200 biomedical papers downloaded from the BioMed Central full-text database. We used a pre-trained Word2vec model whose word vectors were generated from a combination of PubMed, PMC, and a recent English Wikipedia dump. We compared our method with four other summarizers on the ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-SU4 metrics, evaluating the generated summaries against the abstracts of the papers. Our summarizer achieved improvements of 3.48%, 7.68%, 9.76%, and 3.47%, respectively, over the second-ranked summarizer.
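
The pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embedding table is a plain dictionary standing in for the pre-trained Word2vec model, tokenization is a simple lowercase whitespace split, out-of-vocabulary words are skipped, negative similarities are clipped to zero, and the damping factor of 0.85 is the conventional PageRank default rather than a value stated in the paper. All function names are hypothetical.

```python
import math


def sentence_vector(tokens, embeddings, dim):
    """Sum the vectors of the tokens found in the embedding table."""
    vec = [0.0] * dim
    for tok in tokens:
        word_vec = embeddings.get(tok)
        if word_vec is not None:  # assumption: unknown words are skipped
            vec = [a + b for a, b in zip(vec, word_vec)]
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pagerank(weights, damping=0.85, iterations=100, tol=1e-8):
    """Weighted PageRank by power iteration over a similarity matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to w(j, i).
            incoming = sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0.0
            )
            new_scores.append((1.0 - damping) / n + damping * incoming)
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def summarize(sentences, embeddings, dim, top_n):
    """Return the top_n ranked sentences, kept in document order."""
    n = len(sentences)
    if n == 0:
        return []
    vectors = [
        sentence_vector(s.lower().split(), embeddings, dim) for s in sentences
    ]
    # Assumption: no self-loops, and negative similarities are clipped to 0.
    weights = [
        [
            max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]
```

Returning the selected sentences in their original order is a presentation choice made here for readability; the abstract only states that the top N ranked sentences form the summary.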

Keywords

Biomedical text summarization; Word embedding; Word2vec; PageRank algorithm; ROUGE metrics

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Oussama Rouane (1), corresponding author
  • Hacene Belhadef (1)
  • Mustapha Bouakkaz (2)
  1. University of Constantine 2 - Abdelhamid Mehri, Constantine, Algeria
  2. Computer Science Department, Faculty of Sciences, University of Amar Telidgi, Laghouat, Algeria
