A Hybrid Approach for Arabic Text Summarization Using Domain Knowledge and Genetic Algorithms

Cognitive Computation

Abstract

Text summarization is the process of producing a shorter version of a given text. Automatic summarization techniques have been applied to various domains, such as the medical, political, news, and legal domains, showing that incorporating domain-relevant features can improve summarization performance. Although there is considerable research on domain-based summarization in English and other languages, such work is scarce for Arabic because of the shortage of existing knowledge bases. In this paper, a hybrid, single-document text summarization approach (abbreviated as ASDKGA) is presented. The approach combines domain knowledge, statistical features, and genetic algorithms to extract the important points of Arabic political documents. ASDKGA is tested on two corpora: the KALIMAT corpus and the Essex Arabic Summaries Corpus (EASC). The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) framework was used to compare the summaries generated automatically by ASDKGA with human-written summaries. The approach is also compared against three other Arabic text summarization approaches. ASDKGA demonstrated promising results on Arabic political documents, achieving an average F-measure of 0.605 at a compression ratio of 40%.
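The abstract names the ingredients of ASDKGA (statistical features, domain knowledge, a genetic algorithm, a 40% compression ratio, and an F-measure computed with ROUGE) without detailing how they fit together. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: the feature set and its equal weighting, the toy `DOMAIN_TERMS` lexicon, whitespace tokenization, and the GA settings (population 30, 50 generations, tournament size 3, one-point crossover, 5% bit-flip mutation) are all hypothetical choices. A candidate summary is encoded as a bit vector over sentences, summaries longer than 40% of the sentences get zero fitness, and `rouge1_f` scores a summary against a reference using ROUGE-1 for illustration only; the abstract does not state which ROUGE variant produced the reported 0.605.

```python
import random
from collections import Counter

# HYPOTHETICAL domain lexicon: a stand-in for the political-domain knowledge
# described in the abstract; a real system would use Arabic (stemmed) terms.
DOMAIN_TERMS = {"election", "parliament", "minister", "government"}


def sentence_scores(sentences):
    """Assumed feature mix: mean term frequency + position + domain-term hits."""
    tokens = [s.lower().split() for s in sentences]
    tf = Counter(w for toks in tokens for w in toks)
    scores = []
    for i, toks in enumerate(tokens):
        if not toks:
            scores.append(0.0)
            continue
        freq = sum(tf[w] for w in toks) / len(toks)    # statistical feature
        position = 1.0 - i / len(sentences)             # earlier sentences favoured
        domain = sum(w in DOMAIN_TERMS for w in toks)   # domain-knowledge feature
        scores.append(freq + position + domain)
    return scores


def fitness(chrom, scores, budget):
    """Sum of selected sentence scores; empty or over-budget summaries score 0."""
    chosen = sum(chrom)
    if chosen == 0 or chosen > budget:
        return 0.0
    return sum(s for s, bit in zip(scores, chrom) if bit)


def tournament(pop, fits, k, rng):
    """Tournament selection: return the fittest of k randomly drawn chromosomes."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: fits[i])]


def ga_select(sentences, ratio=0.4, pop_size=30, generations=50, p_mut=0.05, seed=0):
    """Evolve a bit-vector sentence selection capped by the compression ratio."""
    rng = random.Random(seed)
    n = len(sentences)
    budget = max(1, round(ratio * n))  # compression ratio -> max sentences kept
    scores = sentence_scores(sentences)
    # Each chromosome is a bit vector: 1 = sentence included in the summary.
    pop = [[int(rng.random() < ratio) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(c, scores, budget) for c in pop]
        elite = pop[max(range(pop_size), key=lambda i: fits[i])]
        new_pop = [elite[:]]  # elitism: carry the best chromosome forward
        while len(new_pop) < pop_size:
            a = tournament(pop, fits, 3, rng)
            b = tournament(pop, fits, 3, rng)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # mutation
            new_pop.append(child)
        pop = new_pop
    best = max(pop, key=lambda c: fitness(c, scores, budget))
    if fitness(best, scores, budget) == 0.0:
        # Fallback: no feasible chromosome evolved; keep the top-scoring sentences.
        top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:budget]
        best = [int(i in top) for i in range(n)]
    # Selected sentences are returned in their original document order.
    return [s for s, bit in zip(sentences, best) if bit]


def rouge1_f(candidate, reference):
    """ROUGE-1 F-measure from clipped unigram overlap (whitespace tokens)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    doc = [
        "The government announced new election rules",
        "Weather was mild today",
        "Parliament will debate the minister proposal",
        "A local fair opened",
        "Analysts expect a close election",
    ]
    summary = ga_select(doc)  # at most round(0.4 * 5) = 2 sentences
    print(summary)
    print(rouge1_f(" ".join(summary), "Parliament will debate new election rules"))
```

Assigning zero fitness to over-budget chromosomes is only one simple way to enforce the compression ratio; penalty terms or repair operators that drop the weakest sentences are common alternatives.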


Fig. 1 to Fig. 8 (figure images not included)


References

  1. Lloret E, Palomar M. Text summarization in progress: a literature review. Artif Intell Rev. 2010;37(1):1–41.

  2. Radev D, Hovy E, McKeown K. Introduction to the special issue on summarization. Comput Linguist. 2002;28(4):399–408.

  3. Ježek K, Steinberger J. Automatic text summarization (the state of the art 2007 and new challenges). In: Znalosti, Bratislava, Slovakia; 2008. p. 1–12.

  4. Saggion H. Automatic summarization: an overview. Rev Fr Linguist Appl. 2008;13(1):63–81.

  5. Luhn H. The automatic creation of literature abstracts. IBM J Res Dev. 1958;2(2):159–65.

  6. Reeve L, Han H, Brooks A. The use of domain-specific concepts in biomedical text summarization. Inf Process Manag. 2007;43(6):1765–76.

  7. Chen Y, Foong O, Yong S, Kurniawan I. Text summarization for oil and gas drilling topic. Int J Comput Electr Autom Control Inf Eng World Acad Sci Technol. 2008;2(6):1799–802.

  8. Yeh J, Ke H, Yang W, Meng I. Text summarization using a trainable summarizer and latent semantic analysis. Inf Process Manag. 2005;41(1):75–95.

  9. Moens M, Uyttendaele C, Dumortier J. Abstracting of legal cases: the SALOMON experience. In: The 6th International Conference on Artificial Intelligence and Law (ICAIL '97), Melbourne, Australia; 1997. p. 114–22.

  10. De Hollander G, Marx M. Summarization of meetings using word clouds. In: The CSI International Symposium on Computer Science and Software Engineering (CSSE), Tehran; 2011. p. 54–61.

  11. Summers E, Stephens K. Politwitics: summarization of political tweets. 2012. http://bid.berkeley.edu/cs294-1-spring13/images/3/34/Politwitics_report.pdf. Accessed 10 Mar 2015.

  12. Chong L, Chen Y. Text summarization for oil and gas news article. Int J Comput Electr Autom Control Inf Eng World Acad Sci Technol. 2009;3(5):1282–5.

  13. Sarkar K. Using domain knowledge for text summarization in medical domain. Int J Recent Trends Eng. 2009;1(1):200–5.

  14. Imam I, Hamouda A, Khalek H. An ontology-based summarization system for Arabic documents (OSSAD). Int J Comput Appl. 2013;74(17):38–43.

  15. Silla CN Jr, Pappa GL, Freitas AA, Kaestner CAA. Automatic text summarization with genetic algorithm-based attribute selection. In: Advances in Artificial Intelligence: IBERAMIA 2004. Springer; 2004. p. 305–14.

  16. Qazvinian V, Hassanabadi L, Halavati R. Summarising text with a genetic algorithm-based sentence extraction. Int J Knowl Manag Stud. 2008;2(4):426–44.

  17. Fattah M, Ren F. Automatic text summarization. Int J Comput Electr Autom Control Inf Eng. 2008;2(1):90–3.

  18. Litvak M, Last M, Friedman M. A new approach to improving multilingual summarization using genetic algorithms. In: The 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden; 2010. p. 927–36.

  19. Nandhini K, Balasundaram S. Use of genetic algorithms for cohesive summary extraction to assist reading difficulties. Appl Comput Intell Soft Comput. 2013;2013:1–11.

  20. Hammo B, Abu-Salem H, Evens M. A hybrid Arabic text summarization technique based on text structure and topic identification. Int J Comput Process Lang. 2011;23(01):39–65.

  21. Al-Omour M. Extractive-based Arabic text summarization approach. M.Sc Thesis: Department of Computer Science, Yarmouk University, Irbid, Jordan; 2012.

  22. Ibrahim A, Elghazaly T, Gheith M. A novel Arabic text summarization model based on rhetorical structure theory and vector space model. Int J Comput Linguist Nat Lang Process. 2013;2(8):480–4.

  23. Douzidia F, Lapalme G. Lakhas, an Arabic summarization system. In: The Document Understanding Conference (DUC), Boston, USA; 2004. p. 128–35.

  24. Bawakid A, Oussalah M. A semantic summarization system: the University of Birmingham at TAC 2008. In: The First Text Analysis Conference (TAC), Maryland, USA; 2008. p. 1–6.

  25. Al-Radaideh Q, Afif M. Arabic text summarization using aggregate similarity. In: The international Arab Conference on Information Technology (ACIT’2009). Yemen; 2009. p. 1–8.

  26. Sobh I. An optimized dual classification system for Arabic extractive generic text summarization. M.Sc Thesis: Department of Computer Engineering, Cairo University, Giza, Egypt; 2009.

  27. Hamodeh A, Mousa M. Automatic system for summarizing Arabic comments on social media networks. Al-Majala Al-Dawlia Lelitesalat, Al-Jameia Al-Arabia Lelhasibat. Special Issue. 2013. p. 44–56. (In Arabic).

  28. Al-Taani A, Al-Rousan S. Arabic multi-document text summarization. In: The 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2016), Turkey; 2016.

  29. Oufaida H, Nouali O, Blache P. Minimum redundancy and maximum relevance for single and multi-document Arabic text summarization. J King Saud Univ-Comput Inf Sci. 2014;26(4):450–61.

  30. Al-Khawaldeh F, Samawi V. Lexical cohesion and entailment-based segmentation for Arabic text summarization (LCEAS). World Comput Sci Inf Technol J (WCSIT). 2015;5(03):51–60.

  31. Tran HN, Cambria E, Hussain A. Towards GPU-based common-sense reasoning: using fast subgraph matching. Cogn Comput. 2016;8(6):1074–86.

  32. Xia Y, Cambria E, Hussain A, Zhao H. Word polarity disambiguation using Bayesian model and opinion-level features. Cogn Comput. 2015;7(3):369–80.

  33. Li Y, Pan Q, Yang T, Wang S, Tang J, Cambria E. Learning word representations for sentiment analysis. Cogn Comput. 2017;9(6):843–51.

  34. Al-Radaideh Q, Al-Qudah G. Application of rough set-based feature selection for Arabic sentiment analysis. Cogn Comput. 2017;9(4):436–45.

  35. Recupero D, Presutti V, Consoli S, Gangemi A, Nuzzolese A. Sentilo: frame-based sentiment analysis. Cogn Comput. 2015;7(2):211–25.

  36. Dashtipour K, Poria S, Hussain A, Cambria E, Hawalah A, Gelbukh A, et al. Multilingual sentiment analysis: state-of-the-art and independent comparison of techniques. Cogn Comput. 2016;8:757–71.

  37. Mukhtar N, Khan MA, Chiragh N. Effective use of evaluation measures for the validation of best classifier in Urdu sentiment analysis. Cogn Comput. 2017;9(4):446–56.

  38. Lo SL, Cambria E, Chiong R, Cornforth D. Multilingual sentiment analysis: from formal to informal and scarce resource languages. Artif Intell Rev. 2017;48(4):499–527.

  39. Duwairi R, El-Orfali M. A study of the effects of preprocessing strategies on sentiment analysis for Arabic text. J Inf Sci. 2014;40(4):501–13.

  40. El-Khair I. Effects of stop words elimination for Arabic information retrieval: a comparative study. Int J Comput Inf Sci. 2006;4(3):119–33.

  41. Green S, Manning C. Better Arabic parsing: baselines, evaluations, and analysis. In: The 23rd International Conference on Computational Linguistics (COLING), Beijing, China; 2010. p. 394–402.

  42. Mustafa S. Word stemming for Arabic information retrieval: the case for simple light stemming. Abhath Al-Yarmouk: Sci Eng Ser. 2012;21(1):123–44.

  43. Singh J, Gupta V. An efficient corpus-based stemmer. Cogn Comput. 2017;9(5):671–88.

  44. Edmundson H. New methods in automatic extracting. J Assoc Comput Mach. 1969;16(2):264–85.

  45. Perumal K, Chaudhuri B. Language independent sentence extraction based text summarization. In: The 9th international conference on natural language processing (ICON), Chennai, India; 2011. p. 213–7.

  46. Kumar Y, Salim N. Automatic multi document summarization approaches. J Comput Sci. 2011;8(1):133–40.

  47. Gupta V, Lehal G. A survey of text summarization extractive techniques. J Emerg Technol Web Intell. 2010;2(3):258–68.

  48. Miller B, Goldberg D. Genetic algorithms, tournament selection, and the effects of noise. Complex Syst. 1995;9(3):193–212.

  49. El-Haj M, Koulali R. KALIMAT: a multipurpose Arabic corpus. In: The Second Workshop on Arabic Corpus Linguistics, Lancaster University, UK; 2011. p. 22–5. http://sourceforge.net/projects/kalimat/.

  50. El-Haj M, Kruschwitz U, Fox C. Using Mechanical Turk to create a corpus of Arabic summaries. In: The 7th International Language Resources and Evaluation Conference (LREC), Valletta, Malta; 2010. p. 36–9.

  51. Lin C. ROUGE: a package for automatic evaluation of summaries. In: The ACL Workshop on Text Summarization Branches Out, Barcelona, Spain; 2004. p. 74–81.

  52. El-Haj M, Kruschwitz U, Fox C. Experimenting with automatic text summarisation for Arabic. In: Human Language Technology: Challenges for Computer Science and Linguistics. Springer; 2011. p. 490–9.


Author information


Correspondence to Qasem A. Al-Radaideh.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Informed Consent

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki declaration of 1975, as revised in 2008 [15].

Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by any of the authors.


About this article


Cite this article

Al-Radaideh, Q.A., Bataineh, D.Q. A Hybrid Approach for Arabic Text Summarization Using Domain Knowledge and Genetic Algorithms. Cogn Comput 10, 651–669 (2018). https://doi.org/10.1007/s12559-018-9547-z


  • DOI: https://doi.org/10.1007/s12559-018-9547-z
