Text summarization from legal documents: a survey

  • Ambedkar Kanapala
  • Sukomal Pal
  • Rajendra Pamula

Abstract

The enormous amount of online information available in the legal domain has made legal text processing an important area of research. In this paper, we survey text summarization techniques developed in the recent past. We put special emphasis on legal text summarization, as it is one of the most important areas in the legal domain. We start with a general introduction to text summarization, briefly touch on recent advances in single- and multi-document summarization, and then delve into extraction-based legal text summarization. We discuss the datasets and metrics used in summarization and compare the performance of different approaches, first in general and then with a focus on legal text. We also mention highlights of different summarization techniques. We briefly cover a few software tools used in legal text summarization. We conclude with some future research directions.

Keywords

  • Summarization from legal documents
  • Single-document summarization
  • Multi-document summarization
  • Legal domain

Copyright information

© Springer Science+Business Media B.V. 2017

Authors and Affiliations

  • Ambedkar Kanapala (1)
  • Sukomal Pal (2)
  • Rajendra Pamula (1)

  1. Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, India
  2. Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, India
