Legal Data Mining from Civil Judgments

  • Conference paper
Intelligent Technologies and Applications (INTAP 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 932)

Abstract

With the advent of computing, content is being digitized and processed across the globe. The legal domain is one of many areas that offer opportunities for innovation and improvement through computational advances. In Pakistan, courts have been reporting judgments for public consumption for the last couple of years. This reported data is of great importance to judges, lawyers and civilians. As the data is growing rapidly, there is a pressing need to process it to better serve these stakeholders. The aim of this study is therefore to develop a machine learning system that can automatically extract information from publicly reported judgments of the Lahore High Court. Once extracted, this information can be used for the betterment of society and for policy making in Pakistan. This study takes the first step towards that goal by extracting various entities from legal judgments. Ten entities are extracted in total, including dates, case numbers, reference cases, person names and respondent names. Automatic extraction of these entities first required constructing a dataset from legal judgments. Hence, annotation guidelines were prepared first, followed by an annotated dataset for entity extraction. Finally, various algorithms, including Markov models and Conditional Random Fields, were applied to the annotated dataset. Experiments show that these approaches achieve reasonably good results for legal data extraction. The primary contribution of this study is the development of an annotated dataset of civil judgments, followed by the training of various machine learning models to extract potential information from a judgment.
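To make the Markov-model approach mentioned in the abstract concrete, the following is a minimal sketch of a first-order hidden Markov model tagger with add-one smoothing and Viterbi decoding. It is not the authors' implementation: the two training sentences, the tag names `O` and `DATE`, and the function names `train_hmm` and `viterbi` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy annotations (NOT taken from the paper's dataset).
# Each token carries one tag: "DATE" for a date entity, "O" for anything else.
TOY_TRAIN = [
    [("judgment", "O"), ("dated", "O"), ("12.03.2017", "DATE")],
    [("decided", "O"), ("on", "O"), ("05.07.2018", "DATE")],
]

START = "<s>"  # pseudo-tag marking the beginning of a sentence


def train_hmm(sentences):
    """Collect transition and emission counts from tagged sentences."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    vocab = set()
    for sent in sentences:
        prev = START
        for word, tag in sent:
            word = word.lower()
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log-probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    trans, emit, tags, vocab = model
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unseen words
    words = [w.lower() for w in words]

    # best[i][t]: best log-score of any path ending in tag t at position i
    best = [{t: log_prob(trans[START], t, n_tags)
                + log_prob(emit[t], words[0], n_words) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log_prob(trans[p], t, n_tags)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_prob(emit[t], words[i], n_words)
            back[i][t] = prev

    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    model = train_hmm(TOY_TRAIN)
    print(viterbi(["decided", "on", "05.07.2018"], model))
```

A real system would need far more annotated data and richer handling of unseen tokens. A Conditional Random Field would replace the emission counts with feature functions over the whole input, which is one reason it is often preferred for entity extraction.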

Author information

Corresponding author

Correspondence to Zara Nasar.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sharafat, S., Nasar, Z., Jaffry, S.W. (2019). Legal Data Mining from Civil Judgments. In: Bajwa, I., Kamareddine, F., Costa, A. (eds) Intelligent Technologies and Applications. INTAP 2018. Communications in Computer and Information Science, vol 932. Springer, Singapore. https://doi.org/10.1007/978-981-13-6052-7_37

  • DOI: https://doi.org/10.1007/978-981-13-6052-7_37

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-6051-0

  • Online ISBN: 978-981-13-6052-7

  • eBook Packages: Computer Science, Computer Science (R0)
