Legal Data Mining from Civil Judgments

Sharafat, Shahmin; Nasar, Zara; Jaffry, Syed Waqar

doi:10.1007/978-981-13-6052-7_37

Part of the book series: Communications in Computer and Information Science (CCIS, volume 932)

Included in the following conference series:

International Conference on Intelligent Technologies and Applications

Abstract

With the advent of computing, content digitization and processing are being widely performed across the globe. The legal domain is one of many areas that offer opportunities for innovation and improvement through computational advances. In Pakistan, courts have been reporting judgments for public consumption for the last couple of years. This reported data is of great importance to judges, lawyers and civilians. As the data is growing at a rapid rate, there is a pressing need to process it to better serve the respective stakeholders. Therefore, the aim of this study is to develop a machine learning system that can automatically extract information from publicly reported judgments of the Lahore High Court. This information, once extracted, can be used for the betterment of society and for policy making in Pakistan. This study takes the first step toward this goal by extracting various entities from legal judgments. A total of ten entities are extracted, including dates, case numbers, reference cases, person names and respondent names. To extract these entities automatically, the primary requirement was to construct a dataset from legal judgments. Hence, annotation guidelines are prepared first, followed by an annotated dataset for entity extraction. Finally, various algorithms, including Markov models and Conditional Random Fields, are applied to the annotated dataset. Experiments show that these approaches achieve reasonably good results for legal data extraction. The primary contribution of this study is the development of an annotated dataset of civil judgments, followed by the training of various machine learning models to extract the relevant information from a judgment.
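
To make the sequence-labelling idea in the abstract concrete, the following is a minimal sketch of a hidden Markov model tagger with Viterbi decoding. It is not the authors' implementation: the two training sentences are invented, the tag names (`B-CASE_NO`, `I-CASE_NO`, `B-DATE`, `O`) and the BIO scheme are assumptions, and the `shape` helper is a hypothetical feature added so that unseen numbers and dates can still be tagged.

```python
import math
from collections import Counter, defaultdict

# Toy BIO-annotated sentences (invented; not taken from the paper's dataset).
TRAIN = [
    [("Writ", "B-CASE_NO"), ("Petition", "I-CASE_NO"), ("No.", "I-CASE_NO"),
     ("1234", "I-CASE_NO"), ("decided", "O"), ("on", "O"),
     ("12.03.2015", "B-DATE")],
    [("Civil", "B-CASE_NO"), ("Revision", "I-CASE_NO"), ("No.", "I-CASE_NO"),
     ("567", "I-CASE_NO"), ("heard", "O"), ("on", "O"),
     ("05.07.2016", "B-DATE")],
]


def shape(token):
    """Hypothetical feature: collapse numbers and dotted dates into classes."""
    if token[0].isdigit():
        return "<DATE>" if "." in token else "<NUM>"
    return token.lower()


def train(sentences):
    """Count tag-to-tag transitions and tag-to-observation emissions."""
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        prev = "<START>"
        for token, tag in sent:
            trans[prev][tag] += 1
            emit[tag][shape(token)] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(tokens, trans, emit):
    """Return the most probable tag sequence for a non-empty token list."""
    tags = sorted(emit)
    n_tags = len(tags)
    # +1 reserves probability mass for observations never seen in training.
    n_obs = len({o for c in emit.values() for o in c}) + 1
    obs = [shape(t) for t in tokens]
    # Each row maps tag -> (best log score so far, backpointer to previous tag).
    rows = [{t: (log_prob(trans["<START>"], t, n_tags)
                 + log_prob(emit[t], obs[0], n_obs), None) for t in tags}]
    for o in obs[1:]:
        row = {}
        for t in tags:
            score, prev = max(
                (rows[-1][p][0] + log_prob(trans[p], t, n_tags), p)
                for p in tags)
            row[t] = (score + log_prob(emit[t], o, n_obs), prev)
        rows.append(row)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: rows[-1][t][0])
    path = [tag]
    for row in reversed(rows[1:]):
        tag = row[tag][1]
        path.append(tag)
    return path[::-1]


trans, emit = train(TRAIN)
sentence = ["Writ", "Petition", "No.", "999", "decided", "on", "01.01.2018"]
predicted = viterbi(sentence, trans, emit)
for token, tag in zip(sentence, predicted):
    print(f"{token}\t{tag}")
```

A Conditional Random Field would replace the per-tag emission counts with weighted features over the whole input sequence; the count-based model here is only meant to show how annotated tokens turn into a tagger.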

Author information

Authors and Affiliations

Artificial Intelligence and Multidisciplinary Research Lab, University College of Information Technology, University of the Punjab, Lahore, 54000, Pakistan
Shahmin Sharafat, Zara Nasar & Syed Waqar Jaffry

Corresponding author

Correspondence to Zara Nasar.

Editor information

Editors and Affiliations

Department of Computer Science and IT, Islamia University of Bahawalpur, Bahawalpur, Pakistan
Imran Sarwar Bajwa
Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK
Fairouz Kamareddine
Department of Computer Engineering and Digital Systems, University of São Paulo, São Paulo, Brazil
Anna Costa

About this paper

Cite this paper

Sharafat, S., Nasar, Z., Jaffry, S.W. (2019). Legal Data Mining from Civil Judgments. In: Bajwa, I., Kamareddine, F., Costa, A. (eds) Intelligent Technologies and Applications. INTAP 2018. Communications in Computer and Information Science, vol 932. Springer, Singapore. https://doi.org/10.1007/978-981-13-6052-7_37

DOI: https://doi.org/10.1007/978-981-13-6052-7_37
Published: 12 March 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6051-0
Online ISBN: 978-981-13-6052-7
eBook Packages: Computer Science, Computer Science (R0)
