Abstract
Due to the advent of computing, content digitization and processing are being widely performed across the globe. The legal domain is among the many areas that offer opportunities for innovation and improvement through computational advances. In Pakistan, courts have been reporting judgments for public consumption over the last couple of years. This reported data is of great importance to judges, lawyers, and civilians in various respects. As this data grows at a rapid rate, there is a pressing need to process it to better address the needs of the respective stakeholders. Therefore, in this study, our aim is to develop a machine learning system that can automatically extract information from the publicly reported judgments of the Lahore High Court. Once extracted, this information can be used for the betterment of society and for policy making in Pakistan. This study takes the first step toward this goal by extracting various entities from legal judgments. A total of ten entities are extracted, including dates, case numbers, reference cases, person names, and respondent names. The primary requirement for extracting these entities automatically was a dataset built from legal judgments. Hence, annotation guidelines were first prepared, followed by an annotated dataset for entity extraction. Finally, various algorithms, including Markov models and Conditional Random Fields, were applied to the annotated dataset. Experiments show that these approaches achieve reasonably good results for legal data extraction. The primary contribution of this study is the development of an annotated dataset of civil judgments, followed by the training of various machine learning models to extract the potential information from a judgment.
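The abstract frames entity extraction as sequence labeling over an annotated corpus and names Markov models among the algorithms applied. The following is a minimal sketch of that general setup under stated assumptions: token-level entity spans are converted to BIO tags, and a generic first-order hidden Markov model with add-one smoothing is trained by counting and decoded with the Viterbi algorithm. The tag names (CASE, DATE, PER), the helper names `to_bio` and `HMMTagger`, and the toy sentences are all hypothetical; the paper's actual annotation scheme, Markov-model variant, and features are not described here and are not reproduced.

```python
import math
from collections import Counter, defaultdict


def to_bio(tokens, spans):
    """Convert hypothetical (start, end_exclusive, label) token spans to BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return list(zip(tokens, tags))


class HMMTagger:
    """Generic first-order HMM tagger: add-one smoothed counts, Viterbi decoding."""

    def __init__(self, sentences):
        self.tags = sorted({t for s in sentences for _, t in s})
        # +1 reserves probability mass for words never seen in training.
        self.vocab_size = len({w.lower() for s in sentences for w, _ in s}) + 1
        self.n_sents = len(sentences)
        self.start = Counter()
        self.trans = defaultdict(Counter)
        self.emit = defaultdict(Counter)
        for sent in sentences:
            prev = None
            for word, tag in sent:
                if prev is None:
                    self.start[tag] += 1
                else:
                    self.trans[prev][tag] += 1
                self.emit[tag][word.lower()] += 1
                prev = tag

    def _log_start(self, t):
        return math.log((self.start[t] + 1) / (self.n_sents + len(self.tags)))

    def _log_trans(self, p, t):
        total = sum(self.trans[p].values())
        return math.log((self.trans[p][t] + 1) / (total + len(self.tags)))

    def _log_emit(self, t, w):
        total = sum(self.emit[t].values())
        return math.log((self.emit[t][w.lower()] + 1) / (total + self.vocab_size))

    def tag(self, tokens):
        if not tokens:
            return []
        # best[i][t]: log score of the best tag path over tokens[:i+1] ending in t.
        best = [{t: self._log_start(t) + self._log_emit(t, tokens[0]) for t in self.tags}]
        back = [{}]
        for i in range(1, len(tokens)):
            best.append({})
            back.append({})
            for t in self.tags:
                prev = max(self.tags, key=lambda p: best[i - 1][p] + self._log_trans(p, t))
                best[i][t] = best[i - 1][prev] + self._log_trans(prev, t) + self._log_emit(t, tokens[i])
                back[i][t] = prev
        # Follow back-pointers from the highest-scoring final tag.
        path = [max(self.tags, key=lambda t: best[-1][t])]
        for i in range(len(tokens) - 1, 0, -1):
            path.append(back[i][path[-1]])
        return path[::-1]


# Toy, hypothetical training sentences; not drawn from the paper's dataset.
TRAIN = [
    to_bio(["Writ", "Petition", "No.", "1234", "decided", "on", "12.03.2017"],
           [(0, 4, "CASE"), (6, 7, "DATE")]),
    to_bio(["Mr.", "Ahmad", "Khan", "filed", "Writ", "Petition", "No.", "99"],
           [(1, 3, "PER"), (4, 8, "CASE")]),
    to_bio(["Judgment", "dated", "05.07.2018"], [(2, 3, "DATE")]),
]
tagger = HMMTagger(TRAIN)
print(tagger.tag(["Writ", "Petition", "No.", "1234"]))
```

A Conditional Random Field, the other model family the abstract names, would replace the separately estimated transition and emission tables with feature weights learned discriminatively over the whole tag sequence, while still decoding with a Viterbi-style search.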
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sharafat, S., Nasar, Z., Jaffry, S.W. (2019). Legal Data Mining from Civil Judgments. In: Bajwa, I., Kamareddine, F., Costa, A. (eds) Intelligent Technologies and Applications. INTAP 2018. Communications in Computer and Information Science, vol 932. Springer, Singapore. https://doi.org/10.1007/978-981-13-6052-7_37
DOI: https://doi.org/10.1007/978-981-13-6052-7_37
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6051-0
Online ISBN: 978-981-13-6052-7
eBook Packages: Computer Science, Computer Science (R0)