Towards Annotation of Legal Documents with Ontology Concepts

Adebayo, Kolawole John; Di Caro, Luigi; Boella, Guido

doi:10.1007/978-3-030-00178-0_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10791)

Abstract

This paper describes a task of semantic labeling of document segments. The idea exploits an ontology to provide fine-grained conceptual annotation of documents. We describe a way of dividing a document into its constituent, semantically coherent blocks. These blocks are then used to perform conceptual tagging for efficient passage-level information retrieval. The proposed task interfaces with other application areas such as intra-mapping of ontologies, text summarization and information extraction. The system has been evaluated on a task of conceptual tagging of documents and achieved promising results.
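A minimal sketch of the segmentation idea, under stated assumptions: it follows the familiar TextTiling pattern of comparing the sentences on either side of each candidate boundary and starting a new block where cohesion drops. The notes below say the authors used WordNet concept distance between adjacent sentences within a chosen window; this sketch swaps in plain bag-of-words cosine similarity so that it runs with the Python standard library alone. The function names `cosine`, `window_similarity` and `segment`, and the default `threshold` of 0.1, are hypothetical choices for illustration, not values from the paper.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def window_similarity(sentences: list[list[str]], gap: int, window: int) -> float:
    """Similarity between the `window` sentences before and after a candidate boundary.

    `gap` is the index of the first sentence after the boundary.
    Stand-in for the WordNet-based measure mentioned in the notes (assumption).
    """
    left = Counter(w for s in sentences[max(0, gap - window):gap] for w in s)
    right = Counter(w for s in sentences[gap:gap + window] for w in s)
    return cosine(left, right)


def segment(sentences: list[list[str]], window: int = 2,
            threshold: float = 0.1) -> list[list[int]]:
    """Group sentence indices into blocks; a block ends where similarity < threshold."""
    if not sentences:
        return []
    blocks = [[0]]
    for gap in range(1, len(sentences)):
        if window_similarity(sentences, gap, window) < threshold:
            blocks.append([])  # low cohesion across this gap: start a new block
        blocks[-1].append(gap)
    return blocks


# Toy document: two sentences about contracts, then two about fishing quotas.
doc = [
    ["contract", "party", "obligation"],
    ["party", "breach", "contract"],
    ["fishing", "quota", "vessel"],
    ["vessel", "quota", "sea"],
]
print(segment(doc, window=1))  # [[0, 1], [2, 3]]
```

A fixed threshold keeps the example short. TextTiling itself instead places boundaries using depth scores around local minima of the similarity curve, which avoids having to pick an absolute cut-off.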

Notes

1. https://eur-lex.europa.eu/homepage.html.
2. E.g., homonyms and synonyms.
3. http://eurovoc.europa.eu/.
4. An online database of EU government documents.
5. We used WordNet concept distance to compute semantic similarity between adjacent sentences of a candidate segment according to a chosen window.
6. E.g., synonymy and polysemy.
7. Here k is a chosen number, set to 3.
8. The English Wikipedia dump was downloaded on July 30, 2015.
9. https://code.google.com/p/word2vec/.
10. https://github.com/RaRe-Technologies/gensim. Training parameters: context window 5, neural-network layer size 200, minimum word count 5.
11. Ensures that the highest similarity is at least z percent higher than the next-highest value. By default, z = 0.3.
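The tagging step touched on in notes 7, 10 and 11 can be sketched as follows, as an illustration rather than the authors' code. Assumptions: a segment and each ontology concept (for instance a EuroVoc descriptor) are both represented by averaged word embeddings, concepts are ranked by cosine similarity, k in note 7 is read as the number of top-ranked concepts kept, and note 11's margin is read as a relative test, best ≥ (1 + z) × second best. The function names `mean_vector` and `tag_segment` and the toy 2-dimensional vectors are hypothetical; the paper's embeddings were 200-dimensional word2vec vectors, which gensim 4.x would train with `Word2Vec(sentences, vector_size=200, window=5, min_count=5)`.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def mean_vector(tokens: list[str], embeddings: dict[str, list[float]],
                dim: int) -> list[float]:
    """Average the embeddings of in-vocabulary tokens; zero vector if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def tag_segment(segment_vec: list[float], concept_vecs: dict[str, list[float]],
                k: int = 3, z: float = 0.3):
    """Return the top-k (concept, score) pairs and whether the best one clears the margin.

    Hypothetical reading of note 11: accept the best concept only if
    best >= (1 + z) * second best.
    """
    ranked = sorted(
        ((name, cosine(segment_vec, vec)) for name, vec in concept_vecs.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    top = ranked[:k]
    if len(top) < 2:
        return top, bool(top)  # a single candidate is trivially unambiguous
    confident = top[0][1] >= (1 + z) * top[1][1]
    return top, confident


# Toy 2-D embeddings standing in for trained word2vec vectors.
embeddings = {"fish": [1.0, 0.0], "quota": [0.9, 0.1], "contract": [0.0, 1.0]}
concepts = {"fisheries": [1.0, 0.0], "contract law": [0.0, 1.0]}
seg = mean_vector(["fish", "quota", "unknownword"], embeddings, dim=2)
top, confident = tag_segment(seg, concepts, k=3)
print(top[0][0], confident)  # fisheries True
```

Out-of-vocabulary tokens are simply skipped when averaging; with real data, the concept vectors would be built the same way from descriptor labels or their descriptions.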

Acknowledgements

Kolawole J. Adebayo has received funding from the Erasmus Mundus Joint International Doctoral (Ph.D.) programme in Law, Science and Technology. Luigi Di Caro and Guido Boella have received funding from the European Union’s H2020 research and innovation programme under the grant agreement No 690974 for the project “MIREL: MIning and REasoning with Legal texts”.

Author information

Authors and Affiliations

Dipartimento di Informatica, Università di Torino, Corso Svizzera 185, 10149, Turin, Italy
Kolawole John Adebayo, Luigi Di Caro & Guido Boella

Corresponding author

Correspondence to Kolawole John Adebayo.

Editor information

Editors and Affiliations

University of Turin, Turin, Italy
Ugo Pagallo
University of Bologna, Bologna, Italy
Monica Palmirani
La Trobe University, Melbourne, VIC, Australia
Pompeu Casanovas
University of Bologna, Bologna, Italy
Giovanni Sartor
Inria - Sophia Antipolis-Méditerranée, Sophia Antipolis, France
Serena Villata

About this paper

Cite this paper

Adebayo, K.J., Di Caro, L., Boella, G. (2018). Towards Annotation of Legal Documents with Ontology Concepts. In: Pagallo, U., Palmirani, M., Casanovas, P., Sartor, G., Villata, S. (eds) AI Approaches to the Complexity of Legal Systems. AICOL 2015–2017. Lecture Notes in Computer Science, vol. 10791. Springer, Cham. https://doi.org/10.1007/978-3-030-00178-0_23

DOI: https://doi.org/10.1007/978-3-030-00178-0_23
Published: 23 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00177-3
Online ISBN: 978-3-030-00178-0
eBook Packages: Computer Science, Computer Science (R0)