Towards Annotation of Legal Documents with Ontology Concepts

  • Conference paper
AI Approaches to the Complexity of Legal Systems (AICOL 2015, AICOL 2016, AICOL 2016, AICOL 2017, AICOL 2017)

Abstract

This paper describes the task of semantically labeling document segments. The approach exploits an ontology to provide fine-grained conceptual annotation of documents. We describe a way of dividing a document into its constituent semantically coherent blocks, which are then used to perform conceptual tagging for efficient passage retrieval. The proposed task interfaces with other application areas such as intra-mapping of ontologies, text summarization and information extraction. The system has been evaluated on a conceptual tagging task and achieved promising results.
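The abstract does not spell out how a document is divided into semantically coherent blocks; the notes only say that similarity between adjacent sentences of a candidate segment is computed within a chosen window, using WordNet concept distance. The following is a minimal sketch of that windowed idea under stated assumptions, not the authors' implementation: it swaps in plain bag-of-words cosine similarity as a stand-in for the WordNet measure, and places a boundary wherever the cohesion score at a gap is a local minimum below a threshold (in the spirit of TextTiling). The functions `gap_scores` and `segment` and the parameters `window` and `threshold` are hypothetical.

```python
import math
import re
from collections import Counter


def bow(sentences):
    """Bag-of-words counts over a list of sentences (lowercased alphabetic tokens)."""
    return Counter(w for s in sentences for w in re.findall(r"[a-z]+", s.lower()))


def cosine(a, b):
    """Cosine similarity of two Counters; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def gap_scores(sentences, window=2):
    """Cohesion at each gap between sentence i-1 and sentence i.

    Compares up to `window` sentences on each side of the gap. Bag-of-words
    cosine is an ASSUMED stand-in for the WordNet concept distance in the paper.
    """
    return [
        cosine(bow(sentences[max(0, i - window):i]), bow(sentences[i:i + window]))
        for i in range(1, len(sentences))
    ]


def segment(sentences, window=2, threshold=0.1):
    """Split sentences into blocks at low-cohesion gaps (hypothetical rule).

    A gap becomes a boundary if its score is below `threshold` and is a local
    minimum relative to its neighbouring gaps.
    """
    scores = gap_scores(sentences, window)
    segments, start = [], 0
    for j, s in enumerate(scores):  # scores[j] is the gap after sentence j
        left = scores[j - 1] if j > 0 else float("inf")
        right = scores[j + 1] if j + 1 < len(scores) else float("inf")
        if s < threshold and s <= left and s <= right:
            segments.append(sentences[start:j + 1])
            start = j + 1
    segments.append(sentences[start:])
    return segments
```

For instance, given four sentences where the first two discuss import tax and the last two discuss fishing quotas, the cohesion score at the middle gap is 0.0 and `segment` returns two blocks of two sentences each. Each resulting block would then be the unit passed to conceptual tagging.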

Notes

  1. https://eur-lex.europa.eu/homepage.html.

  2. E.g., homonyms and synonyms.

  3. http://eurovoc.europa.eu/.

  4. An online database of EU government documents.

  5. We used WordNet concept distance to compute semantic similarity between adjacent sentences of a candidate segment according to a chosen window.

  6. E.g., synonymy and polysemy.

  7. Here k is a chosen number, set to 3.

  8. The English Wikipedia dump was downloaded on July 30, 2015.

  9. https://code.google.com/p/word2vec/.

  10. https://github.com/RaRe-Technologies/gensim. Training parameters: context window, 5; neural network layer size, 200; minimum word count, 5.

  11. Ensures that the highest similarity exceeds the next-highest value by at least z percent. By default, z = 0.3.
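Note 11 describes a confidence check on concept assignment: a segment receives the top-ranked concept only when that concept's similarity clearly exceeds the runner-up's. A minimal sketch of such a rule follows. It assumes non-negative similarity scores and reads z as a relative fraction (z = 0.3 meaning 30%), which is an interpretation rather than something the note states. The function name `assign_concept` and the concept labels used below are hypothetical, and how the similarity scores themselves are produced is left open.

```python
from typing import Dict, Optional


def assign_concept(similarities: Dict[str, float], z: float = 0.3) -> Optional[str]:
    """Pick the best-scoring concept only if it beats the runner-up by a margin.

    ASSUMPTIONS (not from the paper): scores are non-negative, and z is a
    relative fraction, so z = 0.3 requires best >= 1.3 * second.
    Returns None when no concept can be assigned with confidence.
    """
    if not similarities:
        return None
    # Rank candidate concepts by similarity, highest first.
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    best_concept, best = ranked[0]
    if best <= 0:
        return None  # no candidate is similar at all
    if len(ranked) == 1:
        return best_concept  # no competitor to compare against
    second = ranked[1][1]
    # Accept only if the top score exceeds the runner-up by at least z (relative).
    return best_concept if best >= (1 + z) * second else None
```

With the default z, `assign_concept({"taxation": 0.8, "fisheries": 0.5})` returns `"taxation"`, whereas `{"taxation": 0.6, "fisheries": 0.5}` yields `None` because 0.6 is less than 1.3 × 0.5. Returning `None` rather than forcing a label trades recall for precision on ambiguous segments.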

Acknowledgements

Kolawole J. Adebayo has received funding from the Erasmus Mundus Joint International Doctoral (Ph.D.) programme in Law, Science and Technology. Luigi Di Caro and Guido Boella have received funding from the European Union's H2020 research and innovation programme under grant agreement No. 690974 for the project "MIREL: MIning and REasoning with Legal texts".

Author information

Correspondence to Kolawole John Adebayo.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Adebayo, K.J., Di Caro, L., Boella, G. (2018). Towards Annotation of Legal Documents with Ontology Concepts. In: Pagallo, U., Palmirani, M., Casanovas, P., Sartor, G., Villata, S. (eds) AI Approaches to the Complexity of Legal Systems. AICOL 2015, AICOL 2016, AICOL 2016, AICOL 2017, AICOL 2017. Lecture Notes in Computer Science, vol 10791. Springer, Cham. https://doi.org/10.1007/978-3-030-00178-0_23

  • DOI: https://doi.org/10.1007/978-3-030-00178-0_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00177-3

  • Online ISBN: 978-3-030-00178-0

  • eBook Packages: Computer Science, Computer Science (R0)
