
Annotated Corpus of Czech Case Law for Reference Recognition Tasks

  • Jakub Harašta
  • Jaromír Šavelka
  • František Kasl
  • Adéla Kotková
  • Pavel Loutocký
  • Jakub Míšek
  • Daniela Procházková
  • Helena Pullmannová
  • Petr Semenišín
  • Tamara Šejnová
  • Nikola Šimková
  • Michal Vosinek
  • Lucie Zavadilová
  • Jan Zibner
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)

Abstract

We describe an annotated corpus of 350 decisions of Czech top-tier courts that was gathered for a project assessing the relevance of court decisions in Czech law. We describe two layers of processing of the corpus: every decision was annotated by two trained annotators and then manually adjudicated by one trained curator, who resolved any disagreements between the annotators. The corpus was developed as training and testing material for reference recognition tasks, which will in turn support research on assessing legal importance. Given the overall shortage of available research corpora of annotated legal texts, particularly in the Czech language, we believe that other research teams may also find it useful.

Keywords

Reference recognition · Dataset · Legal texts · Manual annotation

Notes

Contribution Statement and Acknowledgment

J.H. developed the annotation scheme, prepared the annotation manual, and selected the court decisions included in the dataset. J.H., T.Š., N.Š., and J.Z. participated in dummy runs and in the evaluation of the annotation manual. J.H., F.K., A.K., P.L., J.M., D.P., H.P., P.S., T.Š., N.Š., M.V., L.Z., and J.Z. annotated the decisions. J.H., P.L., and J.M. curated and edited the decisions. J.Š. programmed the annotation environment, prepared the dataset for publication, and compiled the dataset statistics. J.H., J.Š., and F.K. wrote the paper with input from all authors.

J.H., F.K., A.K., P.L., J.M., D.P., H.P., P.S., T.Š., M.V., L.Z., and J.Z. gratefully acknowledge the support from the Czech Science Foundation under grant no. GA17-20645S.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Jakub Harašta (1)
  • Jaromír Šavelka (2)
  • František Kasl (1)
  • Adéla Kotková (1)
  • Pavel Loutocký (1)
  • Jakub Míšek (1)
  • Daniela Procházková (1)
  • Helena Pullmannová (1)
  • Petr Semenišín (1)
  • Tamara Šejnová (1)
  • Nikola Šimková (1, 3)
  • Michal Vosinek (1)
  • Lucie Zavadilová (1)
  • Jan Zibner (1)

  1. Faculty of Law, Masaryk University, Brno, Czech Republic
  2. Intelligent Systems Program, University of Pittsburgh, Pittsburgh, USA
  3. Faculty of Informatics, Masaryk University, Brno, Czech Republic
