GarNLP: A Natural Language Processing Pipeline for Garnishment Documents

Abstract

Basic elements of the law, such as statutes and regulations, are embodied in natural language and strictly depend on linguistic expressions. Hence, analyzing legal content is a challenging task, and the legal domain is increasingly looking for automatic-processing support. This paper focuses on a specific context in the legal domain that has so far remained unexplored: the automatic processing of garnishment documents. A garnishment is a legal procedure by which a creditor can collect what a debtor owes by requiring the confiscation of a debtor’s property (e.g., a checking account) that is held by a third party, dubbed the garnishee. Our proposal, motivated by a real-world use case, is a versatile natural-language-processing pipeline that supports a garnishee in processing a large-scale flow of garnishment documents. In particular, we focus on two main tasks: (i) categorizing received garnishment notices into a predefined taxonomy of categories; (ii) performing information extraction, which consists in automatically identifying various pieces of information in the text, such as the identity of the involved actors, amounts, and dates. The main contribution of this work is to describe the challenges, design, implementation, and performance of the core modules and methods behind our solution. Our proposal is a noteworthy example of how data-science techniques can be successfully applied to a novel yet challenging real-world context.
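To make the two tasks concrete, the following is a minimal sketch, not the GarNLP implementation: it stands in for categorization with simple cue-phrase counting and for information extraction with regular expressions over amounts and dates. The taxonomy, the cue phrases, the `Notice` type, and the amount and date formats are all hypothetical and chosen only for illustration; the keywords of the paper suggest that the actual pipeline relies on supervised learning, word embeddings, and named entity recognition.

```python
import re
from dataclasses import dataclass, field

# Hypothetical taxonomy: category name -> cue phrases (NOT taken from the paper).
TAXONOMY = {
    "new_garnishment": ("garnishment order", "hereby ordered to seize"),
    "release": ("release", "lifted"),
}

# Assumed formats: amounts such as "EUR 1,250.00" and ISO dates such as "2020-01-15".
AMOUNT_RE = re.compile(r"(?:EUR|€)\s?(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)")
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


@dataclass
class Notice:
    """Illustrative container for the output of both tasks."""
    category: str
    amounts: list = field(default_factory=list)
    dates: list = field(default_factory=list)


def categorize(text: str) -> str:
    """Task (i): pick the category whose cue phrases occur most often, else 'other'."""
    lowered = text.lower()
    scores = {
        category: sum(lowered.count(cue) for cue in cues)
        for category, cues in TAXONOMY.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def extract(text: str) -> Notice:
    """Task (ii): pull amounts and dates out of the text with regular expressions."""
    amounts = [float(m.replace(",", "")) for m in AMOUNT_RE.findall(text)]
    dates = DATE_RE.findall(text)
    return Notice(categorize(text), amounts, dates)


if __name__ == "__main__":
    sample = (
        "Garnishment order dated 2020-01-15: the garnishee is hereby ordered "
        "to seize EUR 1,250.00 against an owed amount of EUR 1,200.00."
    )
    print(extract(sample))
```

A rule-based baseline of this kind is easy to inspect, but it breaks down on free-form legal prose, which is presumably why a learned categorizer and entity recognizer are needed in practice.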

Notes

  1. As an example, the legal office of our partner garnishee receives between 1000 and 2000 documents/day.

  2. The GarNLP framework is currently being productionized by the partner bank.

  3. Owed amount and seized amount may differ, as a court order may require seizing an amount that is (slightly) more than the owed one (for tax or interest reasons).

  4. http://www.arguana.com, https://webis.de/research/arguana-for-the-web.html

  5. https://www.iusexplorer.it/

Author information

Corresponding author

Correspondence to Francesco Gullo.

Additional information

Stefano Pascolutti: work completed while the author was employed at UniCredit.

About this article

Cite this article

Bordino, I., Ferretti, A., Gullo, F. et al. GarNLP: A Natural Language Processing Pipeline for Garnishment Documents. Inf Syst Front 23, 101–114 (2021). https://doi.org/10.1007/s10796-020-09997-0

Keywords

  • Applied data science
  • Natural language processing
  • Legal documents
  • Garnishment
  • Categorization
  • Information extraction
  • Supervised learning
  • Word embeddings
  • Named entity recognition