
A Hierarchical Label Network for Multi-label EuroVoc Classification of Legislative Contents

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11799)

Abstract

EuroVoc is a thesaurus maintained by the European Union Publication Office, used to describe and index legislative documents. The EuroVoc concepts are organized in a hierarchical structure, with 21 domains, 127 micro-thesauri terms, and more than 6,700 detailed descriptors. The large number of concepts in the EuroVoc thesaurus makes the manual classification of legal documents costly. To facilitate this classification work, we present two main contributions. The first is a hierarchical deep learning model that classifies legal documents according to the EuroVoc thesaurus. Instead of training a separate classifier for each level, our model predicts the three levels of the EuroVoc thesaurus simultaneously. The second is a new legal corpus for evaluating the classification of documents written in Portuguese. The corpus, named EUR-Lex PT, contains more than 220k documents, labeled under the three EuroVoc hierarchical levels. Comparative experiments with other state-of-the-art models indicate that our approach achieves competitive results while also offering the ability to interpret predictions through attention weights.
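
The idea of predicting all three levels in one pass can be illustrated with a minimal sketch. It assumes, purely for illustration, that a single shared document vector feeds three independent sigmoid output layers, one per EuroVoc level. The abstract does not describe the actual architecture, so the names `LEVEL_SIZES`, `HIDDEN_DIM`, `init_heads`, and `predict_all_levels` are hypothetical, the weights are random rather than trained, and the descriptor layer is shrunk to 300 labels to keep the example fast.

```python
import math
import random

# Label counts per level. The domain and micro-thesaurus counts come from the
# abstract; the descriptor count is reduced here (the thesaurus has 6,700+).
LEVEL_SIZES = {"domain": 21, "micro_thesaurus": 127, "descriptor": 300}
HIDDEN_DIM = 16  # hypothetical size of the shared document representation


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_heads(rng):
    """Create one weight matrix per level; all heads read the same document vector."""
    return {
        level: [[rng.uniform(-0.1, 0.1) for _ in range(HIDDEN_DIM)] for _ in range(n)]
        for level, n in LEVEL_SIZES.items()
    }


def predict_all_levels(doc_vector, heads):
    """Return per-label probabilities for every level from one shared vector."""
    return {
        level: [sigmoid(sum(w * x for w, x in zip(row, doc_vector))) for row in weights]
        for level, weights in heads.items()
    }


rng = random.Random(0)
heads = init_heads(rng)
# Stand-in for the output of a document encoder (random values, not a real encoding).
doc_vector = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_DIM)]
probs = predict_all_levels(doc_vector, heads)
print({level: len(p) for level, p in probs.items()})
```

Each label receives its own independent probability, which is the usual multi-label setup: a document can be assigned several domains, micro-thesauri, and descriptors at once.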

Notes

  1. http://manikvarma.org/downloads/XC/XMLRepository.html

  2. http://eur-lex.europa.eu/homepage.html

  3. http://data.europa.eu/euodp/data/dataset/eurovoc

  4. http://github.com/xiaohan2012/sleec_python

  5. http://github.com/dcaled/EUR-Lex-PT

Acknowledgements

This research was partially supported by grant UID/CEC/50021/2019 from Fundação para a Ciência e Tecnologia (FCT). We also gratefully acknowledge the support from NVIDIA Corporation, for the donation of the Titan Xp GPU used in our experiments, and the support from Imprensa Nacional-Casa da Moeda (INCM).

Author information

Corresponding author

Correspondence to Danielle Caled.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Caled, D., Won, M., Martins, B., Silva, M.J. (2019). A Hierarchical Label Network for Multi-label EuroVoc Classification of Legislative Contents. In: Doucet, A., Isaac, A., Golub, K., Aalberg, T., Jatowt, A. (eds) Digital Libraries for Open Knowledge. TPDL 2019. Lecture Notes in Computer Science, vol 11799. Springer, Cham. https://doi.org/10.1007/978-3-030-30760-8_21

  • DOI: https://doi.org/10.1007/978-3-030-30760-8_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30759-2

  • Online ISBN: 978-3-030-30760-8

  • eBook Packages: Computer Science (R0)
