From Thesaurus Towards Ontologies in Large Legal Databases

  • Ángel Sancho Ferrer
  • Carlos Fernández Hernández
  • José Manuel Mateo Rivero
Part of the Law, Governance and Technology Series (LGTS, volume 1)


We are in the middle of a historical paradigm shift, a change similar in scale to the one that confronted the Library of Alexandria twenty-two centuries ago. In the age of paper and print, metadata, indexes and taxonomies were the paradigm, and librarians and publishers relied on them for searching. Today the number of documents has grown to a level that makes these traditional tools less efficient for users and less affordable for publishers. Over the last three decades, however, search technologies have produced new solutions such as direct queries, relevance ranking and faceted results, along with the promise of conceptual search engines and ontologies. But this integration of legal knowledge has not yet proven scalable in large databases: improvements in recall come at the cost of precision and performance. We have focused on one key behavior of legal experts when they search: they formulate “better queries” thanks to their knowledge of the domain and of search techniques. The same happens in classical taxonomic searches, but in full-text search we can try to encode part of that knowledge in the search engine itself. To achieve this goal, we have developed both the technology to semantically analyze documents and queries and a methodology to populate a dictionary with 10,000 concepts and 40,000 expressions. The system has been put into production on a database of 3 million legal documents. Beyond the semantic improvements, these developments have significantly improved the relevance algorithm and led to complementary tools such as dynamic summaries and query reformulation through local context analysis.
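The abstract does not describe the dictionary format or how queries are analyzed. As a minimal sketch only, assuming a dictionary that maps each concept to its expressions and a greedy longest-match analyzer (both hypothetical, not the authors' implementation), the Python below shows how recognizing concepts in a query can turn it into a “better query” by expanding each concept into all of its expressions. The names `CONCEPTS`, `STOP_WORDS`, `build_index`, `analyze` and `reformulate`, the toy entries, and the Lucene-style boolean output are illustrative assumptions.

```python
# Hypothetical toy dictionary (entries invented for illustration): each
# concept lists the expressions, single words or multiword phrases, that
# denote it in legal text.
CONCEPTS: dict[str, list[str]] = {
    "DISMISSAL": ["dismissal", "termination of employment"],
    "LEASE": ["lease", "tenancy agreement", "rental contract"],
}

# Hypothetical stop-word list, applied only to tokens outside any expression.
STOP_WORDS = {"and", "for", "of", "the", "in"}


def build_index(concepts: dict[str, list[str]]) -> dict[tuple[str, ...], str]:
    """Map each expression, as a tuple of lowercase tokens, to its concept."""
    index: dict[tuple[str, ...], str] = {}
    for concept, expressions in concepts.items():
        for expr in expressions:
            index[tuple(expr.lower().split())] = concept
    return index


def analyze(query: str, index: dict[tuple[str, ...], str]) -> list[tuple[str, str]]:
    """Segment a query into ("concept", name) and ("term", word) units.

    Greedy longest match: at each position the longest dictionary
    expression wins; unmatched tokens become plain terms unless they
    are stop words.
    """
    tokens = query.lower().split()
    max_len = max((len(key) for key in index), default=1)
    units: list[tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            concept = index.get(tuple(tokens[i:i + n]))
            if concept is not None:
                units.append(("concept", concept))
                i += n
                break
        else:
            # No expression starts here: keep the token unless it is a stop word.
            if tokens[i] not in STOP_WORDS:
                units.append(("term", tokens[i]))
            i += 1
    return units


def reformulate(units: list[tuple[str, str]], concepts: dict[str, list[str]]) -> str:
    """Rewrite each recognized concept as an OR of all its expressions."""
    parts = []
    for kind, value in units:
        if kind == "concept":
            # Quote multiword expressions so they are searched as phrases.
            alternatives = [f'"{e}"' if " " in e else e for e in concepts[value]]
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(value)
    return " AND ".join(parts)


if __name__ == "__main__":
    index = build_index(CONCEPTS)
    units = analyze("Compensation for termination of employment", index)
    print(reformulate(units, CONCEPTS))
```

Run on the query above, the sketch recognizes “termination of employment” as the concept DISMISSAL and expands it to `(dismissal OR "termination of employment")`, joined with the remaining term `compensation`. Greedy longest match is simply the easiest segmentation to show; it ignores alternative readings of overlapping expressions. Expanding every concept with OR raises recall, which is exactly the trade-off against precision that the abstract points out.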





We would like to thank John Barker, Director of Strategic Product Design in Wolters Kluwer’s Global Platforms Organization, and Rosalina Diaz Valcárcel, Chief Executive Officer of Wolters Kluwer Spain, for their intellectual and professional support. We also wish to stress that most of these ideas originated with Angel Bizcarrondo Ibáñez, of the Centro de Estudios Garrigues. Finally, we would like to acknowledge the exchange of ideas with Luis Pezzi, Manuel Cuadrado, Rene van Erk and Guy van Peel. This project has been funded by the Ministerio de Industria, Turismo y Comercio de España under the programs Profit (FIT-350100-2007-161) and Avanza I+D (TSI-020501-2008-80).



Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Ángel Sancho Ferrer (1)
  • Carlos Fernández Hernández (1)
  • José Manuel Mateo Rivero (1)
  1. Research and Development Department, Wolters Kluwer Spain, Madrid, Spain
