A Document Model Based on Relevance Modeling Techniques for Semi-structured Information Warehouses

  • Juan Manuel Pérez
  • Rafael Berlanga
  • María José Aramburu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3180)


During the last decade, data warehouse and OLAP techniques have helped companies to gather, organize and analyze the structured data they produce. Simultaneously, digital libraries have applied Information Retrieval (IR) mechanisms to query their repositories of unstructured, text-rich documents. In this paper we explain how XML allows these two approaches to converge, making it possible to develop warehouses for semi-structured information. So far, proposals to extend data warehouse technology to semi-structured information have not been able to exploit the textual contents, mainly because they are not based on a proper document model. In our opinion, such a model must integrate IR and OLAP techniques. We present a set of requirements for semi-structured information warehouses, as well as a document model to support their construction. In this model, new Relevance Modeling mechanisms rank the facts described in the text of the documents according to their relevance to an IR-OLAP query. Preliminary evaluations show the usefulness of the document model.


Keywords (added by machine, not by the authors): Digital Library, Document Model, News Item, Path Expression, Relevance Ranking





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Juan Manuel Pérez (1)
  • Rafael Berlanga (1)
  • María José Aramburu (1)
  1. Universitat Jaume I, Castellón, Spain
