Abstract
During the last decade, data warehouse and OLAP techniques have helped companies gather, organize and analyze the structured data they produce. Meanwhile, digital libraries have applied Information Retrieval (IR) mechanisms to query their repositories of unstructured, text-rich documents. In this paper we explain how XML allows these two approaches to converge, making it possible to develop warehouses for semi-structured information. So far, proposals to extend data warehouse technology to semi-structured information have not been able to exploit the textual contents of the documents, mainly because they are not based on a proper document model. In our opinion, such a model must integrate IR and OLAP techniques. We present a set of requirements for semi-structured information warehouses, as well as a document model to support their construction. In this model, new Relevance Modeling mechanisms rank the facts described in the text of the documents according to their relevance to an IR-OLAP query. Preliminary evaluations show the usefulness of the document model.
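To make the idea of ranking facts by relevance to an IR-OLAP query more concrete, the following is a minimal sketch of a generic relevance-model ranker in the style of Lavrenko and Croft's relevance-based language models. It is not the authors' document model. The fact representation (a dict with a `text` field plus dimension and measure fields), the whitespace tokenizer, the Dirichlet smoothing parameter `mu`, and the `relevant_rollup` step are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Naive lower-case whitespace tokenizer (an illustrative assumption)."""
    return text.lower().split()


def dirichlet_lm(doc_counts, doc_len, coll_counts, coll_len, mu):
    """Return a function computing P(w|D) with Dirichlet smoothing."""
    def prob(w):
        p_coll = coll_counts.get(w, 0) / coll_len
        return (doc_counts.get(w, 0) + mu * p_coll) / (doc_len + mu)
    return prob


def rank_facts(facts, query, mu=10.0):
    """Rank facts (dicts with a 'text' field) by relevance to a keyword query.

    Returns a list of (score, fact) pairs sorted by descending score.
    """
    docs = [Counter(tokenize(f["text"])) for f in facts]
    coll = Counter()
    for d in docs:
        coll.update(d)
    coll_len = sum(coll.values())
    models = [dirichlet_lm(d, sum(d.values()), coll, coll_len, mu) for d in docs]
    q = tokenize(query)

    # Query likelihood P(Q|D), normalised into P(D|Q) under a uniform prior.
    likelihoods = [math.prod(m(t) for t in q) for m in models]
    z = sum(likelihoods)
    if z == 0:  # no query term occurs anywhere: fall back to uniform weights
        posteriors = [1.0 / len(facts)] * len(facts)
    else:
        posteriors = [lk / z for lk in likelihoods]

    # Relevance model: P(w|R) = sum_D P(w|D) * P(D|Q) over the collection vocabulary.
    rel = {w: sum(p * m(w) for p, m in zip(posteriors, models)) for w in coll}

    # Score each fact by negative cross-entropy: sum_w P(w|R) * log P(w|D).
    scored = [
        (sum(pr * math.log(m(w)) for w, pr in rel.items()), fact)
        for fact, m in zip(facts, models)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def relevant_rollup(scored, dimension, measure, k):
    """Hypothetical OLAP step: sum a measure per dimension value over the top-k facts."""
    totals = defaultdict(float)
    for _, fact in scored[:k]:
        totals[fact[dimension]] += fact[measure]
    return dict(totals)


if __name__ == "__main__":
    facts = [
        {"text": "oil prices rise in europe", "region": "Europe", "amount": 3},
        {"text": "oil exports fall in asia", "region": "Asia", "amount": 5},
        {"text": "football league results", "region": "Europe", "amount": 7},
    ]
    ranked = rank_facts(facts, "oil prices")
    print([f["text"] for _, f in ranked])
    print(relevant_rollup(ranked, "region", "amount", k=1))
```

The design choice here is to keep the IR side (estimating a relevance model from the query and ranking facts against it) separate from the OLAP side (aggregating measures over the facts judged relevant), so that any ranking function could be swapped in before the roll-up.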
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pérez, J.M., Berlanga, R., Aramburu, M.J. (2004). A Document Model Based on Relevance Modeling Techniques for Semi-structured Information Warehouses. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2004. Lecture Notes in Computer Science, vol 3180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30075-5_31
DOI: https://doi.org/10.1007/978-3-540-30075-5_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22936-0
Online ISBN: 978-3-540-30075-5
eBook Packages: Springer Book Archive