Abstract
For more than a decade, research on OLAP and multidimensional databases has generated methodologies, tools and resource management systems for the analysis of numeric data. With the growing availability of digital documents, there is a need to incorporate text-rich documents into multidimensional databases and to provide a framework adapted to their analysis. This paper presents a new aggregation function that aggregates textual data in an OLAP environment. The Top_Keyword function (Top_Kw for short) represents a set of documents by their most significant terms, using a weighting function from information retrieval: tf.idf.
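As a rough illustration of the idea, the sketch below ranks the terms of each OLAP cell by a tf.idf score and keeps the k best ones. It is a minimal sketch under stated assumptions, not the paper's exact definition of Top_Kw: the function name `top_keywords`, the representation of a cell as a list of pre-tokenized documents, the choice to treat each cell as one aggregated document, and the particular tf.idf variant (relative term frequency times the log of the inverse cell frequency) are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def top_keywords(cells, k=2):
    """Return the k highest-scoring terms for each cell.

    cells: dict mapping a cell key (e.g. a tuple of dimension values)
           to a list of documents, each document being a list of tokens.
    Assumption: each cell is scored as a single aggregated document.
    """
    # Term counts per cell, merging all documents that fall into the cell.
    term_counts = {
        key: Counter(term for doc in docs for term in doc)
        for key, docs in cells.items()
    }
    n_cells = len(term_counts)

    # Cell frequency: in how many cells does each term occur?
    cell_freq = Counter()
    for counts in term_counts.values():
        cell_freq.update(counts.keys())

    result = {}
    for key, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            term: (count / total) * math.log(n_cells / cell_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        result[key] = [term for term, _ in ranked[:k]]
    return result


if __name__ == "__main__":
    # Hypothetical cells keyed by (year, country).
    example = {
        ("2007", "France"): [["olap", "cube", "olap"], ["text", "cube"]],
        ("2007", "Italy"): [["xml", "cube", "xml"], ["text", "xml"]],
    }
    print(top_keywords(example, k=1))
```

In this toy input, terms that occur in every cell (such as "cube") receive an idf of zero, so each cell ends up represented by the terms that distinguish it from the others.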
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ravat, F., Teste, O., Tournier, R., Zurfluh, G. (2008). Top_Keyword: An Aggregation Function for Textual Document OLAP. In: Song, IY., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2008. Lecture Notes in Computer Science, vol 5182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85836-2_6
DOI: https://doi.org/10.1007/978-3-540-85836-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85835-5
Online ISBN: 978-3-540-85836-2
eBook Packages: Computer Science, Computer Science (R0)