Top_Keyword: An Aggregation Function for Textual Document OLAP

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5182)

Abstract

For more than a decade, research on OLAP and multidimensional databases has generated methodologies, tools and resource management systems for the analysis of numeric data. With the growing availability of digital documents, there is a need to incorporate text-rich documents into multidimensional databases, together with a framework adapted to their analysis. This paper presents a new aggregation function that aggregates textual data in an OLAP environment. The Top_Keyword function (Top_Kw for short) represents a set of documents by their most significant terms, using a weighting function from information retrieval: tf.idf.
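
The idea of summarising a set of documents by its highest-weighted terms can be illustrated with a minimal Python sketch. It is not the paper's definition of Top_Kw, which the abstract does not give. It assumes that each OLAP cell holds a list of tokenised documents, that the documents of a cell are merged into one bag of terms, and that one common tf.idf variant is used (relative term frequency times the logarithm of the inverse cell frequency). The names `top_keywords`, `cells` and `k` are hypothetical.

```python
import math
from collections import Counter


def top_keywords(cells, k=2):
    """Return the k highest-scoring terms for each cell (illustrative only)."""
    # Assumption: merge all documents of a cell into a single bag of terms.
    bags = {
        name: Counter(term for doc in docs for term in doc)
        for name, docs in cells.items()
    }
    n_cells = len(bags)
    # Number of cells in which each term occurs at least once.
    df = Counter(term for bag in bags.values() for term in bag)

    result = {}
    for name, bag in bags.items():
        total = sum(bag.values())
        # tf.idf variant (assumed): relative frequency * log(N / df).
        scores = {
            term: (count / total) * math.log(n_cells / df[term])
            for term, count in bag.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[name] = ranked[:k]
    return result


# Hypothetical data: two cells (one per year), two short documents each.
cells = {
    "2007": [["olap", "cube", "query"], ["olap", "cube", "cube"]],
    "2008": [["olap", "xml", "document"], ["xml", "olap", "xml"]],
}
top = top_keywords(cells, k=1)
```

In this toy data the term "olap" occurs in both cells, so its inverse-frequency factor is log(1) = 0 and it is never selected; each cell is instead summarised by the term that is frequent locally and absent elsewhere.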

Editor information

Il-Yeol Song, Johann Eder, Tho Manh Nguyen

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ravat, F., Teste, O., Tournier, R., Zurfluh, G. (2008). Top_Keyword: An Aggregation Function for Textual Document OLAP. In: Song, IY., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2008. Lecture Notes in Computer Science, vol 5182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85836-2_6

  • DOI: https://doi.org/10.1007/978-3-540-85836-2_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85835-5

  • Online ISBN: 978-3-540-85836-2

  • eBook Packages: Computer Science, Computer Science (R0)
