Multidimensional Database Design from Document-Centric XML Documents

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6862)

Abstract

Despite a decade of research in OLAP systems, very few works attempt to tackle the problem of analysing data extracted from XML text-rich documents. These documents are loosely structured XML documents mainly composed of text. This paper details conceptual design steps of multidimensional databases from such documents. With the use of an adapted multidimensional conceptual model, the design process allows the integration of data extracted from text-rich XML documents within an adapted OLAP system.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pujolle, G., Ravat, F., Teste, O., Tournier, R., Zurfluh, G. (2011). Multidimensional Database Design from Document-Centric XML Documents. In: Cuzzocrea, A., Dayal, U. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2011. Lecture Notes in Computer Science, vol 6862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23544-3_5

  • DOI: https://doi.org/10.1007/978-3-642-23544-3_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23543-6

  • Online ISBN: 978-3-642-23544-3

  • eBook Packages: Computer Science, Computer Science (R0)
