Abstract
Despite a decade of research on OLAP systems, very few works tackle the problem of analysing data extracted from text-rich XML documents, that is, loosely structured XML documents composed mainly of text. This paper details the conceptual design steps for building multidimensional databases from such documents. Using an adapted multidimensional conceptual model, the design process allows data extracted from text-rich XML documents to be integrated into an adapted OLAP system.
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pujolle, G., Ravat, F., Teste, O., Tournier, R., Zurfluh, G. (2011). Multidimensional Database Design from Document-Centric XML Documents. In: Cuzzocrea, A., Dayal, U. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2011. Lecture Notes in Computer Science, vol 6862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23544-3_5
DOI: https://doi.org/10.1007/978-3-642-23544-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23543-6
Online ISBN: 978-3-642-23544-3
eBook Packages: Computer Science (R0)