A Semi-automatic Approach to Build XML Document Warehouse

  • Conference paper
  • In: Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2014)

Abstract

Documents are a valuable source for decision analysis: they help decision makers better understand how their business activities evolve, and therefore deserve to be warehoused for decision-making purposes within organizations. These documents generally exist in XML format and are described by multiple structures. In this paper, we present a semi-automatic approach to building an XML document warehouse. The approach consists of two methods: unification of XML document structures, and multidimensional modeling. More specifically, this paper focuses on the experimentation and evaluation of the proposed approach for warehousing document-centric XML documents.

Author information

Corresponding author

Correspondence to Ines Ben Messaoud.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ben Messaoud, I., Feki, J., Zurfluh, G. (2015). A Semi-automatic Approach to Build XML Document Warehouse. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2014. Communications in Computer and Information Science, vol 553. Springer, Cham. https://doi.org/10.1007/978-3-319-25840-9_22

  • DOI: https://doi.org/10.1007/978-3-319-25840-9_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25839-3

  • Online ISBN: 978-3-319-25840-9

  • eBook Packages: Computer Science, Computer Science (R0)
