Evaluation of Data Warehouse Design Methodologies in the Context of Big Data

  • Conference paper
Big Data Analytics and Knowledge Discovery (DaWaK 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10440)

Abstract

Data warehouse design methodologies require a novel approach in the Big Data context, because they must address the issues related to the 5 Vs (Volume, Velocity, Variety, Veracity, and Value). It is therefore mandatory to support the designer with automatic techniques that can quickly produce a multidimensional schema by integrating several data sources, which may also be unstructured and therefore require ontology-based reasoning. The methodologies must also adopt agile techniques, so that the multidimensional schema can change as the business requirements change, without repeating the complete design process. Furthermore, hybrid approaches must be used instead of the traditional data-driven or requirement-driven approaches, in order to preserve adherence to user requirements while producing a valuable multidimensional schema that is compliant with the data sources. In this paper, we perform a metric-based comparison among different methodologies, in order to demonstrate that methodologies classified as hybrid, ontology-based, automatic, and agile are tailored to the Big Data context.

Author information

Corresponding author

Correspondence to Francesco Di Tria.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Di Tria, F., Lefons, E., Tangorra, F. (2017). Evaluation of Data Warehouse Design Methodologies in the Context of Big Data. In: Bellatreche, L., Chakravarthy, S. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2017. Lecture Notes in Computer Science, vol 10440. Springer, Cham. https://doi.org/10.1007/978-3-319-64283-3_1

  • DOI: https://doi.org/10.1007/978-3-319-64283-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64282-6

  • Online ISBN: 978-3-319-64283-3

  • eBook Packages: Computer Science, Computer Science (R0)
