Abstract
A Geospatial Data Warehouse (GDW) is a repository of historical and geospatial data used in the decision-making process. These systems manage large volumes of data, and their dimensions are usually denormalized to increase query performance. Many studies have analyzed the impact of geospatial data redundancy on relational GDWs. However, to the best of our knowledge, no previous study has performed a similar analysis for the NoSQL scenario. In this context, to design a scalable document-oriented GDW (DGDW) with low storage cost and low query response time, it is important to identify which geospatial fields should be normalized (referenced) or denormalized (embedded), as well as how the documents should be partitioned among collections. In this study, we exhaustively evaluated 36 DGDWs in the MongoDB document-oriented database with different levels of geospatial redundancy and different approaches to partitioning documents among collections. Our experimental results indicate that both the normalization of low-selectivity geospatial fields and the partitioning of documents into homogeneous collections provide better query performance and require less storage space. The performance evaluation presented in this paper provides strong evidence that can help guide the creation of a DGDW.
This work was supported by Fundação de Amparo à Pesquisa do Estado de Alagoas (FAPEAL).
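The distinction the abstract draws between denormalized (embedded) and normalized (referenced) geospatial fields can be illustrated with a minimal sketch. It is not the schema evaluated in the paper: the field names (street, city, city_id, geo), the city name, the polygon coordinates, and the helper functions are all hypothetical, and the length of the JSON text is used only as a rough proxy for storage cost, not as a measure of MongoDB's actual BSON storage.

```python
import json

# Hypothetical geospatial value shared by many address documents
# (a GeoJSON polygon; the coordinates are illustrative only).
CITY_GEOMETRY = {
    "type": "Polygon",
    "coordinates": [[
        [-35.80, -9.70], [-35.60, -9.70], [-35.60, -9.50],
        [-35.80, -9.50], [-35.80, -9.70],
    ]],
}


def build_embedded(n_addresses):
    """Denormalized: every address document carries its own copy of the geometry."""
    return [
        {
            "_id": i,
            "street": f"Street {i}",
            "city": {"name": "Maceio", "geo": CITY_GEOMETRY},
        }
        for i in range(n_addresses)
    ]


def build_referenced(n_addresses):
    """Normalized: addresses keep only a key; the geometry is stored once."""
    addresses = [
        {"_id": i, "street": f"Street {i}", "city_id": 1}
        for i in range(n_addresses)
    ]
    cities = [{"_id": 1, "name": "Maceio", "geo": CITY_GEOMETRY}]
    return addresses, cities


def json_size(docs):
    """Rough size proxy: total length of the JSON text of all documents."""
    return sum(len(json.dumps(doc)) for doc in docs)


embedded = build_embedded(1000)
addresses, cities = build_referenced(1000)

embedded_size = json_size(embedded)
referenced_size = json_size(addresses) + json_size(cities)

print("embedded:", embedded_size)
print("referenced:", referenced_size)
```

In this toy setting the referenced layout is smaller because the polygon is stored once instead of once per address. The price is an extra lookup to recover the geometry at query time, which is the kind of trade-off the experimental evaluation measures.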
Notes
1. The c_address and s_address dimensions are referred to as address, since they have the same document type (i.e., the same field structure).
2. The queries are available at https://github.com/mrcferro/gdw.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ferro, M., Lima, R., Fidalgo, R. (2019). Evaluating Redundancy and Partitioning of Geospatial Data in Document-Oriented Data Warehouses. In: Ordonez, C., Song, IY., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2019. Lecture Notes in Computer Science, vol 11708. Springer, Cham. https://doi.org/10.1007/978-3-030-27520-4_16
DOI: https://doi.org/10.1007/978-3-030-27520-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27519-8
Online ISBN: 978-3-030-27520-4
eBook Packages: Computer Science, Computer Science (R0)