Evaluating Redundancy and Partitioning of Geospatial Data in Document-Oriented Data Warehouses

  • Conference paper
  • In: Big Data Analytics and Knowledge Discovery (DaWaK 2019)

Abstract

A Geospatial Data Warehouse (GDW) is a repository of historical and geospatial data used to support decision-making. These systems manage large volumes of data, and their dimensions are usually denormalized to improve query performance. Many studies have analyzed the impact of geospatial data redundancy on relational GDWs. However, to the best of our knowledge, no previous study has performed a similar analysis in a NoSQL setting. In this context, to design a scalable document-oriented GDW (DGDW) with low storage cost and low query response time, it is important to identify which geospatial fields should be normalized (referenced) or denormalized (embedded), as well as how the documents should be partitioned among collections. In this study, we exhaustively evaluated 36 DGDWs in the MongoDB document-oriented database, with different levels of geospatial redundancy and different approaches to partitioning documents among collections. Our experimental results indicate that both the normalization of low-selectivity geospatial fields and the partitioning of documents into homogeneous collections provide better query performance and lower storage cost. The performance evaluation presented in this paper provides strong evidence that can help guide the creation of a DGDW.
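The distinction between embedded and referenced geospatial fields, and between mixed and homogeneous collections, can be illustrated with a minimal Python sketch. It is not the authors' schema or benchmark: the `CITIES` and `FACTS` data, the field names, and the helpers `embed`, `reference`, `partition_by_type`, and `size_bytes` are all hypothetical, and plain dictionaries stand in for MongoDB documents.

```python
import json

# Hypothetical city dimension members with GeoJSON-style polygons.
# These values are invented for illustration; they are not data from the paper.
CITIES = {
    "c1": {"name": "City A", "geo": {"type": "Polygon",
           "coordinates": [[[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]]}},
    "c2": {"name": "City B", "geo": {"type": "Polygon",
           "coordinates": [[[2, 2], [2, 3], [3, 3], [3, 2], [2, 2]]]}},
}

# Hypothetical facts: each city is repeated across several facts.
FACTS = [{"revenue": 10 * (i + 1), "city_id": "c1" if i % 2 == 0 else "c2"}
         for i in range(6)]


def embed(facts, cities):
    """Denormalized design: copy the full city document into every fact."""
    return [{"revenue": f["revenue"], "city": cities[f["city_id"]]} for f in facts]


def reference(facts, cities):
    """Normalized design: facts keep only the city key; cities are stored once."""
    fact_docs = [dict(f) for f in facts]
    city_docs = [{"_id": key, **city} for key, city in cities.items()]
    return fact_docs, city_docs


def partition_by_type(docs):
    """Split a mixed list of documents into homogeneous collections by 'type'."""
    collections = {}
    for doc in docs:
        collections.setdefault(doc["type"], []).append(doc)
    return collections


def size_bytes(docs):
    """Rough storage proxy: total length of the JSON encoding, in bytes."""
    return sum(len(json.dumps(d).encode("utf-8")) for d in docs)


embedded = embed(FACTS, CITIES)
fact_docs, city_docs = reference(FACTS, CITIES)

embedded_size = size_bytes(embedded)
referenced_size = size_bytes(fact_docs) + size_bytes(city_docs)

# One heterogeneous collection, tagged by type, split into homogeneous ones.
mixed = ([{"type": "fact", **d} for d in fact_docs]
         + [{"type": "city", **d} for d in city_docs])
collections = partition_by_type(mixed)

print("embedded:", embedded_size, "bytes; referenced:", referenced_size, "bytes")
print({name: len(docs) for name, docs in collections.items()})
```

The JSON length is only a rough stand-in for storage cost. MongoDB stores BSON and may compress it, so real measurements would differ, and the sketch says nothing about query response time.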

This work was supported by Fundação de Amparo à Pesquisa do Estado de Alagoas (FAPEAL).

Notes

  1. The c_address and s_address dimensions are both referred to as address, since they have the same document type (i.e., the same field structure).

  2. The queries are available at https://github.com/mrcferro/gdw.

Author information

Correspondence to Marcio Ferro.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ferro, M., Lima, R., Fidalgo, R. (2019). Evaluating Redundancy and Partitioning of Geospatial Data in Document-Oriented Data Warehouses. In: Ordonez, C., Song, IY., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2019. Lecture Notes in Computer Science, vol 11708. Springer, Cham. https://doi.org/10.1007/978-3-030-27520-4_16

  • DOI: https://doi.org/10.1007/978-3-030-27520-4_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27519-8

  • Online ISBN: 978-3-030-27520-4

  • eBook Packages: Computer Science, Computer Science (R0)
