Framework for Geospatial Query Processing by Integrating Cassandra with Hadoop

Abstract

With increasing digitization, devices such as sensors and cameras are being connected to the Internet and are producing big data. The variety of this data has paved the way for NoSQL databases such as Cassandra, which offer scalability and availability. The Hadoop framework was developed for storing and processing distributed data. In this work, we investigate the storage and retrieval of geospatial data by integrating Hadoop with Cassandra, using two partitioning techniques: prefix-based partitioning and Cassandra’s default partitioner, Murmur3Partitioner. A geohash value is generated for each location; it acts as the partition key and also supports effective search, which reduces data retrieval time. When a user issues a spatial query, such as finding the nearest locations, the search is run against the Cassandra database under both partitioning techniques, and the query response times are compared to determine which method is more effective. The results show that prefix-based partitioning is more efficient than Murmur3 partitioning.
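To illustrate how a geohash can serve as a partition key, the following is a minimal sketch and not the chapter's implementation. It uses the standard public geohash encoding (interleaved longitude and latitude bits, base-32 alphabet). The function names, the 3-character prefix length, and the sample coordinates are assumptions chosen for illustration only.

```python
# Standard geohash base-32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=8):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def prefix_partition_key(geohash, prefix_len=3):
    """Hypothetical helper: use the leading characters as the partition key."""
    return geohash[:prefix_len]


def group_by_prefix(points, precision=8, prefix_len=3):
    """Group (lat, lon) points by geohash prefix, mimicking partitions."""
    partitions = {}
    for lat, lon in points:
        gh = geohash_encode(lat, lon, precision)
        key = prefix_partition_key(gh, prefix_len)
        partitions.setdefault(key, []).append((gh, lat, lon))
    return partitions
```

Because nearby locations tend to share a geohash prefix, grouping by prefix places them in the same partition, so a nearest-location search can often be limited to one partition or a few adjacent ones. A hash-based partitioner such as Murmur3Partitioner instead distributes keys according to their hash, which does not preserve spatial locality.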

Keywords

Big data, Spatial query, Geohash, Cassandra, NoSQL databases, Murmur3Partitioner, Prefix-based partitioning


Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. VR Siddhartha Engineering College, Kanuru, India
  2. Illinois State University, Normal, USA
