Framework for Geospatial Query Processing by Integrating Cassandra with Hadoop
As digitization advances, devices such as sensors and cameras are increasingly connected to the Internet and produce big data. The variety of this data has paved the way for NoSQL databases such as Cassandra, which offer scalability and availability. The Hadoop framework was developed for storing and processing distributed data. In this work, we investigate the storage and retrieval of geospatial data by integrating Hadoop with Cassandra, using two partitioning techniques: prefix-based partitioning and Cassandra’s default partitioner, Murmur3Partitioner. A geohash value is generated that acts as the partition key and also supports effective search, which reduces the time taken to retrieve data. When a user issues a spatial query, such as finding the nearest locations, the search in the Cassandra database is run under both partitioning techniques. Query response times are then compared to determine which method is more effective. The results show that prefix-based partitioning is more efficient than Murmur3 partitioning.
Keywords: Big data, Spatial query, Geohash, Cassandra, NoSQL databases, Murmur3Partitioner, Prefix-based partitioning
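The geohash-as-partition-key idea described in the abstract can be illustrated with a minimal Python sketch. It uses the standard public geohash encoding (interleaved longitude/latitude bits, base-32 alphabet); the helper names `geohash_encode`, `partition_key`, and `group_by_prefix`, and the prefix length of 4, are assumptions made for illustration and are not taken from the paper, whose exact partitioning scheme is not given here.

```python
from collections import defaultdict

# Standard geohash base-32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=8):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def partition_key(geohash, prefix_len=4):
    """Hypothetical prefix-based key: the first prefix_len characters."""
    return geohash[:prefix_len]


def group_by_prefix(points, prefix_len=4, precision=8):
    """Group (name, lat, lon) tuples by their geohash prefix."""
    groups = defaultdict(list)
    for name, lat, lon in points:
        gh = geohash_encode(lat, lon, precision)
        groups[partition_key(gh, prefix_len)].append((name, gh))
    return dict(groups)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, 5))  # u4pru
```

Because nearby locations tend to share a geohash prefix, keying on the prefix keeps them in the same partition, so a nearest-location lookup can often be answered from one partition. A hash-based partitioner such as Murmur3Partitioner instead spreads keys across the cluster with no regard to spatial proximity. Locations on opposite sides of a geohash cell boundary can still have different prefixes, so a practical search would also need to examine neighbouring cells.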