Abstract
As digitization advances, devices such as sensors and cameras are increasingly connected to the Internet and produce big data. The variety of this data has paved the way for NoSQL databases such as Cassandra, which provide scalability and availability, while the Hadoop framework was developed for storing and processing distributed data. In this work, we investigate the storage and retrieval of geospatial data by integrating Hadoop with Cassandra under two partitioning techniques: prefix-based partitioning and Cassandra’s default partitioner, Murmur3Partitioner. A geohash value is generated to act as the partition key, which also supports effective search and thereby reduces the time taken to retrieve data. When a user issues a spatial query, such as finding the nearest locations, the search in the Cassandra database is run under both partitioning techniques, and the query response times are compared to determine which method is more effective. The results show that prefix-based partitioning is more efficient than Murmur3 partitioning.
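To make the role of the geohash as a partition key concrete, the following is a minimal sketch of standard geohash encoding followed by a prefix-based key. It is not the chapter's implementation: the function names `geohash_encode` and `prefix_partition_key`, the default precision, and the prefix length of 3 are assumptions chosen for illustration.

```python
# Illustrative sketch only; names and defaults are assumptions, not the authors' code.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a latitude/longitude pair as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value, bits = 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits map to one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


def prefix_partition_key(geohash: str, prefix_len: int = 3) -> str:
    """Hypothetical prefix-based key: the leading characters of the geohash."""
    return geohash[:prefix_len]
```

Because a geohash prefix identifies an enclosing grid cell, points that fall in the same cell receive the same prefix key and can be stored together, whereas a hash-based partitioner such as Murmur3 distributes keys without regard to spatial proximity. For example, `geohash_encode(57.64911, 10.40744, 6)` yields `u4pruy`, and a prefix length of 3 gives the key `u4p`.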
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Vasavi, S., Padma Priya, M., Gokhale, A.A. (2018). Framework for Geospatial Query Processing by Integrating Cassandra with Hadoop. In: Margret Anouncia, S., Wiil, U. (eds) Knowledge Computing and Its Applications. Springer, Singapore. https://doi.org/10.1007/978-981-10-6680-1_7
DOI: https://doi.org/10.1007/978-981-10-6680-1_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6679-5
Online ISBN: 978-981-10-6680-1
eBook Packages: Computer Science, Computer Science (R0)