A Raster Data Framework Based on Distributed Heterogeneous Cluster
Advances in satellite imaging and sensor technologies have led to the capture of large amounts of spatial data. Over the past two decades, many parallel processing techniques based on data or control parallelism have been tried in order to improve the performance of image processing applications such as urban sprawl analysis, weather prediction and crop estimation. These techniques have been implemented with block-based distributed file processing or, more recently, MapReduce-based programming, yet they still fall short of optimal processing in terms of resource scheduling, data distribution and ease of programming. In this paper, we present a layered framework for parallel data processing that improves the storage, retrieval and processing performance of spatial data on an underlying distributed file system. We describe a data placement strategy across a distributed HDFS cluster that is designed to optimize spatial data retrieval and processing. Keeping neighborhood pixels local to the processing node in a distributed environment reduces network latency and improves the efficiency of applications such as object recognition, change detection and site selection. We evaluate the data placement strategy on a four-node HDFS cluster and show that it reads blocks of data at almost 10–12 times the default rate, which in turn improves the efficiency of applications that use region-growing methods.
Keywords: Big data, Remote sensing, Spatial MapReduce, HDFS, RecordReader, Raster data, Vector data, Resource scheduler, Retrieval Engine
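The locality argument in the abstract can be illustrated with a small sketch. It is not the paper's placement scheme, which the abstract does not specify; it only counts how many storage blocks a square pixel neighborhood touches under two hypothetical layouts. One is a row-major layout in which each block holds a fixed number of consecutive pixels. The other is a tiled layout in which each block holds one square tile. The function names, raster width, block size and tile size are all assumptions chosen for illustration.

```python
def blocks_row_major(width, block_pixels, row0, col0, win):
    """Count distinct blocks touched by a win x win window when pixels
    are stored row by row and cut into blocks of block_pixels pixels.
    (Hypothetical layout, for illustration only.)"""
    blocks = set()
    for r in range(row0, row0 + win):
        first = (r * width + col0) // block_pixels
        last = (r * width + col0 + win - 1) // block_pixels
        blocks.update(range(first, last + 1))
    return len(blocks)


def blocks_tiled(tile, row0, col0, win):
    """Count distinct blocks touched by the same window when each block
    holds one tile x tile square of pixels.
    (Hypothetical layout, for illustration only.)"""
    tiles = set()
    for tr in range(row0 // tile, (row0 + win - 1) // tile + 1):
        for tc in range(col0 // tile, (col0 + win - 1) // tile + 1):
            tiles.add((tr, tc))
    return len(tiles)


if __name__ == "__main__":
    WIDTH = 4096              # assumed raster width in pixels
    TILE = 256                # assumed tile side in pixels
    BLOCK = TILE * TILE       # same block capacity for both layouts

    # A tile-aligned 256 x 256 neighborhood.
    print(blocks_row_major(WIDTH, BLOCK, 0, 0, 256))  # 16
    print(blocks_tiled(TILE, 0, 0, 256))              # 1
```

With these assumed sizes, a tile-aligned neighborhood spans 16 row-major blocks but only one tiled block. Fewer blocks per neighborhood means fewer chances that a region-growing step has to fetch pixels from another node, which is the effect the abstract attributes to locality-aware placement.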