A Hybrid Parallel Computing Model to Support Scalable Processing of Big Oceanographic Spatial Data
Oceanographic sciences face major challenges due to the deluge of big data. As of 2010, the amount of new data stored in the world's main countries, led by the US, had grown to over 7 exabytes. Although computer hardware is evolving quickly, with higher processor frequencies, multi-core technology, and larger memory, the traditional processing paradigm based on a single desktop still suffers from low computational efficiency and limited scalability. In this paper, we report our effort in developing a hybrid parallel computing model that uses Graphics Processing Units (GPUs) to accelerate a Hadoop MapReduce system. In each computing node, the actual processing is offloaded from the CPU to a GPU to further boost system performance. We describe the architecture design of the proposed model and the automated task/data assignment on each GPU-enabled compute node. Electronic Navigational Charts in ocean fields involve a huge amount of spatio-temporal data. Reprojection of these data between different coordinate reference systems, which is a computation-intensive task, is selected as the use case. Systematic experiments were conducted to demonstrate the performance of the proposed model.
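To illustrate why reprojection suits this kind of model, the following minimal sketch treats each chart vertex as an independent record in a map phase and groups the results per feature in a reduce phase. It is not the implementation described in the paper: the choice of spherical Mercator as the target projection, the record layout `(feature_id, lon, lat)`, and the function names `lonlat_to_mercator`, `map_reproject`, and `reduce_collect` are all assumptions made for illustration, and the per-point arithmetic runs on the CPU here, whereas the proposed model would offload it to a GPU.

```python
import math

# Assumption: spherical Mercator on the WGS84 semi-major axis (metres).
EARTH_RADIUS_M = 6378137.0


def lonlat_to_mercator(lon_deg, lat_deg):
    """Reproject one lon/lat point (degrees) to spherical Mercator (metres)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


def map_reproject(records):
    """Map phase: every (feature_id, lon, lat) record is transformed
    independently of the others, which is what makes the work easy to
    split across nodes and, within a node, across GPU threads."""
    for feature_id, lon, lat in records:
        yield feature_id, lonlat_to_mercator(lon, lat)


def reduce_collect(pairs):
    """Reduce phase: group reprojected vertices by feature id,
    preserving the order in which they arrive."""
    features = {}
    for feature_id, xy in pairs:
        features.setdefault(feature_id, []).append(xy)
    return features


if __name__ == "__main__":
    # Hypothetical sample records, not data from the paper.
    sample = [("buoy-1", 120.38, 36.07), ("buoy-1", 120.40, 36.10)]
    print(reduce_collect(map_reproject(sample)))
```

Because no vertex depends on any other, the body of `lonlat_to_mercator` is the part that would become a GPU kernel in a hybrid design, while the map and reduce functions correspond to the work Hadoop distributes across nodes.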
Keywords: Parallel computing, Hadoop MapReduce, GPU general computing, Oceanographic spatial data, Coordinate projection
The research work reported in this paper was mainly supported by the Young Scientists Funds (Grant No. 2015QN027) from Shandong Academy of Sciences. It was partially sponsored by the Youth Fund of Natural Science of China (Grant No. 41401435).