B+-tree construction on massive data with Hadoop
Data processing in the Socialist Republic of Vietnam (hereafter, Vietnam) is at an early stage, and a variety of problems remain to be solved. In the Vietnamese banking and financial sectors, where the management and storage of customer data and transaction histories are being emphasized as never before, the volume of data to be secured each day is increasing explosively due to rapid economic development, so the relevant authorities are seeking an efficient and reliable way to manage it. A widely known variation of the B-tree, the B+-tree is considered one of the most suitable tree-type data structures for bulk data. Nevertheless, because constructing a B+-tree for massive data is quite time-consuming, the authors propose a Hadoop framework-based parallel B+-tree system to deal with the problem. The system is divided into three phases. First, the data are partitioned and distributed so that each partition holds almost the same volume of data. Second, local B+-trees are constructed in parallel. Finally, the resulting small-scale B+-trees are integrated into a complete B+-tree that covers the entire data set. The authors expect that the proposed system will offer efficient index construction while reducing data processing time.
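The three phases can be illustrated with a minimal single-process sketch. This is not the authors' Hadoop implementation: the node layout, the `ORDER` constant, and every function name below are hypothetical, the partitions are processed in an ordinary loop rather than by separate Map-Reduce tasks, and the input is assumed to be non-empty. Nodes produced by the bottom-up packing may be underfull, which a production B+-tree would rebalance.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional

ORDER = 4  # hypothetical maximum number of keys per node


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf
    next_leaf: Optional["Node"] = None  # sibling link, used on leaves only

    @property
    def is_leaf(self) -> bool:
        return not self.children


def edge_leaf(node, last=False):
    """Descend to the leftmost (or rightmost) leaf under node."""
    while not node.is_leaf:
        node = node.children[-1 if last else 0]
    return node


def height(node):
    h = 1
    while not node.is_leaf:
        node, h = node.children[0], h + 1
    return h


def partition_evenly(keys, num_partitions):
    """Phase 1: sort, then cut into contiguous ranges whose sizes differ by at most 1."""
    keys = sorted(keys)
    size, extra = divmod(len(keys), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        parts.append(keys[start:end])
        start = end
    return [p for p in parts if p]  # drop empty partitions


def build_upper_levels(nodes, order=ORDER):
    """Pack nodes under parents, level by level, until one root remains."""
    while len(nodes) > 1:
        parents = []
        for i in range(0, len(nodes), order + 1):
            group = nodes[i:i + order + 1]
            # Separator j is the smallest key stored under child j + 1.
            seps = [edge_leaf(c).keys[0] for c in group[1:]]
            parents.append(Node(keys=seps, children=group))
        nodes = parents
    return nodes[0]


def build_local_tree(sorted_keys, order=ORDER):
    """Phase 2: bulk-load one partition bottom-up into a local B+-tree."""
    leaves = [Node(keys=sorted_keys[i:i + order])
              for i in range(0, len(sorted_keys), order)]
    for left, right in zip(leaves, leaves[1:]):
        left.next_leaf = right
    return build_upper_levels(leaves, order)


def merge_local_trees(roots, order=ORDER):
    """Phase 3: link the leaf chains and place the local roots under a common root."""
    for left, right in zip(roots, roots[1:]):
        edge_leaf(left, last=True).next_leaf = edge_leaf(right)
    target = max(height(r) for r in roots)
    padded = []
    for r in roots:
        # Wrap shorter trees in single-child nodes so all leaves share one depth.
        for _ in range(target - height(r)):
            r = Node(keys=[], children=[r])
        padded.append(r)
    return build_upper_levels(padded, order)


def search(root, key):
    node = root
    while not node.is_leaf:
        node = node.children[bisect_right(node.keys, key)]
    return key in node.keys


keys = [37, 5, 91, 12, 64, 8, 23, 70, 49, 2, 58, 16]
parts = partition_evenly(keys, 3)
root = merge_local_trees([build_local_tree(p) for p in parts])
print(search(root, 64), search(root, 65))
```

Because the partitions are contiguous key ranges, the local trees never overlap, so the final phase only has to stack the local roots under new upper levels instead of re-inserting individual keys.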
Keywords: B-tree, B+-tree, Hadoop, Map-Reduce, Big Data, Cloud Computing
Part of this paper was presented at the International Conference on Information Science and Applications (ICISA 2017), March 20–23, in Macau. I am grateful to the two anonymous commentators whose valuable suggestions at the conference helped improve the completeness of the paper.