Abstract
As clusters grow in scale, power consumption will become a major bottleneck for future large-scale high-performance clusters. However, most existing cloud clusters are built on power-hungry x86-64 processors, which target common enterprise applications. In this paper, we improve cluster performance by leveraging energy-efficient ARM SoCs. On our prototype, a cluster of five Cubieboard4 boards, we run HPL and achieve 9.025 GFLOPS, which indicates considerable computational potential. Moreover, we build a measurement model and conduct an extensive evaluation, comparing the cluster's performance on workloads such as WordCount and k-Means running in MapReduce mode and in Spark mode. The experimental results demonstrate that our cluster achieves higher computational efficiency on compute-intensive workloads when using Spark's RDD feature. Finally, we propose a theoretical hybrid architecture better suited to future cloud clusters, with a stronger master and customized ARMv8-based TaskTrackers for data-intensive computing.
Copyright information
© 2016 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Fan, X. et al. (2016). An ARM-Based Hadoop Performance Evaluation Platform: Design and Implementation. In: Guo, S., Liao, X., Liu, F., Zhu, Y. (eds) Collaborative Computing: Networking, Applications, and Worksharing. CollaborateCom 2015. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 163. Springer, Cham. https://doi.org/10.1007/978-3-319-28910-6_8
DOI: https://doi.org/10.1007/978-3-319-28910-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28909-0
Online ISBN: 978-3-319-28910-6
eBook Packages: Computer Science, Computer Science (R0)