Abstract
The rise of Machine Learning (ML) over the last decade has created an unprecedented surge in demand for new and more powerful hardware. Various hardware approaches have emerged to meet this demand, which motivates the need for performance benchmarks that can compare such diverse systems. In this paper, we present a comprehensive analysis and comparison of available benchmark suites in ML and related fields. We use this analysis to discuss the potential of ARM processors for ML deployments. The paper concludes with a brief performance comparison of modern, server-grade ARM and x86 processors using a benchmark suite selected from our survey.
Acknowledgements
This work was supported by the Data Center Technology Lab of Huawei.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Kmiec, S., Wong, J., Jacobsen, HA., Ren, D.Q. (2018). A Comparison of ARM Against x86 for Distributed Machine Learning Workloads. In: Nambiar, R., Poess, M. (eds) Performance Evaluation and Benchmarking for the Analytics Era. TPCTC 2017. Lecture Notes in Computer Science, vol 10661. Springer, Cham. https://doi.org/10.1007/978-3-319-72401-0_12
DOI: https://doi.org/10.1007/978-3-319-72401-0_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72400-3
Online ISBN: 978-3-319-72401-0
eBook Packages: Computer Science, Computer Science (R0)