A Comparison of ARM Against x86 for Distributed Machine Learning Workloads

Kmiec, Sebastian; Wong, Jonathon; Jacobsen, Hans-Arno; Ren, Da Qi

doi:10.1007/978-3-319-72401-0_12

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10661)

Included in the following conference series:

Technology Conference on Performance Evaluation and Benchmarking

Abstract

The rise of machine learning (ML) in the last decade has created an unprecedented surge in demand for new and more powerful hardware. Various hardware approaches exist to meet this demand, which motivates the need for hardware performance benchmarks to compare these diverse systems. In this paper, we present a comprehensive analysis and comparison of available benchmark suites in ML and related fields. We use this analysis to discuss the potential of ARM processors in ML deployments. The paper concludes with a brief hardware performance comparison of modern, server-grade ARM and x86 processors using a benchmark suite selected from our survey.

Acknowledgements

This work was supported by the Data Center Technology Lab of Huawei.

Author information

Authors and Affiliations

University of Toronto, Toronto, ON, Canada
Sebastian Kmiec, Jonathon Wong & Hans-Arno Jacobsen
Futurewei Technologies, Santa Clara, CA, USA
Da Qi Ren

Corresponding author

Correspondence to Hans-Arno Jacobsen.

Editor information

Editors and Affiliations

Cisco Systems, Inc., San Jose, California, USA
Raghunath Nambiar
Server Technologies, Oracle Corporation, Redwood Shores, California, USA
Meikel Poess

About this paper

Cite this paper

Kmiec, S., Wong, J., Jacobsen, HA., Ren, D.Q. (2018). A Comparison of ARM Against x86 for Distributed Machine Learning Workloads. In: Nambiar, R., Poess, M. (eds) Performance Evaluation and Benchmarking for the Analytics Era. TPCTC 2017. Lecture Notes in Computer Science, vol 10661. Springer, Cham. https://doi.org/10.1007/978-3-319-72401-0_12

DOI: https://doi.org/10.1007/978-3-319-72401-0_12
Published: 30 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72400-3
Online ISBN: 978-3-319-72401-0
eBook Packages: Computer Science, Computer Science (R0)
