A Comparison of ARM Against x86 for Distributed Machine Learning Workloads

  • Conference paper
Performance Evaluation and Benchmarking for the Analytics Era (TPCTC 2017)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10661)

Abstract

The rise of Machine Learning (ML) in the last decade has created an unprecedented surge in demand for new and more powerful hardware. Various hardware approaches exist to meet this demand, which motivates the need for hardware performance benchmarks to compare these diverse systems. In this paper, we present a comprehensive analysis and comparison of available benchmark suites in ML and related fields. We use this analysis to discuss the potential of ARM processors in ML deployments. The paper concludes with a brief hardware performance comparison of modern, server-grade ARM and x86 processors using a benchmark suite selected from our survey.

Acknowledgements

This work was supported by the Data Center Technology Lab of Huawei.

Author information

Corresponding author

Correspondence to Hans-Arno Jacobsen.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Kmiec, S., Wong, J., Jacobsen, H.-A., Ren, D.Q. (2018). A Comparison of ARM Against x86 for Distributed Machine Learning Workloads. In: Nambiar, R., Poess, M. (eds) Performance Evaluation and Benchmarking for the Analytics Era. TPCTC 2017. Lecture Notes in Computer Science, vol 10661. Springer, Cham. https://doi.org/10.1007/978-3-319-72401-0_12

  • DOI: https://doi.org/10.1007/978-3-319-72401-0_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-72400-3

  • Online ISBN: 978-3-319-72401-0

  • eBook Packages: Computer Science (R0)
