Abstract
TensorFlow (TF) is a highly popular Deep Learning (DL) software framework. Neural network training, a critical part of the DL workflow, is a computationally intensive process that can take days or even weeks. Achieving faster training times is therefore an active area of research and practice. TF supports multi-GPU parallelization, both within a single machine and across multiple physical servers. However, the distributed case is hard to use, and consequently almost all published performance data comes from the single-machine use case. To fill this gap, we benchmark TensorFlow in a GPU-equipped distributed environment. Our work evaluates the performance of various hardware and software combinations. In particular, we examine several types of interconnect technology to determine their impact on performance. Our results show that with the right choice of input parameters and appropriate hardware, GPU-equipped general-purpose compute clusters can provide deep learning training performance comparable to that of specialized machines designed for AI workloads.
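For context, the following Python sketch shows one common way to run the data-parallel distributed training mode discussed above, using the Horovod library with one process per GPU. It is a minimal illustration only: the model (ResNet50), dataset (CIFAR-10), batch size, and learning rate are placeholder assumptions, not the benchmark configuration evaluated in this paper.

    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    # Initialize Horovod; one process is launched per GPU, e.g.
    #   horovodrun -np 8 -H server1:4,server2:4 python train.py
    hvd.init()

    # Pin each process to a single local GPU.
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

    # Toy dataset standing in for ImageNet; each worker reads a distinct shard.
    (x, y), _ = tf.keras.datasets.cifar10.load_data()
    dataset = (tf.data.Dataset.from_tensor_slices((x / 255.0, y))
               .shard(hvd.size(), hvd.rank())
               .shuffle(10000)
               .batch(64))

    model = tf.keras.applications.ResNet50(weights=None, classes=10,
                                           input_shape=(32, 32, 3))

    # Scale the learning rate with worker count (the linear scaling rule) and
    # wrap the optimizer so gradients are averaged across workers by allreduce.
    opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
    model.compile(optimizer=opt, loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # Broadcast initial weights from rank 0 so all workers start identically.
    model.fit(dataset, epochs=1,
              callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)],
              verbose=1 if hvd.rank() == 0 else 0)

Under this scheme, the inter-server interconnect carries the gradient allreduce traffic at every step, which is why the choice of interconnect technology can dominate multi-node scaling behavior.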
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Hodak, M., Dholakia, A. (2019). Towards Evaluation of Tensorflow Performance in a Distributed Compute Environment. In: Nambiar, R., Poess, M. (eds.) Performance Evaluation and Benchmarking for the Era of Artificial Intelligence. TPCTC 2018. Lecture Notes in Computer Science, vol. 11135. Springer, Cham. https://doi.org/10.1007/978-3-030-11404-6_7
DOI: https://doi.org/10.1007/978-3-030-11404-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11403-9
Online ISBN: 978-3-030-11404-6
eBook Packages: Computer Science, Computer Science (R0)