A Survey on Deep Learning Benchmarks: Do We Still Need New Ones?

  • Conference paper
  • In: Benchmarking, Measuring, and Optimizing (Bench 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11459)

Abstract

Deep Learning has been gaining popularity rapidly. From the micro-architecture field to upper-layer end applications, a large body of research has been proposed in the literature to advance the knowledge of Deep Learning, and Deep Learning benchmarking has become one of the hot spots in the community. Many Deep Learning benchmarks are already available, and new ones keep appearing. However, few survey works give an overview of these benchmarks, and there is little discussion of what has been done for Deep Learning benchmarking and what is still missing. To fill this gap, this paper provides a survey of multiple high-impact Deep Learning benchmarks with both training and inference support. We share our observations and discussion on these benchmarks. We believe the community still needs more benchmarks to capture different perspectives, while also needing a way for these benchmarks to converge toward a standard.
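To make concrete what "training and inference support" means in such benchmarks, the sketch below times both phases for a toy model. It is a minimal illustration only, assuming PyTorch and synthetic data; the model, batch size, and step counts are arbitrary choices, and it does not reproduce the methodology of any benchmark surveyed here.

```python
# Illustrative sketch only: timing the two phases a DL benchmark typically
# covers. Assumes PyTorch; uses synthetic MNIST-shaped data, not a real dataset.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128),
                      nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Synthetic batch standing in for real input data.
x = torch.randn(64, 1, 28, 28)
y = torch.randint(0, 10, (64,))

# Training side: measure throughput over a fixed number of optimization steps.
steps = 100
start = time.perf_counter()
for _ in range(steps):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
train_time = time.perf_counter() - start
print(f"training throughput: {steps * x.size(0) / train_time:.1f} samples/s")

# Inference side: measure single-sample forward latency with gradients off.
model.eval()
with torch.no_grad():
    sample = x[:1]
    start = time.perf_counter()
    for _ in range(steps):
        model(sample)
    infer_time = time.perf_counter() - start
print(f"inference latency: {1000 * infer_time / steps:.2f} ms/sample")
```

A real benchmark harness would go further, for example adding warm-up iterations, multiple runs, and quality targets such as time to a fixed accuracy, rather than reporting raw step timings alone.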



Acknowledgments

This research is supported in part by the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No. XDA19020400.

Author information

Correspondence to Qin Zhang.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, Q., et al. (2019). A Survey on Deep Learning Benchmarks: Do We Still Need New Ones? In: Zheng, C., Zhan, J. (eds.) Benchmarking, Measuring, and Optimizing. Bench 2018. Lecture Notes in Computer Science, vol. 11459. Springer, Cham. https://doi.org/10.1007/978-3-030-32813-9_5


  • DOI: https://doi.org/10.1007/978-3-030-32813-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32812-2

  • Online ISBN: 978-3-030-32813-9
