BenchIP: Benchmarking Intelligence Processors

Regular Paper · Journal of Computer Science and Technology

Abstract

The growing attention to deep learning has greatly spurred the design of intelligence-processing hardware. The variety of emerging intelligence processors calls for standard benchmarks that enable fair comparison and system optimization (in both software and hardware). However, existing benchmarks are unsuitable for benchmarking intelligence processors because they are neither diverse nor representative, and the lack of a standard benchmarking methodology further exacerbates the problem. In this paper, we propose BenchIP, a benchmark suite and benchmarking methodology for intelligence processors. The benchmark suite consists of two sets of benchmarks: microbenchmarks and macrobenchmarks. The microbenchmarks are single-layer networks, designed mainly for bottleneck analysis and system optimization. The macrobenchmarks contain state-of-the-art industrial networks, so as to offer a realistic comparison of different platforms. We also propose a standard benchmarking methodology built upon an industrial software stack, together with evaluation metrics that comprehensively reflect the characteristics of the evaluated intelligence processors. We use BenchIP to evaluate various hardware platforms, including CPUs, GPUs, and accelerators. BenchIP will be open-sourced soon.
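
As a rough illustration of the microbenchmark idea described in the abstract (this is not the BenchIP code, which the authors say will be open-sourced), the sketch below times a single convolutional layer in isolation and reports throughput, the kind of per-layer measurement that supports bottleneck analysis. The layer shape, the naive direct-convolution kernel, and the GFLOP/s accounting are all assumptions made for this example.

    # Hypothetical sketch of a single-layer "microbenchmark": time one
    # convolutional layer in isolation to expose compute/memory bottlenecks.
    # The layer configuration is illustrative, not taken from BenchIP.
    import time
    import numpy as np

    def conv2d_naive(x, w):
        """Direct convolution (no padding, stride 1) of x [C_in, H, W]
        with weights w [C_out, C_in, K, K]."""
        c_out, c_in, k, _ = w.shape
        _, h, wd = x.shape
        out = np.zeros((c_out, h - k + 1, wd - k + 1), dtype=x.dtype)
        for co in range(c_out):
            for i in range(h - k + 1):
                for j in range(wd - k + 1):
                    out[co, i, j] = np.sum(x[:, i:i+k, j:j+k] * w[co])
        return out

    # Illustrative layer: 3 input channels, 16 output channels, 3x3 kernels.
    x = np.random.rand(3, 32, 32).astype(np.float32)
    w = np.random.rand(16, 3, 3, 3).astype(np.float32)

    start = time.perf_counter()
    y = conv2d_naive(x, w)
    elapsed = time.perf_counter() - start

    # Each output element costs C_in * K * K multiply-adds (2 ops each).
    flops = 2 * y.size * w.shape[1] * w.shape[2] * w.shape[3]
    print(f"single-layer conv: {elapsed*1e3:.2f} ms, {flops/elapsed/1e9:.2f} GFLOP/s")

Running the same single-layer measurement on different platforms (CPU, GPU, accelerator) is what makes such microbenchmarks useful for isolating where a design spends its time, whereas the macrobenchmarks run complete networks end to end.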

Author information

Corresponding author

Correspondence to Jin-Hua Tao.

Electronic supplementary material

ESM 1 (PDF 235 kb)

Cite this article

Tao, JH., Du, ZD., Guo, Q. et al. BenchIP: Benchmarking Intelligence Processors. J. Comput. Sci. Technol. 33, 1–23 (2018). https://doi.org/10.1007/s11390-018-1805-8
