BenchIP: Benchmarking Intelligence Processors

Regular Paper · Journal of Computer Science and Technology

Abstract

The growing attention to deep learning has greatly spurred the design of intelligence-processing hardware. The variety of emerging intelligence processors calls for standard benchmarks that enable fair comparison and system optimization (in both software and hardware). However, existing benchmarks are unsuitable for benchmarking intelligence processors because they are neither diverse nor representative, and the lack of a standard benchmarking methodology further exacerbates the problem. In this paper, we propose BenchIP, a benchmark suite and benchmarking methodology for intelligence processors. The benchmark suite consists of two sets of benchmarks: microbenchmarks and macrobenchmarks. The microbenchmarks are single-layer networks, designed mainly for bottleneck analysis and system optimization. The macrobenchmarks contain state-of-the-art industrial networks, so as to offer a realistic comparison of different platforms. We also propose a standard benchmarking methodology built upon an industrial software stack, together with evaluation metrics that comprehensively reflect the characteristics of the evaluated intelligence processors. We use BenchIP to evaluate various hardware platforms, including CPUs, GPUs, and accelerators. BenchIP will be open-sourced soon.
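
As a rough illustration of the microbenchmark idea described in the abstract (this is not the BenchIP code, which the authors say will be open-sourced), the sketch below times a single convolutional layer in isolation and reports throughput, the kind of per-layer measurement that supports bottleneck analysis. The layer shape, the naive direct-convolution kernel, and the GFLOP/s accounting are all assumptions made for this example.

    # Hypothetical sketch of a single-layer "microbenchmark": time one
    # convolutional layer in isolation to expose compute/memory bottlenecks.
    # The layer configuration is illustrative, not taken from BenchIP.
    import time
    import numpy as np

    def conv2d_naive(x, w):
        """Direct convolution (no padding, stride 1) of x [C_in, H, W]
        with weights w [C_out, C_in, K, K]."""
        c_out, c_in, k, _ = w.shape
        _, h, wd = x.shape
        out = np.zeros((c_out, h - k + 1, wd - k + 1), dtype=x.dtype)
        for co in range(c_out):
            for i in range(h - k + 1):
                for j in range(wd - k + 1):
                    out[co, i, j] = np.sum(x[:, i:i+k, j:j+k] * w[co])
        return out

    # Illustrative layer: 3 input channels, 16 output channels, 3x3 kernels.
    x = np.random.rand(3, 32, 32).astype(np.float32)
    w = np.random.rand(16, 3, 3, 3).astype(np.float32)

    start = time.perf_counter()
    y = conv2d_naive(x, w)
    elapsed = time.perf_counter() - start

    # Each output element costs C_in * K * K multiply-adds (2 ops each).
    flops = 2 * y.size * w.shape[1] * w.shape[2] * w.shape[3]
    print(f"single-layer conv: {elapsed*1e3:.2f} ms, {flops/elapsed/1e9:.2f} GFLOP/s")

Running the same single-layer measurement on different platforms (CPU, GPU, accelerator) is what makes such microbenchmarks useful for isolating where a design spends its time, whereas the macrobenchmarks run complete networks end to end.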

Author information

Corresponding author

Correspondence to Jin-Hua Tao.

Electronic supplementary material

ESM 1 (PDF 235 kb)

Cite this article

Tao, JH., Du, ZD., Guo, Q. et al. BenchIP: Benchmarking Intelligence Processors. J. Comput. Sci. Technol. 33, 1–23 (2018). https://doi.org/10.1007/s11390-018-1805-8
