
Hardware-Aware Softmax Approximation for Deep Neural Networks

  • Conference paper
  • Computer Vision – ACCV 2018 (ACCV 2018)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11364)

Abstract

There has been rapid development of custom hardware for accelerating the inference of deep neural networks (DNNs), which explicitly incorporates hardware metrics (e.g., area and energy) as constraints alongside application accuracy. Recent efforts have focused mainly on the linear functions (matrix multiplications) in convolutional (Conv) and fully connected (FC) layers, while there is no publicly available study on optimizing the inference of non-linear functions in DNNs under hardware constraints.

In this paper, we address the problem of cost-efficient inference for Softmax, a popular non-linear function in DNNs. We introduce a hardware-aware linear approximation framework based on algorithm and hardware co-optimization, with the goal of minimizing area and energy cost without incurring significant loss in application accuracy. This is achieved by simultaneously reducing the operand bit-width and approximating cost-intensive operations in Softmax (e.g., exponential and division) with cost-effective operations (e.g., addition and bit shifts). We designed and synthesized a hardware unit for our approximation approach to estimate its area and energy consumption. In addition, we introduce a training method that further reduces area and energy cost through reduced precision. Our approach reduces area cost by 13\(\times\) and energy consumption by 2\(\times\) with 11-bit operand width, compared to a 19-bit baseline, on the VOC2007 dataset with Faster R-CNN.
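The co-optimization described above pairs reduced operand bit-widths with shift-friendly replacements for the exponential and the division. The snippet below is a minimal, illustrative sketch of that general idea, not the authors' published algorithm: the function name, bit split, and rounding choices are assumptions made for illustration. Powers of two stand in for \(e^{x_i}\) (a pure shift in hardware), and the normalization becomes a shift by the position of the denominator's leading one bit.

```python
import numpy as np

def approx_softmax(x, frac_bits=2):
    """Illustrative low-cost softmax: 2**k replaces e**x (a shift in hardware),
    and the division is replaced by a shift determined by the position of the
    denominator's leading one. Hypothetical sketch, not the paper's algorithm."""
    # Quantize inputs to a fixed-point grid with few fractional bits.
    scale = 1 << frac_bits
    q = np.floor(x * scale) / scale

    # exp(x) = 2**(x * log2(e)); keep only the integer part of the exponent
    # so each numerator term becomes a single left/right shift.
    k = np.floor((q - q.max()) * np.log2(np.e)).astype(int)
    num = np.ldexp(1.0, k)                  # 2**k, i.e. a bit shift

    # Replace division by the sum with a shift by floor(log2(sum)).
    denom_shift = int(np.floor(np.log2(num.sum())))
    return np.ldexp(num, -denom_shift)      # roughly num >> denom_shift

# Example: the output is normalized only to a power of two, so it does not
# sum exactly to one, but the argmax and relative ordering are preserved.
logits = np.array([2.0, 1.0, 0.1])
print(approx_softmax(logits))
```

Because the normalization uses only a power of two, the result is a coarse approximation of the true softmax distribution; the argmax and relative ordering are preserved, which is what downstream argmax-based decisions typically rely on.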

X. Geng and J. Lin—Both authors contributed equally to this work.


Notes

  1. The fractional bit-widths of \(e^{x_i}\) and \(x\) are fixed at small values (less than 2) because they contribute less to precision than the integer bits do.
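As a small numerical illustration of why the integer bits dominate for \(e^{x_i}\) (the bit splits below are arbitrary examples, not the settings used in the paper): saturating the integer range produces large errors, whereas dropping fractional bits changes the values only marginally.

```python
import numpy as np

def to_fixed_point(v, int_bits, frac_bits):
    """Round v onto an unsigned, saturating fixed-point grid with the given
    integer/fractional bit split. Purely illustrative of the trade-off."""
    scale = 1 << frac_bits
    max_code = (1 << (int_bits + frac_bits)) - 1      # largest representable code
    code = np.clip(np.round(v * scale), 0, max_code)
    return code / scale

x = np.linspace(0.0, 8.0, 1000)
ex = np.exp(x)                                        # values up to e**8 ~ 2981

# Too few integer bits saturate the large exp() values (large error);
# trimming fractional bits instead changes the result only slightly.
for int_bits, frac_bits in [(8, 8), (12, 1), (12, 4)]:
    err = np.max(np.abs(to_fixed_point(ex, int_bits, frac_bits) - ex))
    print(f"int_bits={int_bits:2d} frac_bits={frac_bits}: max error {err:.3f}")
```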


Acknowledgement

This research is supported by the Agency for Science, Technology and Research (A*STAR) under its Hardware-Software Co-optimization for Deep Learning (Project No. A1892b0026).

Author information


Corresponding author

Correspondence to Xue Geng.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Geng, X., Lin, J., Zhao, B., Kong, A., Aly, M.M.S., Chandrasekhar, V. (2019). Hardware-Aware Softmax Approximation for Deep Neural Networks. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds.) Computer Vision – ACCV 2018. Lecture Notes in Computer Science, vol. 11364. Springer, Cham. https://doi.org/10.1007/978-3-030-20870-7_7


  • DOI: https://doi.org/10.1007/978-3-030-20870-7_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20869-1

  • Online ISBN: 978-3-030-20870-7

  • eBook Packages: Computer Science; Computer Science (R0)
