Hardware-Aware Softmax Approximation for Deep Neural Networks

  • Xue Geng
  • Jie Lin
  • Bin Zhao
  • Anmin Kong
  • Mohamed M. Sabry Aly
  • Vijay Chandrasekhar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11364)


Custom hardware for accelerating the inference of deep neural networks (DNNs) has developed rapidly, explicitly incorporating hardware metrics (e.g., area and energy) as additional constraints alongside application accuracy. Recent efforts have mainly focused on the linear functions (matrix multiplication) in convolutional (Conv) or fully connected (FC) layers, while there is no publicly available study on optimizing the inference of the non-linear functions in DNNs under hardware constraints.

In this paper, we address the problem of cost-efficient inference for Softmax, a popular non-linear function in DNNs. We introduce a hardware-aware linear approximation framework based on algorithm and hardware co-optimization, with the goal of minimizing cost in terms of area and energy without incurring significant loss in application accuracy. This is achieved by simultaneously reducing the operand bit-width and approximating the cost-intensive operations in Softmax (e.g., exponentiation and division) with cost-effective operations (e.g., addition and bit shifts). We designed and synthesized a hardware unit for our approximation approach in order to estimate its area and energy consumption. In addition, we introduce a training method that further saves area and energy cost through reduced precision. Compared to a 19-bit baseline, our approach with 11-bit operand width reduces area cost by 13× and energy consumption by 2× on the VOC2007 dataset with Faster R-CNN.
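The abstract does not spell out the exact operator substitutions, but the general idea it describes (lower-precision operands, exponentiation and division replaced by shift-friendly operations) can be sketched as follows. This is an illustrative approximation, not the paper's actual circuit: the function name, the `frac_bits` fixed-point parameter, and the choice of rounding the divisor to a power of two are all assumptions made here for the sketch.

```python
import numpy as np

def softmax_shift_approx(logits, frac_bits=4):
    """Illustrative hardware-friendly Softmax sketch (not the paper's exact design):
    - quantize logits to fixed point with `frac_bits` fractional bits,
    - replace e**x with 2**x, which a circuit can realize with shifts
      (plus a small lookup table for the fractional part),
    - replace division by the sum with a right shift by the integer
      part of log2(sum), i.e. the divisor rounded down to a power of two.
    """
    scale = 1 << frac_bits
    q = np.round(logits * scale) / scale   # fixed-point quantization (assumed scheme)
    q = q - q.max()                        # subtract the max for numerical stability
    e = np.exp2(q)                         # base-2 exponent instead of base-e
    shift = np.floor(np.log2(e.sum()))     # divisor approximated by a power of two
    return e * np.exp2(-shift)             # division becomes a right shift
```

Because the divisor is rounded to a power of two, the outputs sum to a value in [1, 2) rather than exactly 1; the relative ordering of the classes (and hence the argmax) is preserved, which is what matters for detection heads such as Faster R-CNN's classifier.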


Keywords: Softmax · Nonlinear operation · Power · Area



This research is supported by the Agency for Science, Technology and Research (A*STAR) under its Hardware-Software Co-optimization for Deep Learning (Project No. A1892b0026).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Xue Geng (1) (corresponding author)
  • Jie Lin (1)
  • Bin Zhao (2)
  • Anmin Kong (2)
  • Mohamed M. Sabry Aly (3)
  • Vijay Chandrasekhar (1)

  1. Institute for Infocomm Research (I²R), A*STAR, Singapore
  2. Institute of Microelectronics (IME), A*STAR, Singapore
  3. School of Computer Science and Engineering, NTU, Singapore
