
Efficient CORDIC-Based Sine and Cosine Implementation for a Dataflow Architecture

(Extended Abstract)
  • Daniel Khankin
  • Elad Raz
  • Ilan Tayari
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12161)

Abstract

A program in a dataflow architecture is represented as a dataflow graph. The dataflow nodes in the graph represent operations to be executed on data, and the edges carry the data values that the nodes consume and transform. Such an architecture can exploit parallelism, code sharing, and out-of-order execution. The dataflow nodes are drawn from a small set of operators: logical operations, switching, addition/subtraction, and multiplication. There is neither an arithmetic logic unit nor a floating-point unit. As a result, elementary operations for integer, and in particular floating-point, arithmetic are emulated in software. Consequently, when more advanced functionality such as trigonometric functions is required, we find that the commonly used implementations are inefficient. This inefficiency yields an oversized dataflow graph, which translates directly into wasted silicon area, increased power consumption, and lower throughput. Volder proposed the CORDIC algorithm, which computes trigonometric functions through a sequence of basic rotations. In this work, we present a correctly rounded and efficient implementation of the CORDIC algorithm for the dataflow architecture.
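The abstract names CORDIC without spelling out the iteration. As a rough illustration only, the sketch below shows classic rotation-mode CORDIC (Volder's scheme) in fixed point. Each step rotates the vector (x, y) by ±atan(2^-i) using x_{i+1} = x_i − d_i·2^-i·y_i, y_{i+1} = y_i + d_i·2^-i·x_i, z_{i+1} = z_i − d_i·atan(2^-i), with d_i = sign(z_i), so the loop needs only integer addition, subtraction, comparison and shifts. This is not the authors' correctly rounded floating-point implementation: the fixed-point format (`FRAC_BITS`), the iteration count, the names `ATAN_TABLE`, `K_FIXED` and `cordic_sin_cos` are hypothetical, it assumes arithmetic shifts are available, it performs no argument reduction (the input must satisfy |θ| ≤ π/2), and its results are not correctly rounded.

```python
import math

# Hypothetical fixed-point format: values are integers scaled by 2**FRAC_BITS.
FRAC_BITS = 30
ITERATIONS = 32
ONE = 1 << FRAC_BITS

# Constant table of atan(2**-i) in fixed point. On the target it would be a
# precomputed constant table; math.atan is used here only to build it.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# CORDIC gain compensation K = prod_i 1/sqrt(1 + 2**(-2i)). Starting x at K
# means the final vector already has unit length, so no multiply is needed.
_k = 1.0
for _i in range(ITERATIONS):
    _k /= math.sqrt(1.0 + 2.0 ** (-2 * _i))
K_FIXED = round(_k * ONE)


def cordic_sin_cos(theta_fixed: int) -> tuple[int, int]:
    """Rotation-mode CORDIC for |theta| <= pi/2, theta given in fixed point.

    Returns (sin, cos), both scaled by 2**FRAC_BITS. The loop body uses only
    integer addition, subtraction, a sign test and arithmetic right shifts.
    """
    x, y, z = K_FIXED, 0, theta_fixed
    for i in range(ITERATIONS):
        if z >= 0:
            # Rotate counter-clockwise by atan(2**-i).
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            # Rotate clockwise by atan(2**-i).
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y, x


if __name__ == "__main__":
    theta = 0.5
    s, c = cordic_sin_cos(round(theta * ONE))
    # Both errors should be well below 1e-6 for these settings.
    print(s / ONE - math.sin(theta), c / ONE - math.cos(theta))
```

The `if`/`else` on the sign of `z` is a data-dependent selection that, under our assumption, would map onto the switching operator mentioned above, while the shifts replace the multiplications a direct rotation would need.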

Keywords

Dataflow, Floating-point, CORDIC, Trigonometric functions, Elementary functions, Efficient

Notes

Acknowledgments

We thank Shachar Lovett for his valuable input and John Gustafson for his comments. We thank Laura Ferguson for her assistance.

References

  1. IEEE Standard for Floating-Point Arithmetic. IEEE Std 754-2019 (Revision of IEEE 754-2008), pp. 1–84, July 2019. https://doi.org/10.1109/IEEESTD.2019.8766229
  2. Ziv, A.: Fast evaluation of elementary mathematical functions with correctly rounded last bit. ACM Trans. Math. Softw. (TOMS) (1991). https://dl.acm.org/doi/abs/10.1145/114697.116813
  3. Burgess, N., Milanovic, J., Stephens, N., Monachopoulos, K., Mansell, D.: Bfloat16 processing for neural networks. In: 2019 IEEE 26th Symposium on Computer Arithmetic (ARITH), pp. 88–91, June 2019. https://doi.org/10.1109/ARITH.2019.00022. ISSN 1063-6889
  4. Courbariaux, M., Bengio, Y., David, J.P.: Low precision arithmetic for deep learning. CoRR abs/1412.7024 (2014)
  5. Daramy-Loirat, C., De Dinechin, F., Defour, D., Gallet, M., Gast, N., Lauter, C.: CRLIBM. A library of correctly rounded elementary functions in double-precision (2010). http://lipforge.ens-lyon.fr/www/crlibm/
  6. Dettmers, T.: 8-bit approximations for parallelism in deep learning. arXiv preprint arXiv:1511.04561 (2015)
  7. Fousse, L., Hanrot, G., Lefèvre, V., Pélissier, P., Zimmermann, P.: MPFR: a multiple-precision binary floating-point library with correct rounding. ACM Trans. Math. Softw. (TOMS) 33(2), 13 (2007)
  8. Goldberg, D.: What every computer scientist should know about floating-point arithmetic. ACM Comput. Surv. (CSUR) 23(1), 5–48 (1991)
  9. Gupta, S., Agrawal, A., Gopalakrishnan, K., Narayanan, P.: Deep learning with limited numerical precision. In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), Lille, France, vol. 37, pp. 1737–1746. JMLR.org (2015)
  10. Hao, X., Yang, S., Wang, J., Deng, B., Wei, X., Yi, G.: Efficient implementation of cerebellar Purkinje cell with the CORDIC algorithm on LaCSNN. Front. Neurosci. 13 (2019). https://doi.org/10.3389/fnins.2019.01078, https://www.frontiersin.org/articles/10.3389/fnins.2019.01078/full
  11. Hekstra, G., Deprettere, E.: Floating point Cordic. In: Proceedings of IEEE 11th Symposium on Computer Arithmetic, pp. 130–137, June 1993. https://doi.org/10.1109/ARITH.1993.378100
  12. Jaeger, A.: OpenLibm (2016)
  13. Johnson, J.: Rethinking floating point for deep learning. arXiv preprint arXiv:1811.01721 (2018)
  14. Kahan, W.: A logarithm too clever by half (2004)
  15. Köster, U., et al.: Flexpoint: an adaptive numerical format for efficient training of deep neural networks. In: Advances in Neural Information Processing Systems, pp. 1742–1752 (2017)
  16. Lakshmi, B., Dhar, A.: CORDIC architectures: a survey. VLSI Des. 2010, 2 (2010)
  17. Le Maire, J., Brunie, N., de Dinechin, F., Muller, J.M.: Computing floating-point logarithms with fixed-point operations. In: 2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH), pp. 156–163, July 2016. https://doi.org/10.1109/ARITH.2016.24. ISSN 1063-6889
  18. Meher, P.K., Valls, J., Juang, T.B., Sridharan, K., Maharatna, K.: 50 years of CORDIC: algorithms, architectures, and applications. IEEE Trans. Circuits Syst. I Regul. Pap. 56(9), 1893–1907 (2009). https://doi.org/10.1109/TCSI.2009.2025803
  19. Muller, J.M.: On the definition of ulp(x). Research Report RR-5504, LIP RR-2005-09, INRIA, LIP, February 2005. https://hal.inria.fr/inria-00070503
  20. Muller, J.M.: Elementary Functions: Algorithms and Implementation, 3rd edn. Birkhäuser Basel (2016). https://www.springer.com/gp/book/9781489979810
  21. Muller, J.M., et al.: Handbook of Floating-Point Arithmetic, 2nd edn. Birkhäuser Basel (2018). https://doi.org/10.1007/978-3-319-76526-6, https://www.springer.com/gp/book/9783319765259
  22. Nguyen, H.T., Nguyen, X.T., Hoang, T.T., Le, D.H., Pham, C.K.: Low-resource low-latency hybrid adaptive CORDIC with floating-point precision. IEICE Electron. Exp. 12(9), 20150258 (2015)
  23. Payne, M.H., Hanek, R.N.: Radian reduction for trigonometric functions. ACM SIGNUM Newslett. 18(1), 19–24 (1983)
  24. Tulloch, A., Jia, Y.: High performance ultra-low-precision convolutions on mobile devices. arXiv preprint arXiv:1712.02427 (2017)
  25. Volder, J.E.: The CORDIC trigonometric computing technique. IRE Trans. Electron. Comput. 3, 330–334 (1959)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. NextSilicon, Tel Aviv, Israel
