FARM: A Flexible Accelerator for Recurrent and Memory Augmented Neural Networks

Published in: Journal of Signal Processing Systems

Abstract

Recently, Memory Augmented Neural Networks (MANNs), a class of Deep Neural Networks (DNNs), have become prominent owing to their ability to effectively capture long-term dependencies in several Natural Language Processing (NLP) tasks. These networks augment a conventional DNN with memory and attention mechanisms external to the network in order to capture relevant information. Several MANN architectures have shown particular benefits in NLP tasks by augmenting an underlying Recurrent Neural Network (RNN) with external memory accessed through attention mechanisms. Unlike conventional DNNs, whose computation time is dominated by MAC operations, MANNs exhibit more diverse behavior: in addition to MACs, their attention mechanisms involve operations such as similarity measures, sorting, weighted memory access, and pair-wise arithmetic. Because of this greater diversity of operations, MANNs are not trivially accelerated by the techniques used in existing DNN accelerators. In this work, we present FARM, an end-to-end hardware accelerator architecture for the inference of RNNs and several MANN variants, such as the Differentiable Neural Computer (DNC), the Neural Turing Machine (NTM), and the Meta-learning model. FARM achieves average speedups of 30x-190x over CPU and 80x-100x over GPU implementations. To address the remaining memory bottlenecks in FARM, we then propose FARM-PIM, which augments FARM with in-memory compute support for MAC and content-similarity operations to reduce data-traversal costs. FARM-PIM offers an additional speedup of 1.5x over FARM. Finally, we consider an efficiency-oriented version of the PIM implementation, FARM-PIM-LP, which trades a 20% performance reduction relative to FARM for a 4x reduction in average power consumption.
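
To make the non-MAC attention operations concrete, the sketch below illustrates content-based addressing of the kind used in NTM/DNC-style MANNs: a cosine-similarity measure between a controller-generated key and every memory row, a softmax over the scaled similarities, and a weighted memory read. This is a minimal NumPy illustration of the operation classes named in the abstract, not FARM's hardware datapath; the helper name, memory size, key width, and key-strength value are assumptions chosen for the example.

```python
import numpy as np

def content_based_read(memory, key, beta):
    """Content-based addressing of the kind used in NTM/DNC-style attention.

    memory : (N, W) array -- N memory rows of width W (the external memory)
    key    : (W,) array   -- query key produced by the controller RNN
    beta   : float        -- key strength; sharpens the attention distribution
    Returns the read vector and the attention weights over memory rows.
    """
    eps = 1e-8  # guards against division by zero for all-zero rows/keys

    # Similarity measure: cosine similarity between the key and every memory row.
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)

    # Softmax over the scaled similarities yields the attention weights.
    scores = beta * sims
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()

    # Weighted memory access: the read vector is a convex combination of rows.
    read_vector = weights @ memory
    return read_vector, weights

if __name__ == "__main__":
    # Memory size, key width, and key strength below are illustrative choices.
    rng = np.random.default_rng(0)
    M = rng.standard_normal((128, 64))   # external memory: 128 slots, width 64
    k = rng.standard_normal(64)          # controller-generated key
    r, w = content_based_read(M, k, beta=5.0)
    print(r.shape, w.sum())              # (64,) ~1.0
```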

Acknowledgments

This work was supported in part by the Semiconductor Research Corporation (SRC) Center for Brain-inspired Computing Enabling Autonomous Intelligence (C-BRIC) and the Center for Research in Intelligent Storage and Processing in Memory (CRISP).

Author information

Corresponding author

Correspondence to Nagadastagiri Challapalle.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Challapalle, N., Rampalli, S., Jao, N. et al. FARM: A Flexible Accelerator for Recurrent and Memory Augmented Neural Networks. J Sign Process Syst 92, 1247–1261 (2020). https://doi.org/10.1007/s11265-020-01555-w
