
Attention and Memory Augmented Networks

  • Uday Kamath
  • John Liu
  • James Whitaker

Abstract

As we have seen in the previous chapters, deep learning provides effective architectures for handling spatial and temporal data through various forms of convolutional and recurrent networks, respectively. However, when the data demands out-of-order (unordered) access or exhibits long-term dependencies, most of the standard architectures discussed so far are not suitable. Consider a specific example from the bAbI dataset, in which a set of stories/facts is presented, a question is asked, and the answer must be inferred from the stories. As Fig. 9.1 shows, finding the correct answer requires out-of-order access and long-term dependencies.
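The kind of selective, content-based access this example calls for is the core idea behind the attention and memory-augmented networks developed in this chapter. The toy sketch below is not taken from the book; the sentences, bag-of-words encoding, and variable names are hypothetical. It illustrates a single soft-attention "read" over a memory of encoded facts, in the spirit of end-to-end memory networks: the question scores each fact by dot product, the scores are normalized with a softmax, and a weighted sum of the fact encodings is read out as evidence for the answer.

import numpy as np

# Hypothetical bAbI-style story and question (not from the dataset itself).
facts = ["john went to the hallway",
         "mary travelled to the office",
         "john picked up the football"]
question = "where is john"

# Build a toy vocabulary and a bag-of-words encoder, standing in for
# the learned embeddings a real memory network would use.
vocab = sorted({w for s in facts + [question] for w in s.split()})
index = {w: i for i, w in enumerate(vocab)}

def bow(sentence):
    """Encode a sentence as a bag-of-words vector."""
    v = np.zeros(len(vocab))
    for w in sentence.split():
        v[index[w]] += 1.0
    return v

memory = np.stack([bow(s) for s in facts])   # one memory slot per fact
query = bow(question)                        # encoded question

scores = memory @ query                      # content-based addressing scores
weights = np.exp(scores - scores.max())
weights /= weights.sum()                     # softmax attention over facts

evidence = weights @ memory                  # weighted read from memory
print(dict(zip(facts, np.round(weights, 2))))

In a trained network, the bag-of-words vectors would be replaced by learned embeddings, the read vector would feed an answer classifier, and multiple such "hops" could be stacked so that later reads depend on what was retrieved earlier.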


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Uday Kamath, Digital Reasoning Systems Inc., McLean, USA
  • John Liu, Intelluron Corporation, Nashville, USA
  • James Whitaker, Digital Reasoning Systems Inc., McLean, USA
