Bi-Directional LSTM with Quantum Attention Mechanism for Sentence Modeling

  • Xiaolei Niu
  • Yuexian Hou
  • Panpan Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10635)


Bi-directional LSTM (BLSTM) often utilizes an Attention Mechanism (AM) to improve its ability to model sentences, but the additional parameters within the AM can complicate model selection and BLSTM training. To solve this problem, this paper redefines AM from the novel perspective of quantum cognition and proposes a parameter-free Quantum AM (QAM). Furthermore, we give a quantum interpretation of BLSTM using the Two-State Vector Formalism (TSVF) and find a similarity between sentence understanding and quantum Weak Measurement (WM) under TSVF. The weak value derived from WM is employed to represent the attention given to each word in a sentence. Experiments show that QAM-based BLSTM outperforms common AM (CAM) [1] based BLSTM on most of the classification tasks discussed in this paper.
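The abstract's key idea is that attention weights can be obtained parameter-free from quantities the BLSTM already computes, by analogy with weak values. The paper itself is not reproduced here, so the following is only an illustrative sketch of that analogy, with all design choices (final forward state as the pre-selected state, final backward state as the post-selected state, a projector onto each word's bidirectional state as the observable) assumed rather than taken from the authors' formulation:

```python
import numpy as np

def weak_value_attention(h_fwd, h_bwd):
    """Parameter-free attention scores via a weak-value analogy.

    h_fwd, h_bwd: (T, d) arrays of forward/backward BLSTM hidden states.
    Pre-selected state |psi> = final forward state; post-selected state
    <phi| = final backward state (the backward pass ends at position 0).
    The observable for word t is the projector onto its normalized
    bidirectional state u_t; its weak value is
        A_w = <phi|u_t><u_t|psi> / <phi|psi>.
    """
    def unit(v):
        return v / (np.linalg.norm(v) + 1e-12)

    psi = unit(h_fwd[-1])            # pre-selected state
    phi = unit(h_bwd[0])             # post-selected state
    overlap = phi @ psi              # <phi|psi>
    scores = []
    for t in range(h_fwd.shape[0]):
        u = unit(h_fwd[t] + h_bwd[t])            # word t's state
        w = (phi @ u) * (u @ psi) / (overlap + 1e-12)  # weak value
        scores.append(abs(w))        # magnitude as attention score
    scores = np.asarray(scores)
    return scores / scores.sum()     # normalize to attention weights
```

Note that no trainable parameters appear: everything is derived from the hidden states themselves, which is what makes the mechanism "parameter-free" in the sense the abstract describes.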


Keywords: Attention mechanism · Two-state vector formalism · Weak measurement · Quantum theory



This work is funded in part by the Chinese 863 Program (grant No. 2015AA015403), the Key Project of Tianjin Natural Science Foundation (grant No. 15JCZDJC31100), the Tianjin Younger Natural Science Foundation (grant No. 14JCQNJC00400), the Major Project of Chinese National Social Science Fund (grant No. 14ZDB153), and the MSCA-ITN-ETN European Training Networks Project (grant No. 721321, QUARTZ).


References

  1. Yang, Z., Yang, D., Dyer, C., He, X., Smola, A.J., Hovy, E.H.: Hierarchical attention networks for document classification. In: HLT-NAACL, pp. 1480–1489 (2016)
  2. Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1631–1642 (2013)
  3. Liu, P., Qiu, X., Huang, X.: Recurrent neural network for text classification with multi-task learning. arXiv preprint arXiv:1605.05101 (2016)
  4. Kalchbrenner, N., Grefenstette, E., Blunsom, P.: A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188 (2014)
  5. Wang, Z., Busemeyer, J.R., Atmanspacher, H., Pothos, E.M.: The potential of using quantum theory to build models of cognition. Top. Cogn. Sci. 5(4), 672–688 (2013)
  6. Bruza, P.D., Wang, Z., Busemeyer, J.R.: Quantum cognition: a new theoretical approach to psychology. Trends Cogn. Sci. 19(7), 383–393 (2015)
  7. Aharonov, Y., Vaidman, L.: Complete description of a quantum system at a given time. J. Phys. A: Math. Gen. 24(10), 2315 (1991)
  8. Ravon, T., Vaidman, L.: The three-box paradox revisited. J. Phys. A: Math. Theor. 40(11), 2873 (2007)
  9. Gibran, B.: Causal realism in the philosophy of mind. Essays Philos. 15(2), 5 (2014)
  10. Aharonov, Y., Vaidman, L.: The two-state vector formalism: an updated review. In: Muga, J., Mayato, R.S., Egusquiza, Í. (eds.) Time in Quantum Mechanics. Lecture Notes in Physics, vol. 734, pp. 399–447. Springer, Heidelberg (2008). doi:10.1007/978-3-540-73473-4_13
  11. Aharonov, Y., Bergmann, P.G., Lebowitz, J.L.: Time symmetry in the quantum process of measurement. Phys. Rev. 134(6B), B1410 (1964)
  12. Latta, R.L.: The Basic Humor Process: A Cognitive-Shift Theory and the Case Against Incongruity, vol. 5. Walter de Gruyter (1999)
  13. Tamir, B., Cohen, E.: Introduction to weak measurements and weak values. Quanta 2(1), 7–17 (2013)
  14. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
  15. Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 115–124 (2005)
  16. Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: ICML 2014, pp. 1188–1196 (2014)
  17. Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, p. 271 (2004)
  18. Li, X., Roth, D.: Learning question classifiers. In: Proceedings of the 19th International Conference on Computational Linguistics, vol. 1, pp. 1–7 (2002)
  19. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
  20. Socher, R., Pennington, J., Huang, E.H., Ng, A.Y., Manning, C.D.: Semi-supervised recursive autoencoders for predicting sentiment distributions. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 151–161 (2011)
  21. Socher, R., Huval, B., Manning, C.D., Ng, A.Y.: Semantic compositionality through recursive matrix-vector spaces. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 1201–1211 (2012)
  22. Dong, L., Wei, F., Liu, S., Zhou, M., Xu, K.: A statistical parsing framework for sentiment classification. Comput. Linguist. (2015)
  23. Nakagawa, T., Inui, K., Kurohashi, S.: Dependency tree-based sentiment classification using CRFs with hidden variables. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 786–794 (2010)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. School of Computer Science and Technology, Tianjin University, Tianjin, China
