Reproducing a Neural Question Answering Architecture Applied to the SQuAD Benchmark Dataset: Challenges and Lessons Learned

  • Alexander Dür
  • Andreas Rauber
  • Peter Filzmoser
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10772)

Abstract

Reproducibility is one of the pillars of scientific research. This study attempts to reproduce the Gated Self-Matching Network, which is the basis of one of the best-performing models on the SQuAD dataset. We reimplement the neural network model and highlight ambiguities in the original architectural description. We show that uncertainty about only two components of the neural network model, together with the lack of a precise description of the training process, makes it impossible to reproduce the experimental results obtained by the original implementation. Finally, we summarize what we learned from this reproduction process about writing precise neural network architecture descriptions, and we provide our implementation as a basis for future exploration.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Alexander Dür (1)
  • Andreas Rauber (1)
  • Peter Filzmoser (1)

  1. Vienna University of Technology, Vienna, Austria
