
Evaluation of Multiple Approaches for Visual Question Reasoning

  • Conference paper

ICT Innovations 2018. Engineering and Life Sciences (ICT 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 940)

Abstract

Extracting meaningful patterns between objects, i.e., relational reasoning, is a crucial element of human reasoning and remains a challenging task for artificial intelligence. Our research objective was to investigate two end-to-end architectures augmented with a relational neural module on the challenging Cornell NLVR visual question answering task. We hoped that the relational reasoning capabilities on multi-modal inputs, for which relational networks are known, could be leveraged on this task. We achieved state-of-the-art performance, outperforming the results reported in related studies conducted on the same benchmark dataset.
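The abstract does not describe the two architectures in detail, so the following is only a minimal sketch of the general idea behind a relational neural module, not the authors' implementation. It assumes the commonly used formulation in which a small network g scores every ordered pair of object features together with a question embedding, the pair outputs are summed, and a second network f maps the sum to true/false logits. All names, dimensions, and the random weights (`make_mlp`, `relation_network`, `obj_dim`, `q_dim`, `hidden`) are hypothetical and chosen for illustration.

```python
import math
import random


def make_mlp(sizes, rng):
    """Create (weights, biases) layers with small random weights (illustrative only)."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def mlp_forward(layers, x):
    """Apply each layer in turn, with ReLU on every layer except the last."""
    for idx, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def relation_network(objects, question, g_layers, f_layers):
    """Compute f( sum over ordered pairs (i, j) of g([o_i, o_j, q]) )."""
    pooled = [0.0] * len(g_layers[-1][1])
    for o_i in objects:
        for o_j in objects:
            # g sees both objects and the question, so relations are question-conditioned.
            relation = mlp_forward(g_layers, o_i + o_j + question)
            pooled = [p + r for p, r in zip(pooled, relation)]
    return mlp_forward(f_layers, pooled)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sizes: 5 objects with 4 features each, a 3-dimensional question embedding.
rng = random.Random(0)
obj_dim, q_dim, hidden = 4, 3, 8
g_layers = make_mlp([2 * obj_dim + q_dim, hidden, hidden], rng)
f_layers = make_mlp([hidden, hidden, 2], rng)  # two outputs: statement is true / false

objects = [[rng.random() for _ in range(obj_dim)] for _ in range(5)]
question = [rng.random() for _ in range(q_dim)]
probs = softmax(relation_network(objects, question, g_layers, f_layers))
print(probs)
```

Because the pair outputs are summed before f is applied, the result does not depend on the order in which the objects are listed, which is one reason this kind of module is attractive for reasoning over sets of visual objects. In a trained system the object features would come from an image encoder and the question vector from a text encoder; both are replaced by random vectors here.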

Corresponding author

Correspondence to Sonja Gievska.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Jankoski, K., Gievska, S. (2018). Evaluation of Multiple Approaches for Visual Question Reasoning. In: Kalajdziski, S., Ackovska, N. (eds) ICT Innovations 2018. Engineering and Life Sciences. ICT 2018. Communications in Computer and Information Science, vol 940. Springer, Cham. https://doi.org/10.1007/978-3-030-00825-3_18

  • DOI: https://doi.org/10.1007/978-3-030-00825-3_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00824-6

  • Online ISBN: 978-3-030-00825-3

  • eBook Packages: Computer Science, Computer Science (R0)
