Evaluation of Multiple Approaches for Visual Question Reasoning

Jankoski, Kristijan; Gievska, Sonja

doi:10.1007/978-3-030-00825-3_18

Part of the book series: Communications in Computer and Information Science (CCIS, volume 940)

Included in the following conference series:

International Conference on Telecommunications

Abstract

Extracting meaningful patterns between objects, i.e., relational reasoning, is a crucial element of human reasoning and remains a challenging task for artificial intelligence. Our research objective was to investigate two end-to-end architectures augmented with a relational neural module on the challenging Cornell NLVR visual question answering task. We hoped that the relational reasoning capabilities on multi-modal inputs, for which relational networks are known, could be leveraged on this task. We achieved state-of-the-art performance, outperforming the results reported in related research studies conducted on the same benchmark dataset.

Author information

Authors and Affiliations

FCSE, Ss. Cyril and Methodius University, Skopje, Macedonia
Kristijan Jankoski & Sonja Gievska

Corresponding author

Correspondence to Sonja Gievska.

Editor information

Editors and Affiliations

Faculty of Computer Science and Engineering, Saints Cyril and Methodius University of Skopje, Skopje, Macedonia
Slobodan Kalajdziski
Faculty of Computer Science and Engineering, Saints Cyril and Methodius University of Skopje, Skopje, Macedonia
Nevena Ackovska

About this paper

Cite this paper

Jankoski, K., Gievska, S. (2018). Evaluation of Multiple Approaches for Visual Question Reasoning. In: Kalajdziski, S., Ackovska, N. (eds) ICT Innovations 2018. Engineering and Life Sciences. ICT 2018. Communications in Computer and Information Science, vol 940. Springer, Cham. https://doi.org/10.1007/978-3-030-00825-3_18

DOI: https://doi.org/10.1007/978-3-030-00825-3_18
Published: 13 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00824-6
Online ISBN: 978-3-030-00825-3
eBook Packages: Computer Science, Computer Science (R0)
