Semantic Situation Extraction from Satellite Image Based on Neural Networks

  • Xutao Qu
  • Dongye Zhuang
  • Haibin Xie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11741)


Satellite Image Situation Awareness (SISA) is the task of automatically generating semantic situations from satellite images. It requires not only the positions and basic attributes (color, size, etc.) of targets but also the relationships among them (counting, relative position, existence, comparison, etc.) and the implementation of situation analysis rules. We propose a novel framework consisting of a Background Process, Visual Question Answering (VQA), and an Association Rules Set (ARS): the Background Process handles the situational map, while the VQA and ARS components identify the relationships by answering a set of questions about the SISA task. To verify the performance of our method, we build an evaluation dataset based on CLEVR. Experiments demonstrate that our approach outperforms traditional SISA systems in accuracy and degree of automation. To the best of our knowledge, we are the first to address the situation awareness (SA) problem with a VQA method. The contributions of our research are: (1) we show that SISA can be accomplished through VQA, without a precise scene graph; (2) we broaden the application of VQA.
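As a rough illustration of how question answering and an association rules set might be combined, the following minimal Python sketch asks a fixed set of questions and fires every rule whose required answers all match. It is not the authors' implementation: the names `Rule`, `extract_situations`, and `toy_vqa`, the CLEVR-style questions, and the situation labels are all assumptions made for this example, and the trained VQA network is replaced by a canned lookup table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a trained VQA model: maps a question to an answer.
VQAModel = Callable[[str], str]


@dataclass(frozen=True)
class Rule:
    """One illustrative association rule: if every listed answer matches, emit the label."""
    conditions: Dict[str, str]  # question -> required answer
    situation: str              # hypothetical situation label


def extract_situations(vqa: VQAModel, questions: List[str], rules: List[Rule]) -> List[str]:
    # Step 1: ask every question once and cache the answers.
    answers = {q: vqa(q) for q in questions}
    # Step 2: keep the label of each rule whose conditions are all satisfied.
    return [
        rule.situation
        for rule in rules
        if all(answers.get(q) == a for q, a in rule.conditions.items())
    ]


def toy_vqa(question: str) -> str:
    # Canned answers stand in for a neural VQA model (assumption for this sketch).
    canned = {
        "How many large objects are there?": "3",
        "Is there a red cube left of the sphere?": "no",
    }
    return canned.get(question, "unknown")


QUESTIONS = [
    "How many large objects are there?",
    "Is there a red cube left of the sphere?",
]

RULES = [
    Rule({"How many large objects are there?": "3"}, "three large targets present"),
    Rule({"Is there a red cube left of the sphere?": "yes"}, "red cube flanks sphere"),
]

situations = extract_situations(toy_vqa, QUESTIONS, RULES)
print(situations)
```

In this sketch only the first rule fires, because the canned answer to the second question is "no". In a real system the lookup table would be replaced by a trained VQA model and the rules would encode domain-specific situation analysis knowledge.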


Situation awareness, Visual question answering, Neural network



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. National University of Defense Technology, Changsha, China
