Learning Aerial Image Similarity Using Triplet Networks

  • Vytautas Valaitis
  • Virginijus Marcinkevicius
  • Rokas Jurevicius
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11974)


Unmanned aerial vehicles (UAVs) face localization challenges in environments where satellite navigation systems are denied. Images taken by on-board cameras can be compared against an orthophotographic map to support visual localization algorithms. Image similarity can be estimated by calculating various similarity metrics. In our experiments, Pearson correlation was found to be the best choice for evaluating aerial image similarity. Still, it is not robust against image displacement caused by aircraft frame movement. We propose a new triplet neural network architecture to learn an image similarity measure. The proposed architecture incorporates the base layers of the VGG16 network; the top layer structure, loss function and performance metrics are suggested by the authors. Images were matched to maps built from satellite photos. The matching results of the proposed neural network architecture were evaluated and compared against Pearson correlation.
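The two similarity notions named in the abstract can be illustrated with a minimal sketch. It assumes grayscale images stored as NumPy arrays and uses the standard margin-based triplet loss on embedding vectors. It is not the loss function or network proposed by the authors, whose details are not given here. The function names, the margin value and the synthetic random tile are all illustrative assumptions.

```python
import numpy as np


def pearson_similarity(a, b):
    """Pearson correlation coefficient between two equally sized images."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # A constant image has zero variance; report zero similarity in that case.
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Standard hinge triplet loss with squared Euclidean distances.

    The margin of 0.2 is an arbitrary illustrative value.
    """
    anchor, positive, negative = (
        np.asarray(v, dtype=np.float64) for v in (anchor, positive, negative)
    )
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(d_pos - d_neg + margin, 0.0))


rng = np.random.default_rng(0)
tile = rng.random((64, 64))          # synthetic stand-in for a map tile
shifted = np.roll(tile, 5, axis=1)   # same content displaced by 5 pixels

print(pearson_similarity(tile, tile))     # identical images
print(pearson_similarity(tile, shifted))  # displaced copy of the same image
```

On this uncorrelated random tile the displaced copy scores close to zero, which exaggerates the effect: real aerial images are spatially smooth, so the drop under a small displacement would typically be less severe. The triplet loss is zero once the anchor is closer to the positive than to the negative by at least the margin.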


Keywords: Image similarity, Triplet loss, Neural networks, UAV localization



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Vytautas Valaitis (1)
  • Virginijus Marcinkevicius (2)
  • Rokas Jurevicius (2)

  1. Vilnius University Institute of Computer Science, Vilnius, Lithuania
  2. Vilnius University Institute of Data Science and Digital Technologies, Vilnius, Lithuania
