
Viewpoint Estimation—Insights and Model

  • Gilad Divon
  • Ayellet Tal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11218)

Abstract

This paper addresses the problem of estimating the viewpoint of an object in a given image. It presents five key insights and a CNN that is based on them. The network's major properties are as follows. (i) The architecture jointly solves detection, classification, and viewpoint estimation. (ii) New types of training data are introduced. (iii) A novel loss function, which takes into account both the geometry of the problem and the new types of data, is proposed. Our network yields a substantial boost in performance: from the 36.1% achieved by state-of-the-art algorithms to 45.9%.
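The abstract does not specify the exact form of the geometry-aware loss, so the following is only an illustrative sketch, not the authors' formulation. It shows one common way to encode viewpoint geometry in a classification loss: the one-hot target over azimuth bins is replaced by a soft target that decays with circular angular distance from the ground-truth bin, so near-miss predictions are penalised less than predictions on the opposite side of the viewing circle. All names and parameters here (geometry_aware_viewpoint_loss, num_bins, sigma) are hypothetical.

    import numpy as np

    def geometry_aware_viewpoint_loss(logits, gt_bin, num_bins=360, sigma=15.0):
        # Circular distance (in bins) from every azimuth bin to the ground-truth bin.
        bins = np.arange(num_bins)
        diff = np.minimum(np.abs(bins - gt_bin), num_bins - np.abs(bins - gt_bin))
        # Soft target: a wrapped Gaussian centred on the ground-truth viewpoint.
        target = np.exp(-(diff ** 2) / (2.0 * sigma ** 2))
        target /= target.sum()
        # Softmax over the network's per-bin viewpoint scores.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Cross-entropy between the soft geometric target and the prediction.
        return float(-np.sum(target * np.log(probs + 1e-12)))

    # Toy usage: 360 azimuth bins, ground-truth azimuth of 90 degrees.
    rng = np.random.default_rng(0)
    print(geometry_aware_viewpoint_loss(rng.normal(size=360), gt_bin=90))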

Notes

Acknowledgements

We gratefully acknowledge NVIDIA Corporation for the donation of the GPU, as well as the support of the Ollendorff Foundation.

Supplementary material

Supplementary material 1 (PDF, 5.2 MB)


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Technion – Israel Institute of Technology, Haifa, Israel
