Semantic perception can provide autonomous robots operating under uncertainty with a more efficient representation of their environment and a better ability to establish correct loop closures than geometric features alone. However, accurate inference of semantics requires measurement models that correctly capture properties of semantic detections such as viewpoint dependence, spatial correlations, and intra- and inter-class variations. Such models should also gracefully handle open-set conditions that may be encountered, keeping track of the resultant model uncertainty. We propose a method for robust visual classification of an object of interest observed from multiple views in the presence of significant localization uncertainty, classifier noise, and possible dataset shift. We use a viewpoint-dependent measurement model to capture viewpoint dependence and spatial correlations in classifier scores, showing how to use it in the presence of localization uncertainty. Assuming a Bayesian classifier that provides a measure of uncertainty, we show how its outputs can be fused in the context of the above model, allowing robust classification under model uncertainty when novel scenes are encountered. We present a statistical evaluation of our method both in synthetic simulation and in a 3D environment where rendered images are fed into a Deep Neural Network classifier. We compare against baseline methods in scenarios of varying difficulty, demonstrating the improved robustness of our method to localization uncertainty and dataset shift. Finally, we validate our contribution w.r.t. localization uncertainty on a dataset of real-world images.
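The core idea of fusing per-view classifier outputs under a viewpoint-dependent measurement model can be illustrated with a minimal sketch. This is not the paper's exact formulation: it assumes known viewpoints (no localization uncertainty), discretized viewpoint bins, and a simple per-class Gaussian likelihood over a scalar classifier score; all function names and parameters below are illustrative.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x (vectorized over mu, sigma)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def fuse_views(scores, viewpoints, mu, sigma, prior):
    """Recursive Bayesian update of the class belief over multiple views.

    scores:     scalar classifier scores, one per view
    viewpoints: discrete viewpoint indices, one per view
    mu, sigma:  [n_classes, n_viewpoints] arrays parameterizing the
                viewpoint-dependent measurement model p(score | class, view)
                (assumed learned beforehand)
    prior:      [n_classes] prior class probabilities
    """
    belief = np.asarray(prior, dtype=float)
    for s, v in zip(scores, viewpoints):
        lik = gaussian_pdf(s, mu[:, v], sigma[:, v])  # p(s | c, v), per class
        belief = belief * lik                          # Bayes numerator
        belief /= belief.sum()                         # normalize posterior
    return belief

# Toy example: 2 classes, 2 viewpoint bins; class 0 tends to score high.
mu = np.array([[0.9, 0.8],    # expected scores for class 0 per viewpoint
               [0.2, 0.3]])   # expected scores for class 1 per viewpoint
sigma = np.full((2, 2), 0.15)
post = fuse_views([0.85, 0.75], [0, 1], mu, sigma, prior=[0.5, 0.5])
print(post)  # posterior concentrates on class 0
```

In the paper's setting the viewpoint itself is uncertain, so this per-view likelihood would additionally be marginalized over the robot's pose belief rather than evaluated at a single known viewpoint index.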
This work was partially supported by the Israel Ministry of Science & Technology (MOST).
Cite this article
Feldman, Y., Indelman, V. Spatially-dependent Bayesian semantic perception under model and localization uncertainty. Auton Robot (2020). https://doi.org/10.1007/s10514-020-09921-0
- Semantic perception
- Object classification and pose estimation