Abstract
Scene parsing is a challenging problem that attracts increasing interest in fields such as computer vision and robotics. However, occluded or small objects, which are difficult to parse, are often ignored. To address these two problems, we integrate visual phrases, which have been shown to describe relationships between objects well, into a joint system. In this paper, we propose a joint model that integrates scene classification, object and visual phrase detection, and scene parsing. By encoding these tasks in a Conditional Random Field model, all of them can be solved jointly. We evaluate our method on the MSRC-21 dataset. The experimental results demonstrate that our method achieves performance comparable, and in some cases superior, to state-of-the-art joint methods, especially when partially occluded or small objects are present.
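As a rough illustration of what solving these tasks jointly in a Conditional Random Field can mean, the toy sketch below minimizes a single energy over a scene variable, two segment labels, and one binary visual phrase variable by exhaustive enumeration. It is not the paper's model: the label sets, the energy values, the "car on road" phrase, and all function names are hypothetical and chosen only to show how a phrase detection might pull an ambiguous segment toward the correct label.

```python
from itertools import product

# All names and numbers below are illustrative assumptions, not values from the paper.
SCENES = ["street", "field"]
LABELS = ["car", "road", "grass"]

# Unary energies per segment (lower is better). Segment 0 stands in for a
# small or occluded region whose appearance alone slightly prefers "road".
SEG_UNARY = [
    {"car": 1.0, "road": 0.9, "grass": 2.0},
    {"car": 2.5, "road": 0.2, "grass": 1.5},
]
SCENE_UNARY = {"street": 0.5, "field": 0.7}
# Energy of switching the hypothetical "car on road" phrase detection off/on.
PHRASE_UNARY = {0: 0.0, 1: -0.3}
SCENE_COMPATIBLE = {"street": {"car", "road"}, "field": {"grass"}}


def scene_label_energy(scene, label):
    """Penalize segment labels that are unusual for the scene class."""
    return 0.0 if label in SCENE_COMPATIBLE[scene] else 1.0


def phrase_energy(active, labels):
    """If the phrase is on, segment 0 should be 'car' and segment 1 'road'."""
    if not active:
        return 0.0
    penalty = 0.0
    if labels[0] != "car":
        penalty += 2.0
    if labels[1] != "road":
        penalty += 2.0
    return penalty


def energy(scene, labels, active):
    """Total energy of one joint assignment of all variables."""
    total = SCENE_UNARY[scene] + PHRASE_UNARY[active]
    total += sum(SEG_UNARY[i][lab] for i, lab in enumerate(labels))
    total += sum(scene_label_energy(scene, lab) for lab in labels)
    total += phrase_energy(active, labels)
    return total


def map_inference():
    """Brute-force MAP: enumerate every joint assignment and keep the minimum."""
    candidates = (
        (energy(s, labs, a), s, labs, a)
        for s in SCENES
        for labs in product(LABELS, repeat=len(SEG_UNARY))
        for a in (0, 1)
    )
    return min(candidates, key=lambda c: c[0])


if __name__ == "__main__":
    best_energy, scene, labels, active = map_inference()
    print(scene, labels, active, round(best_energy, 3))
```

In this toy setting, labeling segment 0 from its unary term alone would pick "road", whereas the joint minimum switches the phrase on and labels the segment "car". Exhaustive enumeration is used only because the example is tiny; a realistic model would need an approximate inference method such as message passing.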
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Tang, K., Zhao, Z., Chen, X. (2015). Joint Visual Phrase Detection to Boost Scene Parsing. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2015. Lecture Notes in Computer Science(), vol 9475. Springer, Cham. https://doi.org/10.1007/978-3-319-27863-6_36
DOI: https://doi.org/10.1007/978-3-319-27863-6_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27862-9
Online ISBN: 978-3-319-27863-6
eBook Packages: Computer Science, Computer Science (R0)