A Fine-Grained Filtered Viewpoint Informed Keypoint Prediction from 2D Images

Li, Qingnan; Hu, Ruimin; Chen, Yixin; Yan, Jingwen; Xiao, Jing

doi:10.1007/978-3-319-77383-4_17


Conference paper
First Online: 10 May 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10736)

Abstract

Viewpoint-informed keypoint prediction from 2D images is an essential task in computer vision that captures the fine details of rigid objects. However, when the viewpoint predicted by a convolutional neural network is ambiguous, especially when two viewpoint proposals both peak with high confidence, the prediction may specify a set of erroneous keypoints. To address this issue, we present multiscale convolutional neural networks and propose a filter that ensures the keypoint prediction is informed only by a high-confidence viewpoint, which provides a global perspective for keypoint prediction. Leveraging this global precedence, we combine the multiscale local-appearance-based keypoint likelihood with the filtered viewpoint-conditioned likelihood to obtain a considerable performance gain. Experimentally, we show that our framework outperforms state-of-the-art methods on the PASCAL 3D benchmark.
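
The abstract does not give the filter rule or the fusion formula, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes the viewpoint is a probability distribution over circular azimuth bins, that "ambiguous" means a second, well-separated bin has a probability close to the best one, that multiscale local maps are averaged, and that likelihoods are fused by multiplication (sum of logs). The function names and the parameters `ratio_thresh` and `min_sep` are hypothetical.

```python
import numpy as np


def is_viewpoint_confident(view_probs, ratio_thresh=0.5, min_sep=2):
    """Return True when the viewpoint distribution has one dominant peak.

    view_probs: 1D probabilities over circular azimuth bins.
    ratio_thresh, min_sep: hypothetical tuning parameters (assumptions).
    """
    view_probs = np.asarray(view_probs, dtype=float)
    n = view_probs.size
    best = int(np.argmax(view_probs))
    # Circular distance of every bin from the best bin.
    idx = np.arange(n)
    dist = np.minimum(np.abs(idx - best), n - np.abs(idx - best))
    far = dist >= min_sep
    if not far.any():
        return True
    # Strongest competing bin that is not a neighbour of the best bin.
    second = view_probs[far].max()
    return bool(second < ratio_thresh * view_probs[best])


def predict_keypoint(local_maps, view_map, view_probs, eps=1e-12):
    """Fuse local appearance maps with a viewpoint-conditioned map.

    local_maps: list of (H, W) likelihood maps, one per scale.
    view_map: (H, W) viewpoint-conditioned likelihood map.
    Returns the (row, col) of the highest-scoring location.
    """
    # Assumed multiscale fusion rule: plain average over scales.
    local = np.mean(np.stack(local_maps, axis=0), axis=0)
    score = np.log(local + eps)
    if is_viewpoint_confident(view_probs):
        # Product of likelihoods, computed as a sum of logs.
        score = score + np.log(np.asarray(view_map, dtype=float) + eps)
    # Otherwise the viewpoint term is filtered out entirely (assumption).
    row, col = np.unravel_index(np.argmax(score), score.shape)
    return int(row), int(col)
```

In this sketch an ambiguous viewpoint simply drops the viewpoint term, so the prediction falls back to local appearance alone; other fallbacks (for example, down-weighting the viewpoint term) would fit the same description equally well.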

Acknowledgments

This work was partly supported by the National High Technology Research and Development Program of China (863 Program) (No. 2015AA016306), the National Natural Science Foundation of China (No. 61231015), the EU FP7 QUICK project under Grant Agreement No. PIRSES-GA-2013-612652, the National Natural Science Foundation of China (No. 61502348), the Hubei Province Technological Innovation Major Project (No. 2016AAA015), and the Science and Technology Program of Shenzhen (No. JCYJ20150422150029092).

Author information

Authors and Affiliations

School of Computer Science, National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan, 430072, China
Qingnan Li, Ruimin Hu, Yixin Chen, Jingwen Yan & Jing Xiao
Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, 430072, China
Qingnan Li, Ruimin Hu, Yixin Chen, Jingwen Yan & Jing Xiao
Collaborative Innovation Center of Geospatial Technology, Wuhan, 430079, China
Ruimin Hu & Yixin Chen
Research Institute of Wuhan University, Shenzhen, China
Yixin Chen & Jing Xiao

Corresponding author

Correspondence to Ruimin Hu.

Editor information

Editors and Affiliations

University of Electronic Science and Technology of China, Chengdu, China
Bing Zeng
University of Chinese Academy of Sciences, Beijing, China
Qingming Huang
University of Ottawa, Ottawa, Ontario, Canada
Abdulmotaleb El Saddik
University of Electronic Science and Technology of China, Chengdu, China
Hongliang Li
Chinese Academy of Sciences, Beijing, China
Shuqiang Jiang
Harbin Institute of Technology, Harbin, China
Xiaopeng Fan

About this paper

Cite this paper

Li, Q., Hu, R., Chen, Y., Yan, J., Xiao, J. (2018). A Fine-Grained Filtered Viewpoint Informed Keypoint Prediction from 2D Images. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds) Advances in Multimedia Information Processing – PCM 2017. PCM 2017. Lecture Notes in Computer Science, vol 10736. Springer, Cham. https://doi.org/10.1007/978-3-319-77383-4_17

DOI: https://doi.org/10.1007/978-3-319-77383-4_17
Published: 10 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77382-7
Online ISBN: 978-3-319-77383-4
eBook Packages: Computer Science, Computer Science (R0)
