LoANs: Weakly Supervised Object Detection with Localizer Assessor Networks

Bartz, Christian; Yang, Haojin; Bethge, Joseph; Meinel, Christoph

doi:10.1007/978-3-030-21074-8_29

LoANs: Weakly Supervised Object Detection with Localizer Assessor Networks

Christian Bartz¹⁶,
Haojin Yang¹⁶,
Joseph Bethge¹⁶ &
…
Christoph Meinel¹⁶

Conference paper
First Online: 19 June 2019

1599 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11367))

Abstract

Recently, deep neural networks have achieved remarkable performance on the task of object detection and recognition. The reason for this success is mainly grounded in the availability of large scale, fully annotated datasets, but the creation of such a dataset is a complicated and costly task. In this paper, we propose a novel method for weakly supervised object detection that simplifies the process of gathering data for training an object detector. We train an ensemble of two models that work together in a student-teacher fashion. Our student (localizer) is a model that learns to localize an object, the teacher (assessor) assesses the quality of the localization and provides feedback to the student. The student uses this feedback to learn how to localize objects and is thus entirely supervised by the teacher, as we are using no labels for training the localizer. In our experiments, we show that our model is very robust to noise and reaches competitive performance compared to a state-of-the-art fully supervised approach. We also show the simplicity of creating a new dataset, based on a few videos (e.g. downloaded from YouTube) and artificially generated data.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 69.99; Price excludes VAT (USA)

Softcover Book: USD 89.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

References

Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105. Curran Associates, Inc., New York (2012)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 91–99. Curran Associates, Inc., New York (2015)
Google Scholar
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009)
Google Scholar
Everingham, M., Gool, L.V., Williams, C.K.I., Winn, J., Zisserman, A.: The Pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88, 303–338 (2010)
Article Google Scholar
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Google Scholar
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Chapter Google Scholar
Deselaers, T., Alexe, B., Ferrari, V.: Localizing objects while learning their appearance. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 452–466. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_33
Chapter Google Scholar
Liang, X., Liu, S., Wei, Y., Liu, L., Lin, L., Yan, S.: Towards computational baby learning: a weakly-supervised approach for object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 999–1007 (2015)
Google Scholar
Misra, I., Shrivastava, A., Hebert, M.: Watch and learn: semi-supervised learning for object detectors from video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3593–3602 (2015)
Google Scholar
Tang, Y., Wang, J., Gao, B., Dellandrea, E., Gaizauskas, R., Chen, L.: Large scale semi-supervised object detection using visual and semantic knowledge transfer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2119–2128. IEEE (2016)
Google Scholar
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Google Scholar
Singh, B., Najibi, M., Davis, L.S.: Sniper: efficient multi-scale training. arXiv:1805.09300 [cs] (2018)
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. In: NIPS Deep Learning and Representation Learning Workshop (2015)
Google Scholar
Tang, Z., Wang, D., Zhang, Z.: Recurrent neural network training with dark knowledge transfer. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5900–5904 (2016)
Google Scholar
Xie, J., Lu, Y., Gao, R., Zhu, S.C., Wu, Y.N.: Cooperative training of descriptor and generator networks. arXiv:1609.09408 [cs, stat] (2016)
Chen, T., Goodfellow, I., Shlens, J.: Net2Net: accelerating learning via knowledge transfer. In: International Conference on Learning Representations (2016)
Google Scholar
Li, Y., Yang, J., Song, Y., Cao, L., Luo, J., Li, L.J.: Learning from noisy labels with distillation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1910–1918 (2017)
Google Scholar
Jiang, L., Zhou, Z., Leung, T., Li, L.J., Fei-Fei, L.: MentorNet: learning data-driven curriculum for very deep neural networks on corrupted labels. In: International Conference on Machine Learning, pp. 2309–2318 (2018)
Google Scholar
Goodfellow, I., et al.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 2672–2680. Curran Associates, Inc., New York (2014)
Google Scholar
Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: Advances in Neural Information Processing Systems, vol. 28, pp. 2017–2025. Curran Associates, Inc. (2015)
Google Scholar
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of The 32nd International Conference on Machine Learning, pp. 448–456 (2015)
Google Scholar
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814 (2010)
Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Represenations, San Diego (2015)
Google Scholar
Tokui, S., Oono, K., Hido, S., Clayton, J.: Chainer: a next-generation open source framework for deep learning. In: Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS) (2015)
Google Scholar
Niitani, Y., Ogawa, T., Saito, S., Saito, M.: ChainerCV: a library for deep learning in computer vision. In: Proceedings of the 2017 ACM on Multimedia Conference, MM 2017, pp. 1217–1220. ACM, New York (2017)
Google Scholar
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The Pascal visual object classes challenge 2012 (VOC2012) results (2012)
Google Scholar

Download references

Author information

Authors and Affiliations

Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert Straße 2-3, 14482, Potsdam, Germany
Christian Bartz, Haojin Yang, Joseph Bethge & Christoph Meinel

Authors

Christian Bartz
View author publications
You can also search for this author in PubMed Google Scholar
Haojin Yang
View author publications
You can also search for this author in PubMed Google Scholar
Joseph Bethge
View author publications
You can also search for this author in PubMed Google Scholar
Christoph Meinel
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Christian Bartz .

Editor information

Editors and Affiliations

School of Computer Science, University of Adelaide, Adelaide, Australia
Gustavo Carneiro
Data61, Commonwealth Scientific and Industrial Research Organization, Canberra, Australia
Shaodi You

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Bartz, C., Yang, H., Bethge, J., Meinel, C. (2019). LoANs: Weakly Supervised Object Detection with Localizer Assessor Networks. In: Carneiro, G., You, S. (eds) Computer Vision – ACCV 2018 Workshops. ACCV 2018. Lecture Notes in Computer Science(), vol 11367. Springer, Cham. https://doi.org/10.1007/978-3-030-21074-8_29

Download citation

DOI: https://doi.org/10.1007/978-3-030-21074-8_29
Published: 19 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21073-1
Online ISBN: 978-3-030-21074-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics