Skip to main content

Deep Multiple Instance Learning for Zero-Shot Image Tagging

  • Conference paper
  • First Online:
Book cover Computer Vision – ACCV 2018 (ACCV 2018)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11361))

Included in the following conference series:

Abstract

In-line with the success of deep learning on traditional recognition problem, several end-to-end deep models for zero-shot recognition have been proposed in the literature. These models are successful to predict a single unseen label given an input image, but does not scale to cases where multiple unseen objects are present. In this paper, we model this problem within the framework of Multiple Instance Learning (MIL). To the best of our knowledge, we propose the first end-to-end trainable deep MIL framework for the multi-label zero-shot tagging problem. Due to its novel design, the proposed framework has several interesting features: (1) Unlike previous deep MIL models, it does not use any off-line procedure (e.g., Selective Search or EdgeBoxes) for bag generation. (2) During test time, it can process any number of unseen labels given their semantic embedding vectors. (3) Using only seen labels per image as weak annotation, it can produce a bounding box for each predicted label. We experiment with large-scale NUS-WIDE dataset and achieve superior performance across conventional, zero-shot and generalized zero-shot tagging tasks.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Akata, Z., Reed, S., Walter, D., Lee, H., Schiele, B.: Evaluation of output embeddings for fine-grained image classification. In: CVPR, 07–12 June 2015, pp. 2927–2936 (2015)

    Google Scholar 

  2. Akata, Z., Malinowski, M., Fritz, M., Schiele, B.: Multi-cue zero-shot learning with strong supervision. In: CVPR, June 2016

    Google Scholar 

  3. Bourbaki, N.: Eléments de mathématiques: théorie des ensembles, chapitres 1 à 4, vol. 1. Masson (1990)

    Google Scholar 

  4. Chen, M., Zheng, A., Weinberger, K.Q.: Fast image tagging. In: ICML, January 2013

    Google Scholar 

  5. Cheng, M.M., Zhang, Z., Lin, W.Y., Torr, P.: Bing: binarized normed gradients for objectness estimation at 300fps. In: CVPR, pp. 3286–3293 (2014)

    Google Scholar 

  6. Chua, T.S., Tang, J., Hong, R., Li, H., Luo, Z., Zheng, Y.T.: NUS-WIDE: a real-world web image database from National University of Singapore. In: CIVR, Santorini, Greece, 8–10 July 2009

    Google Scholar 

  7. Demirel, B., Gokberk Cinbis, R., Ikizler-Cinbis, N.: Attributes2classname: a discriminative model for attribute-based unsupervised zero-shot learning. In: ICCV, October 2017

    Google Scholar 

  8. Deutsch, S., Kolouri, S., Kim, K., Owechko, Y., Soatto, S.: Zero shot learning via multi-scale manifold regularization. In: CVPR, July 2017

    Google Scholar 

  9. Feng, J., Zhou, Z.H.: Deep MIML network. In: AAAI, pp. 1884–1890 (2017)

    Google Scholar 

  10. Fu, Y., Yang, Y., Hospedales, T., Xiang, T., Gong, S.: Transductive multi-label zero-shot learning. arXiv preprint arXiv:1503.07790 (2015)

  11. Girshick, R.: Fast R-CNN. In: ICCV, December 2015

    Google Scholar 

  12. Gong, Y., Jia, Y., Leung, T., Toshev, A., Ioffe, S.: Deep convolutional ranking for multilabel image annotation. arXiv preprint arXiv:1312.4894 (2013)

  13. Hassoun, M.H.: Fundamentals of Artificial Neural Networks. MIT Press, Cambridge (1995)

    MATH  Google Scholar 

  14. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition, vol. 2016, pp. 770–778, January 2016. Cited by 107

    Google Scholar 

  15. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. arXiv preprint arXiv:1605.06409 (2016)

  16. Li, X., Liao, S., Lan, W., Du, X., Yang, G.: Zero-shot image tagging by hierarchical semantic embedding. In: RDIR, pp. 879–882. ACM (2015)

    Google Scholar 

  17. Li, Y., Wang, D., Hu, H., Lin, Y., Zhuang, Y.: Zero-shot recognition using dual visual-semantic mapping paths. In: CVPR, July 2017

    Google Scholar 

  18. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2

    Chapter  Google Scholar 

  19. Mensink, T., Gavves, E., Snoek, C.G.: COSTA: co-occurrence statistics for zero-shot classification. In: CVPR, pp. 2441–2448 (2014)

    Google Scholar 

  20. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)

    Google Scholar 

  21. Morgado, P., Vasconcelos, N.: Semantically consistent regularization for zero-shot recognition. In: CVPR, July 2017

    Google Scholar 

  22. Norouzi, M., et al.: Zero-shot learning by convex combination of semantic embeddings. In: ICLR (2014)

    Google Scholar 

  23. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: EMNLP, pp. 1532–1543 (2014)

    Google Scholar 

  24. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: CVPR, vol. 1, no. 2, p. 4 (2017)

    Google Scholar 

  25. Rahman, S., Khan, S., Porikli, F.: A unified approach for conventional zero-shot, generalized zero-shot, and few-shot learning. IEEE Trans. Image Process. 27(11), 5652–5667 (2018)

    Article  MathSciNet  Google Scholar 

  26. Rahman, S., Khan, S., Porikli, F.: Zero-shot object detection: learning to simultaneously recognize and localize novel concepts. In: Asian Conference on Computer Vision (ACCV). Springer, December 2018

    Google Scholar 

  27. Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. arXiv preprint arXiv:1612.08242 (2016)

  28. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE TPAMI 39(6), 1137–1149 (2017)

    Article  Google Scholar 

  29. Ren, Z., Jin, H., Lin, Z., Fang, C., Yuille, A.: Multiple instance visual-semantic embedding. In: BMVC (2017)

    Google Scholar 

  30. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  31. Szegedy, C., et al.: Going deeper with convolutions. In: CVPR, 07–12 June 2015, pp. 1–9 (2015)

    Google Scholar 

  32. Tang, P., Wang, X., Feng, B., Liu, W.: Learning multi-instance deep discriminative patterns for image classification. IEEE TIP 26(7), 3385–3396 (2017)

    MathSciNet  MATH  Google Scholar 

  33. Tang, P., Wang, X., Huang, Z., Bai, X., Liu, W.: Deep patch learning for weakly supervised object classification and discovery. Pattern Recogn. 71, 446–459 (2017)

    Article  Google Scholar 

  34. Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. IJCV 104(2), 154–171 (2013)

    Article  Google Scholar 

  35. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 dataset. Technical report, CNS-TR-2011-001, California Institute of Technology (2011)

    Google Scholar 

  36. Wang, X., Zhu, Z., Yao, C., Bai, X.: Relaxed multiple-instance SVM with application to object discovery, pp. 1224–1232 (2015)

    Google Scholar 

  37. Wei, Y., et al.: HCP: a flexible CNN framework for multi-label image classification. IEEE TPAMI 38(9), 1901–1907 (2016)

    Article  Google Scholar 

  38. Wu, J., Yu, Y., Huang, C., Yu, K.: Deep multiple instance learning for image classification and auto-annotation. In: CVPR, pp. 3460–3469, June 2015

    Google Scholar 

  39. Xian, Y., Akata, Z., Sharma, G., Nguyen, Q., Hein, M., Schiele, B.: Latent embeddings for zero-shot classification. In: CVPR, June 2016

    Google Scholar 

  40. Xian, Y., Schiele, B., Akata, Z.: Zero-shot learning - the good, the bad and the ugly. In: CVPR (2017)

    Google Scholar 

  41. Zhang, L., Xiang, T., Gong, S.: Learning a deep embedding model for zero-shot learning. In: CVPR, July 2017

    Google Scholar 

  42. Zhang, Y., Gong, B., Shah, M.: Fast zero-shot image tagging. In: CVPR, June 2016

    Google Scholar 

  43. Zhou, Y., Sun, X., Liu, D., Zha, Z., Zeng, W.: Adaptive pooling in multi-instance learning for web video annotation. In: ICCV, October 2017

    Google Scholar 

  44. Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 391–405. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_26

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shafin Rahman .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Rahman, S., Khan, S. (2019). Deep Multiple Instance Learning for Zero-Shot Image Tagging. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science(), vol 11361. Springer, Cham. https://doi.org/10.1007/978-3-030-20887-5_33

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-20887-5_33

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20886-8

  • Online ISBN: 978-3-030-20887-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics