
Distinctive-Attribute Extraction for Image Captioning

  • Boeun Kim
  • Young Han Lee
  • Hyedong Jung
  • Choongsang Cho (email author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11132)

Abstract

Image captioning has evolved with the progress of deep neural networks. However, generating qualitatively detailed and distinctive captions is still an open issue. In previous works, captions involving semantic descriptions have been generated by feeding additional information into the RNN. Following this approach, we propose a distinctive-attribute extraction (DaE) method that derives attributes which explicitly encourage the RNN to generate an accurate caption. We evaluate the proposed method on a challenge dataset and verify that it improves captioning performance, describing images in more detail. The method can be plugged into various models to improve their performance.

Keywords

Image captioning · Semantic information · Distinctive-attribute · Term frequency-inverse document frequency (TF-IDF)
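
The keywords point to the core mechanism: distinctive attributes are scored with TF-IDF over the reference captions, and the resulting attribute information conditions the caption-generating RNN. The paper's exact formulation is not reproduced on this page, so the following is only a minimal sketch, assuming scikit-learn's TfidfVectorizer and treating each image's reference captions as a single document; the example captions and the top-k selection are hypothetical choices for illustration, not the authors' code.

    # Minimal, hypothetical sketch of TF-IDF-based attribute scoring.
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Hypothetical reference captions; in practice these would be the
    # ground-truth captions per image (e.g., from MS COCO), concatenated.
    captions_per_image = [
        "a brown dog catches a frisbee in the park",
        "a man rides a red motorcycle down a city street",
        "two children play soccer on a grassy field",
    ]

    # Each image's captions form one document, so terms that are frequent
    # for an image but rare across the corpus receive high TF-IDF weight.
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(captions_per_image)
    vocab = vectorizer.get_feature_names_out()

    # Keep the top-k highest-weighted terms per image as its
    # distinctive attributes.
    k = 3
    for i, row in enumerate(tfidf.toarray()):
        top = row.argsort()[::-1][:k]
        attrs = [(vocab[j], round(float(row[j]), 3)) for j in top if row[j] > 0]
        print(f"image {i}: {attrs}")

In the full method, such attribute scores would serve as the semantic input that guides the RNN described in the abstract; the sketch above only illustrates the TF-IDF scoring step.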


Acknowledgement

This work was supported by IITP/MSIT [2017-0-00255, Autonomous digital companion framework and application].



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Boeun Kim (1)
  • Young Han Lee (1)
  • Hyedong Jung (1)
  • Choongsang Cho (1), email author
  1. AI Research Center, Korea Electronics Technology Institute, Seongnam, Korea
