Cluster Computing

Volume 22, Supplement 4, pp 8939–8951

An optimization framework of video advertising: using deep learning algorithm based on global image information

  • Cheng Luo
  • Ying Peng
  • Tingting Zhu
  • Ling Li


Object detection and recognition technology plays an important role in video advertising. Using deep learning, this paper first builds a real-time object detection model (Yes-Net) based on global image information. The model combines the advantages of CNN and RNN algorithms and achieves significant improvements over other object detection algorithms. We then used this model to build an accurate video advertising framework and deployed the framework on the IPTV platform of a telecommunication operator to collect comparative data. The framework first detects and recognizes objects in images during video playback, then associates the recognition results with the ads database to perform real-time classification and accurate ad delivery. The results show that, compared with the traditional advertisement delivery method, the new framework dramatically improves the click-through rate.
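The association step described above (recognition results matched against an ads database, with click-through rate as the evaluation metric) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Detection` structure, the `ADS_BY_CATEGORY` table, the confidence threshold, and the function names are all hypothetical, and the detector itself (Yes-Net) is assumed to have already produced labeled detections.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One recognized object in a video frame (hypothetical structure)."""
    label: str
    confidence: float


# Hypothetical ads database: object category -> ad identifiers.
ADS_BY_CATEGORY = {
    "car": ["ad_car_insurance", "ad_new_sedan"],
    "dog": ["ad_pet_food"],
    "cup": ["ad_coffee"],
}


def select_ads(detections, min_confidence=0.5, max_ads=2):
    """Pick ads for the categories seen most often among confident detections."""
    counts = Counter(
        d.label
        for d in detections
        if d.confidence >= min_confidence and d.label in ADS_BY_CATEGORY
    )
    ads = []
    # Walk categories from most to least frequent and collect their ads.
    for label, _ in counts.most_common():
        for ad in ADS_BY_CATEGORY[label]:
            if len(ads) < max_ads:
                ads.append(ad)
    return ads


def click_through_rate(clicks, impressions):
    """CTR = clicks / impressions; 0.0 when no ad was shown."""
    return clicks / impressions if impressions else 0.0


if __name__ == "__main__":
    frame_detections = [
        Detection("dog", 0.9),
        Detection("car", 0.8),
        Detection("dog", 0.7),
        Detection("cup", 0.3),  # dropped: below the confidence threshold
    ]
    print(select_ads(frame_detections))
    print(click_through_rate(5, 100))
```

Ranking categories by how often they appear is only one possible association rule; a deployed system might also weight by confidence, object size, or viewer profile.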


Keywords: Video advertising; Real-time object detection; Deep learning; Global image information; Accurate ads delivery



We thank Chengdu Handsight Information Technology Co., Ltd. for its great support. The company provided all the empirical data and authorized our use of its IPTV supporting system. This work was supported by the Key Support Projects of the Sichuan Science and Technology Department (No. 18ZDYF1707), the Key Support Projects of the Sichuan Federation of Social Science Associations (No. SC16XK033), and the Key Support Projects of Sichuan Tourism University (No. SCTUJ1709).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Business School of Sichuan University, Chengdu, China
  2. Digital Media Art Department, Sichuan Film and Television University, Chengdu, China
  3. Research Center for Smart Tourism Technology Application and Innovation, Sichuan Tourism University, Chengdu, China
