Known-Item Search in Video Databases with Textual Queries

  • Adam Blažek
  • David Kuboň
  • Jakub Lokoč
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9939)


In this paper, we present two approaches to known-item search in video databases with textual queries. In the first approach, the database objects are labeled with an arbitrary ImageNet classification model. During the search, the set of query words is expanded with synonyms and hypernyms until we encounter words present in the database, which are then used for retrieval. In the second approach, we delegate the query to an independent service such as Google Images and let the user pick a suitable result for query-by-example search. Finally, the effectiveness of both approaches is evaluated in a user study.
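The query-expansion step of the first approach can be sketched as a breadth-first search over a thesaurus: starting from the user's query words, related words (synonyms and hypernyms) are added one expansion level at a time until at least one of them matches a label assigned to the database objects by the classifier. The sketch below is illustrative only; the toy `EXPANSIONS` map and `DATABASE_LABELS` set are hypothetical stand-ins for WordNet relations and ImageNet labels and do not come from the paper.

```python
from collections import deque

# Toy synonym/hypernym relations standing in for WordNet (illustrative only).
EXPANSIONS = {
    "puppy": ["dog"],
    "dog": ["canine", "hound"],
    "canine": ["carnivore"],
    "carnivore": ["animal"],
}

# Labels assigned to database keyframes by an ImageNet classifier (toy data).
DATABASE_LABELS = {"hound", "animal", "car"}

def expand_query(words, expansions, vocabulary):
    """Breadth-first expansion of query words with synonyms/hypernyms,
    stopping as soon as at least one word known to the database is reached."""
    seen = set(words)
    queue = deque(words)
    # Words already present in the database need no expansion.
    hits = {w for w in words if w in vocabulary}
    while queue and not hits:
        frontier = []
        # Expand exactly one BFS level, so the closest matches win.
        for _ in range(len(queue)):
            word = queue.popleft()
            for related in expansions.get(word, []):
                if related not in seen:
                    seen.add(related)
                    frontier.append(related)
        hits = {w for w in frontier if w in vocabulary}
        queue.extend(frontier)
    return hits

print(expand_query(["puppy"], EXPANSIONS, DATABASE_LABELS))
```

Expanding level by level (rather than exhausting the whole thesaurus) keeps the matched terms semantically close to the original query: here "puppy" reaches "hound" before the much more general "animal".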



This research was supported by Charles University in Prague Grant Agency – GAUK project no. 1134316. Furthermore, we are grateful to Mr. Jan Pavlovsky for his help with the user study.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. SIRET Research Group, Department of Software Engineering, Faculty of Mathematics and Physics, Charles University in Prague, Prague, Czech Republic