Intermediate Semantics Based Distance Metric Learning for Video Annotation and Similarity Measurements

  • Conference paper
Web Information Systems Engineering – WISE 2016 (WISE 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10041)

Abstract

Measuring similarity between videos is integral to several key tasks, including video retrieval, classification, and recommendation. Because there is no standard criterion for video similarity other than manual judgment, it is difficult to collect a large training dataset for distance metric learning algorithms. Moreover, existing distance metric learning (DML) methods for multimedia data suffer from two critical limitations: (1) they typically learn a distance function in the single-label setting, in which each item carries only one label; (2) they are usually designed to learn distance metrics on low-level features, and so ignore the semantic similarity of the multimedia data. To address these problems, we propose a novel framework, Intermediate Semantics based Distance Learning (ISDL), for video clips, which aims to integrate the semantics of multiple modalities optimally for distance metric learning. In particular, the proposed framework (1) generates the training pairs automatically; (2) defines multi-modal concepts for measuring similarity among videos; and (3) learns the distance metric for video clips based on the intermediate semantics. We conduct an extensive set of experiments to evaluate the proposed algorithms, and the results validate the effectiveness of our approach.
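To make the idea of learning a distance over intermediate semantics more concrete, the following is a minimal sketch and not the ISDL algorithm itself, whose details are not given in the abstract. It assumes each video clip is already represented by a vector of concept scores, and it learns one non-negative weight per concept (a diagonal Mahalanobis metric) from similar and dissimilar pairs using projected gradient descent. The three-concept vectors, the pairs, the margin, and the learning rate are all hypothetical values chosen for illustration.

```python
import math


def weighted_distance(x, y, w):
    """Diagonal Mahalanobis distance: sqrt(sum_k w[k] * (x[k] - y[k])**2)."""
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, y)))


def learn_weights(similar, dissimilar, dim, margin=1.0, lr=0.1, epochs=100):
    """Learn non-negative per-concept weights from labelled pairs.

    Loss: squared distance for similar pairs, plus a hinge
    max(0, margin - squared distance) for dissimilar pairs.
    Optimised by gradient descent with projection onto w >= 0.
    """
    w = [1.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        # Similar pairs: pull together by shrinking weights where they differ.
        for x, y in similar:
            for k in range(dim):
                grad[k] += (x[k] - y[k]) ** 2
        # Dissimilar pairs: push apart only while they are inside the margin.
        for x, y in dissimilar:
            sq = [(x[k] - y[k]) ** 2 for k in range(dim)]
            if sum(wk * s for wk, s in zip(w, sq)) < margin:
                for k in range(dim):
                    grad[k] -= sq[k]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, wk - lr * g) for wk, g in zip(w, grad)]
    return w


# Hypothetical concept-score vectors (three made-up concepts per clip).
similar_pairs = [
    ([0.9, 0.1, 0.2], [0.9, 0.1, 0.8]),
    ([0.1, 0.8, 0.7], [0.1, 0.8, 0.1]),
]
dissimilar_pairs = [
    ([0.9, 0.1, 0.5], [0.1, 0.1, 0.5]),
    ([0.2, 0.9, 0.3], [0.2, 0.1, 0.3]),
]

weights = learn_weights(similar_pairs, dissimilar_pairs, dim=3)
d_similar = weighted_distance(*similar_pairs[0], weights)
d_dissimilar = weighted_distance(*dissimilar_pairs[0], weights)
print(weights, d_similar, d_dissimilar)
```

In this toy setup the similar pairs differ only in the third concept and the dissimilar pairs only in the first two, so the learned weights down-weight the third concept and the similar pair ends up closer than the dissimilar one. A full metric matrix, multi-label supervision, or automatically generated pairs, as the abstract describes, would need a richer formulation than this sketch.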

Corresponding author

Correspondence to Wen Qu.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Qu, W., Zhou, X., Wang, D., Feng, S., Zhang, Y., Yu, G. (2016). Intermediate Semantics Based Distance Metric Learning for Video Annotation and Similarity Measurements. In: Cellary, W., Mokbel, M., Wang, J., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2016. WISE 2016. Lecture Notes in Computer Science, vol 10041. Springer, Cham. https://doi.org/10.1007/978-3-319-48740-3_16

  • DOI: https://doi.org/10.1007/978-3-319-48740-3_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48739-7

  • Online ISBN: 978-3-319-48740-3

  • eBook Packages: Computer Science, Computer Science (R0)
