Intermediate Semantics Based Distance Metric Learning for Video Annotation and Similarity Measurements

Qu, Wen; Zhou, Xiangmin; Wang, Daling; Feng, Shi; Zhang, Yifei; Yu, Ge

doi:10.1007/978-3-319-48740-3_16

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10041)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

The similarity metric between videos is integral to several key tasks, including video retrieval, classification and recommendation. Because there is no standard criterion for measuring the similarity between videos other than manual judgment, it is difficult to collect a large training dataset for distance metric learning algorithms. Moreover, existing distance metric learning (DML) methods for multimedia data suffer from two critical limitations: (1) they typically learn a distance function in the single-label setting, in which each item carries only one label; (2) they are often designed to learn distance metrics on low-level features, which ignores the semantic similarity of the multimedia data. To address these problems, we propose a novel framework, Intermediate Semantics based Distance Learning (ISDL), for video clips, which aims to integrate the semantics of multiple modalities optimally for distance metric learning. In particular, the proposed framework: (1) generates the training pairs automatically; (2) defines multi-modal concepts for measuring similarity among videos; (3) learns the distance metric for video clips based on the intermediate semantics. We conduct an extensive set of experiments to evaluate the proposed algorithms, and the results validate the effectiveness of our approach.
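
To make the three steps in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' algorithm. The abstract does not state the form of the metric or the rule used to generate pairs, so two assumptions are made here: each clip is represented by a vector of concept scores (the "intermediate semantics"), the distance takes the common Mahalanobis form d_M(x, y) = sqrt((x - y)^T M (x - y)), and pairs are marked similar or dissimilar by the Jaccard overlap of their label sets. The names `mahalanobis`, `make_pairs` and `sim_thresh` are hypothetical, and the matrix M is fixed by hand rather than learned.

```python
import numpy as np

def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semidefinite M."""
    d = x - y
    return float(np.sqrt(d @ M @ d))

def make_pairs(label_sets, sim_thresh=0.5):
    """Hypothetical pair generator (an assumption, not the paper's rule).

    Marks a pair as similar (1) when the Jaccard overlap of the two
    multi-label sets is at least sim_thresh, and dissimilar (0) otherwise.
    """
    pairs = []
    n = len(label_sets)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = label_sets[i], label_sets[j]
            union = a | b
            jac = len(a & b) / len(union) if union else 0.0
            pairs.append((i, j, 1 if jac >= sim_thresh else 0))
    return pairs

# Toy data: three clips, each with a set of labels and a vector of
# concept scores (made-up numbers, for illustration only).
labels = [{"car", "road"}, {"car", "road", "sky"}, {"speech"}]
scores = np.array([
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
    [0.1, 0.0, 0.9],
])

# With M = I the distance reduces to the Euclidean distance; a learned M
# would reweight and correlate the concept dimensions instead.
M = np.eye(3)
for i, j, similar in make_pairs(labels):
    print(i, j, similar, mahalanobis(scores[i], scores[j], M))
```

A metric learning step would then adjust M so that pairs marked similar end up closer than pairs marked dissimilar; that optimisation is omitted here because the abstract does not describe it.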

Author information

Authors and Affiliations

School of Computer Science and Engineering, Northeastern University, Shenyang, People’s Republic of China
Wen Qu, Daling Wang, Shi Feng, Yifei Zhang & Ge Yu
School of Science, RMIT University, Melbourne, Australia
Xiangmin Zhou

Corresponding author

Correspondence to Wen Qu.

Editor information

Editors and Affiliations

Poznań University of Economics, Poznan, Poland
Wojciech Cellary
University of Minnesota, Minneapolis, Minnesota, USA
Mohamed F. Mokbel
Tsinghua University, Beijing, China
Jianmin Wang
Victoria University, Melbourne, Victoria, Australia
Hua Wang
Victoria University, Melbourne, Victoria, Australia
Rui Zhou
Victoria University, Melbourne, Victoria, Australia
Yanchun Zhang

About this paper

Cite this paper

Qu, W., Zhou, X., Wang, D., Feng, S., Zhang, Y., Yu, G. (2016). Intermediate Semantics Based Distance Metric Learning for Video Annotation and Similarity Measurements. In: Cellary, W., Mokbel, M., Wang, J., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2016. WISE 2016. Lecture Notes in Computer Science, vol 10041. Springer, Cham. https://doi.org/10.1007/978-3-319-48740-3_16

DOI: https://doi.org/10.1007/978-3-319-48740-3_16
Published: 02 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48739-7
Online ISBN: 978-3-319-48740-3
eBook Packages: Computer Science, Computer Science (R0)
