
Exploiting Multimodality in Video Hyperlinking to Improve Target Diversity

  • Conference paper
MultiMedia Modeling (MMM 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10133)

Abstract

Video hyperlinking is the process of creating links within a collection of videos to help navigation and information seeking. Starting from a given set of video segments, called anchors, a set of related segments, called targets, must be provided. Over the past years, a number of content-based approaches have been proposed, achieving good results by searching for target segments that are very similar to the anchor in terms of content and information. Unfortunately, this relevance has been obtained at the expense of diversity. In this paper, we study multimodal approaches and their ability to provide a set of diverse yet relevant targets. We compare two recently introduced cross-modal approaches, namely deep autoencoders and bimodal LDA, and show experimentally that both provide significantly more diverse targets than a state-of-the-art baseline. Bimodal autoencoders offer the best trade-off between relevance and diversity, while bimodal LDA yields slightly more diverse targets at a lower precision.
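The abstract frames hyperlinking as ranking candidate segments against an anchor and then judging the returned targets on both relevance and diversity. The Python sketch below illustrates that tension under stated assumptions: each segment is assumed to be already embedded as a fixed-length vector (for instance by a multimodal model such as those compared above), targets are ranked by cosine similarity to the anchor, relevance is measured as precision against a set of judged-relevant ids, and diversity is measured as the mean pairwise cosine distance among the returned targets. All function names (`rank_targets`, `precision`, `diversity`) and the toy vectors are hypothetical illustrations, not the paper's pipeline or its evaluation metrics.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def rank_targets(anchor: Vector, segments: Dict[str, Vector], k: int) -> List[str]:
    """Hypothetical retrieval step: ids of the k segments most similar to the anchor."""
    ranked = sorted(segments, key=lambda sid: cosine(anchor, segments[sid]), reverse=True)
    return ranked[:k]


def precision(targets: List[str], relevant: Set[str]) -> float:
    """Fraction of returned targets that were judged relevant."""
    if not targets:
        return 0.0
    return sum(t in relevant for t in targets) / len(targets)


def diversity(targets: List[str], segments: Dict[str, Vector]) -> float:
    """Assumed diversity measure: mean pairwise cosine distance (1 - similarity)."""
    pairs = list(combinations(targets, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(segments[a], segments[b]) for a, b in pairs) / len(pairs)


# Toy 2-D "embeddings"; real multimodal embeddings would be much larger.
segments = {
    "s1": [1.0, 0.0],
    "s2": [0.9, 0.1],  # near duplicate of s1
    "s3": [0.0, 1.0],
    "s4": [0.7, 0.7],
}
anchor = [1.0, 0.0]

top2 = rank_targets(anchor, segments, 2)
print(top2)  # ['s1', 's2']
print(precision(top2, {"s1"}))  # 0.5
# A less anchor-similar pair is more diverse than the top-2 pair.
print(diversity(top2, segments) < diversity(["s1", "s4"], segments))  # True
```

In this toy setup the two segments most similar to the anchor are nearly identical to each other, so pure similarity ranking returns a low-diversity target list; that is the relevance-versus-diversity trade-off the abstract describes.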



Author information

Correspondence to Rémi Bois.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Bois, R. et al. (2017). Exploiting Multimodality in Video Hyperlinking to Improve Target Diversity. In: Amsaleg, L., Guðmundsson, G., Gurrin, C., Jónsson, B., Satoh, S. (eds) MultiMedia Modeling. MMM 2017. Lecture Notes in Computer Science, vol 10133. Springer, Cham. https://doi.org/10.1007/978-3-319-51814-5_16

  • DOI: https://doi.org/10.1007/978-3-319-51814-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-51813-8

  • Online ISBN: 978-3-319-51814-5

  • eBook Packages: Computer Science, Computer Science (R0)