Layered Hypernetwork Models for Cross-Modal Associative Text and Image Keyword Generation in Multimodal Information Retrieval

  • Conference paper
PRICAI 2010: Trends in Artificial Intelligence (PRICAI 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6230)

Abstract

Conventional methods for multimodal data retrieval use text-tag-based or cross-modal approaches such as tag-image co-occurrence and canonical correlation analysis. However, because text and image features differ in granularity, approaches based on lower-order relationships between modalities may be limited. Here, we propose a novel method for generating text and image keywords through cross-modal associative learning and inference with multimodal queries. We use a modified hypernetwork model, the layered hypernetwork (LHN), which consists of a first (lower) layer with more than two modality-dependent hypernetworks and a second (upper) layer with one modality-integrating hypernetwork. LHNs learn higher-order associative relationships between the text and image modalities by training on an example set. After training, LHNs extend multimodal queries by generating text and image keywords via cross-modal inference, i.e., text-to-image and image-to-text. The LHNs are evaluated on Korean magazine articles with images on women's fashion and lifestyle. Experimental results show that the proposed method generates vision-language cross-modal keywords with high accuracy. The results also show that multimodal queries improve the accuracy of keyword generation compared with unimodal ones.
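
The two-layer idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the sampling scheme, the hyperedge order, the weighting, or the image features, so everything below is an assumption made for illustration. The function names (`sample_hyperedges`, `train`, `generate_image_keywords`), the parameters `order` and `n_samples`, the simple vote counting, and the example "visual word" labels such as `v1` are all hypothetical. The sketch samples small keyword subsets (hyperedges) per modality, pairs them in an integrating layer, and answers a text query by voting over the image keywords of matching pairs.

```python
import random
from collections import Counter


def sample_hyperedges(items, order, n_samples, rng):
    """Lower layer (assumed form): draw random subsets of size `order`
    from the keywords of a single modality."""
    items = sorted(set(items))
    if len(items) < order:
        return []
    return [frozenset(rng.sample(items, order)) for _ in range(n_samples)]


def train(examples, order=2, n_samples=20, seed=0):
    """Build a toy layered memory from (text_words, image_words) pairs.

    Upper layer (assumed form): each stored element links one text
    hyperedge to one image hyperedge sampled from the same example.
    """
    rng = random.Random(seed)
    memory = []
    for text_words, image_words in examples:
        text_edges = sample_hyperedges(text_words, order, n_samples, rng)
        image_edges = sample_hyperedges(image_words, order, n_samples, rng)
        memory.extend(zip(text_edges, image_edges))
    return memory


def generate_image_keywords(memory, text_query, top_k=3):
    """Text-to-image inference: every stored pair whose text hyperedge is
    fully contained in the query votes for its image keywords."""
    query = set(text_query)
    votes = Counter()
    for text_edge, image_edge in memory:
        if text_edge <= query:
            votes.update(image_edge)
    return [word for word, _ in votes.most_common(top_k)]


# Hypothetical training data: text keywords paired with visual-word labels.
examples = [
    (["dress", "silk", "summer"], ["v1", "v2", "v3"]),
    (["sofa", "lamp", "wood"], ["v7", "v8", "v9"]),
]
memory = train(examples)
fashion_keywords = generate_image_keywords(memory, ["dress", "silk", "summer"])
```

Image-to-text inference would follow the same pattern with the roles of the two hyperedges swapped. Matching on subsets of several keywords, rather than on single tag-image pairs, is what makes the stored associations higher-order in this sketch.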

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ha, JW., Kim, BH., Lee, B., Zhang, BT. (2010). Layered Hypernetwork Models for Cross-Modal Associative Text and Image Keyword Generation in Multimodal Information Retrieval. In: Zhang, BT., Orgun, M.A. (eds) PRICAI 2010: Trends in Artificial Intelligence. PRICAI 2010. Lecture Notes in Computer Science, vol 6230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15246-7_10

  • DOI: https://doi.org/10.1007/978-3-642-15246-7_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15245-0

  • Online ISBN: 978-3-642-15246-7

  • eBook Packages: Computer Science, Computer Science (R0)
