Visual Keywords: from Text Retrieval to Multimedia Retrieval

  • Joo-Hwee Lim
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 50)


Despite the simplicity of keyword-based matching, text retrieval systems have achieved practical success in recent decades. Keywords, which carry meaningful semantics for users, can be extracted relatively easily from text documents. For visual content, which is perceptual in nature, both the definition of corresponding “keywords” and their automatic extraction are unclear and non-trivial. Is there a similar metaphor or mechanism for visual data? In this chapter, we propose a new notion of visual keywords, which are abstracted and extracted by soft computing techniques from exemplary visual tokens tokenized from visual documents in a visual content domain. Each visual keyword is represented as a neural network or a soft cluster center. A visual document is indexed by comparing its visual tokens against the learned visual keywords; the resulting soft presence values are then aggregated spatially using contextual domain knowledge. A coding scheme based on singular value decomposition, similar to latent semantic indexing for text retrieval, is also proposed to reduce dimensionality and noise. An empirical study on retrieval and categorization of professional natural scene photographs is described to show the effectiveness and efficiency of visual keywords.
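The pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the chapter: the function names (`soft_memberships`, `image_signature`, `svd_code`) are hypothetical, the soft presence is computed with a fuzzy c-means style membership, plain block-wise averaging stands in for the spatial aggregation by contextual domain knowledge, and random vectors stand in for real visual tokens and learned keywords.

```python
import numpy as np

def soft_memberships(tokens, centers, m=2.0, eps=1e-12):
    """Fuzzy c-means style membership of each token to each center (assumed form)."""
    # Squared Euclidean distances, shape (n_tokens, n_centers).
    d2 = ((tokens[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    # Normalize so each token's memberships sum to 1.
    return inv / inv.sum(axis=1, keepdims=True)

def image_signature(token_grid, centers, blocks=2):
    """Average memberships inside blocks x blocks spatial regions and concatenate.

    token_grid has shape (H, W, D): one D-dimensional token per grid position.
    """
    H, W, D = token_grid.shape
    u = soft_memberships(token_grid.reshape(-1, D), centers).reshape(H, W, -1)
    parts = []
    for rows in np.array_split(np.arange(H), blocks):
        for cols in np.array_split(np.arange(W), blocks):
            parts.append(u[np.ix_(rows, cols)].mean(axis=(0, 1)))
    return np.concatenate(parts)

def svd_code(signatures, k):
    """Project (n_images, n_features) signatures onto the top-k right singular vectors."""
    _, _, vt = np.linalg.svd(signatures, full_matrices=False)
    basis = vt[:k].T
    return signatures @ basis, basis

# Hypothetical data: 5 "visual keywords" and 6 images, each a 4x4 grid of 8-dim tokens.
rng = np.random.default_rng(0)
centers = rng.normal(size=(5, 8))
images = rng.normal(size=(6, 4, 4, 8))

sigs = np.stack([image_signature(img, centers) for img in images])
codes, basis = svd_code(sigs, k=3)
print(sigs.shape, codes.shape)  # (6, 20) (6, 3)
```

A query image would be coded the same way (`image_signature(query, centers) @ basis`) and compared with the stored codes using a similarity measure such as the cosine, which mirrors how latent semantic indexing is commonly used for text.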


Receptive Field; Spatial Configuration; Latent Semantic Analysis; Visual Data; Visual Content
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Joo-Hwee Lim (1)
  1. RWCP Information-Base Functions KRDL Lab, Kent Ridge Digital Labs, Singapore
