Information Retrieval, Volume 10, Issue 4–5, pp 445–484

A review of text and image retrieval approaches for broadcast news video

  • Rong Yan
  • Alexander G. Hauptmann
Article

Abstract

The effectiveness of a video retrieval system largely depends on the choice of underlying text and image retrieval components. The unique properties of video collections (e.g., multiple sources, noisy features and temporal relations) suggest that we examine the performance of these retrieval methods in such a multimodal environment and identify the relative importance of the underlying retrieval components. In this paper, we review a variety of text/image retrieval approaches as well as their individual components in the context of broadcast news video. We discuss numerous components of text/image retrieval in detail, including retrieval models, text sources, temporal expansion methods, query expansion methods, image features, and similarity measures. For each component, we conduct a series of retrieval experiments on TRECVID video collections to identify its advantages and disadvantages. To provide more complete coverage of video retrieval, we briefly discuss an emerging approach called concept-based video retrieval and review strategies for combining multiple retrieval outputs.

Keywords

Video retrieval, Text retrieval, Image retrieval, Concept-based retrieval, Fusion, Review

Notes

Acknowledgements

This material is based upon work funded in part by the US Government. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the US Government. The authors thank anonymous reviewers for providing helpful comments to improve this article.

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Intelligent Information Management Department, IBM TJ Watson Research, Hawthorne, USA
  2. Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA
