
Extracting Semantics from Multimedia Content: Challenges and Solutions

  • Lexing Xie
  • Rong Yan
Chapter
Part of the Signals and Communication Technology book series (SCT)

Abstract

Multimedia content accounts for over 60% of traffic on the current Internet [74]. With many users willing to spend their leisure time watching videos on YouTube or browsing photos through Flickr, sifting through large multimedia collections for useful information, especially collections outside the open Web, remains an open problem. The lack of effective indexes describing the content of multimedia data is a main hurdle for multimedia search, and extracting semantics from multimedia content is the bottleneck for multimedia indexing. In this chapter, we review the extraction of semantics from large amounts of multimedia data as a statistical learning problem. Our goal is to present the current challenges and solutions from a few different perspectives and to cover a sample of related work. We start with an overview of a system that extracts and uses semantic components; it consists of five major components: data annotation, multimedia ontology, feature representation, model learning, and retrieval systems. We then present the challenges for each of the five components along with existing solutions: designing multimedia lexicons and using them for concept detection; handling multiple media sources and resolving correspondence across modalities; learning structured (generative) models to account for natural data dependency or to model hidden topics; handling rare classes; leveraging unlabeled data; scaling to large amounts of training data; and, finally, leveraging media semantics in retrieval systems.
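The five-component pipeline described above (annotation, ontology/lexicon, features, model learning, retrieval) can be illustrated with a deliberately toy sketch. This is not the chapter's method: the per-channel mean feature and the nearest-centroid "detector" below are hypothetical stand-ins for real components such as color histograms or SIFT features and SVM concept detectors, and all names are invented for illustration.

```python
# Toy sketch: annotation -> features -> concept detector -> semantic index -> retrieval.
from collections import defaultdict

def extract_features(frame):
    """Toy feature: per-channel mean over a frame's RGB pixels (stand-in for a color histogram)."""
    n = len(frame)
    return tuple(sum(px[c] for px in frame) / n for c in range(3))

def train_concept_detector(labeled):
    """Learn one centroid per concept from annotated examples (stand-in for an SVM detector)."""
    sums, counts = defaultdict(lambda: [0.0, 0.0, 0.0]), defaultdict(int)
    for frame, concept in labeled:
        f = extract_features(frame)
        for c in range(3):
            sums[concept][c] += f[c]
        counts[concept] += 1
    return {k: tuple(v / counts[k] for v in s) for k, s in sums.items()}

def detect(frame, centroids):
    """Assign the concept whose centroid is nearest in feature space."""
    f = extract_features(frame)
    return min(centroids, key=lambda k: sum((a - b) ** 2 for a, b in zip(f, centroids[k])))

# Data annotation: frames (lists of RGB pixels) labeled with a tiny concept lexicon.
sky = [(10, 10, 200), (20, 30, 220)]
grass = [(20, 200, 10), (30, 220, 20)]
model = train_concept_detector([(sky, "sky"), (grass, "vegetation")])

# Index unlabeled clips by detected concept, then retrieve by semantic query.
index = defaultdict(list)
for name, frame in [("clip1", [(15, 20, 210)]), ("clip2", [(25, 210, 15)])]:
    index[detect(frame, model)].append(name)

print(index["sky"])  # clips indexed under the "sky" concept
```

The point of the sketch is only the data flow between the five components; each box (features, learning, retrieval) is where the challenges surveyed in this chapter arise.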

Keywords

Latent Dirichlet Allocation · Multimedia Content · Unlabeled Data · Semantic Concept · Video Retrieval


References

  1. LSCOM lexicon definitions and annotations: DTO challenge workshop on large scale concept ontology for multimedia. http://www.ee.columbia.edu/dvmm/lscom/.
  2.
  3. Looking high and low: Who will be the Google of video search? Wired, June 2007. http://www.wired.com/techbiz/media/magazine/15-07/st/_videomining.
  4. A. Amir, W. Hsu, G. Iyengar, C.-Y. Lin, M. Naphade, A. Natsev, C. Neti, H. J. Nock, J. R. Smith, B. L. Tseng, Y. Wu, and D. Zhang. IBM research TRECVID-2003 video retrieval system. In Proceedings of NIST TREC Video Retrieval Evaluation, Gaithersburg, MD, November 2003.
  5. K. Barnard, P. Duygulu, D. Forsyth, N. de Freitas, D. Blei, and M. Jordan. Matching words and pictures. Journal of Machine Learning Research, 3:1107–1135, 2002.
  6. D. Blei and M. Jordan. Modeling annotated data. In Proc. of the 26th ACM Intl. Conf. on SIGIR, 2003.
  7. D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
  8. A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proc. of the Workshop on Computational Learning Theory, 1998.
  9. M. Brand, N. Oliver, and A. Pentland. Coupled hidden Markov models for complex action recognition. In Proceedings of the 1997 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '97), p. 994, Washington, DC, USA, 1997. IEEE Computer Society.
  10. L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
  11. R. Brunelli, O. Mich, and C. M. Modena. A survey on the automatic indexing of video data. Journal of Visual Communication and Image Representation, 10(2):78–112, June 1999.
  12. C. Buckley and E. M. Voorhees. Evaluating evaluation measure stability. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 33–40, New York, NY, USA, 2000. ACM.
  13. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):955–974, 1998.
  14. C. J. C. Burges and B. Schölkopf. Improving the accuracy and speed of support vector machines. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9, p. 375. The MIT Press, 1997.
  15. C. Campbell, N. Cristianini, and A. Smola. Query learning with large margin classifiers. In Proc. 17th International Conference on Machine Learning (ICML), pp. 111–118, 2000.
  16. M. Campbell, S. Ebadollahi, M. Naphade, A. P. Natsev, J. R. Smith, J. Tesic, L. Xie, K. Scheinberg, J. Seidl, A. Haubold, and D. Joshi. IBM research TRECVID-2006 video retrieval system. In NIST TRECVID Workshop, Gaithersburg, MD, November 2006.
  17. M. Campbell, A. Haubold, M. Liu, A. P. Natsev, J. R. Smith, J. Tesic, L. Xie, R. Yan, and J. Yang. IBM research TRECVID-2007 video retrieval system. In NIST TRECVID Workshop, Gaithersburg, MD, November 2007.
  18. L. Chaisorn, T.-S. Chua, and C.-H. Lee. A multi-modal approach to story segmentation for news video. World Wide Web, 6(2):187–208, 2003.
  19. P. Chan and S. Stolfo. Toward scalable learning with non-uniform class and cost distributions: A case study in credit card fraud detection. In Proc. Fourth Intl. Conf. on Knowledge Discovery and Data Mining, pp. 164–168, 1998.
  20. S.-F. Chang, W. Hsu, L. Kennedy, L. Xie, A. Yanagawa, E. Zavesky, and D. Zhang. Columbia University TRECVID-2005 video search and high-level feature extraction. In Proceedings of NIST TREC Video Retrieval Evaluation, Gaithersburg, MD, 2005.
  21. S.-F. Chang, T. Sikora, and A. Puri. Overview of the MPEG-7 standard. IEEE Transactions on Circuits and Systems for Video Technology, 11(6):688–695, June 2001.
  22. M.-Y. Chen, M. Christel, A. Hauptmann, and H. Wactlar. Putting active learning into multimedia applications: dynamic definition and refinement of concept classifiers. In Proceedings of ACM Intl. Conf. on Multimedia, Singapore, November 2005.
  23. M. Christel and A. G. Hauptmann. The use and utility of high-level semantic features. In Proc. of Intl. Conf. on Image and Video Retrieval (CIVR), Singapore, 2005.
  24. M. Collins and Y. Singer. Unsupervised models for named entity classification. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 189–196, 1999.
  25. M. Collins and Y. Singer. Unsupervised models for named entity classification. In Proc. of EMNLP, 1999.
  26. A. de Cheveigné and H. Kawahara. YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111:1917, 2002.
  27. S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 1990.
  28. T. Downs, K. E. Gates, and A. Masters. Exact simplification of support vector solutions. Journal of Machine Learning Research, 2:293–297, 2002.
  29. L.-Y. Duan, M. Xu, T.-S. Chua, Q. Tian, and C.-S. Xu. A mid-level representation framework for semantic sports video analysis. In Proceedings of the Eleventh ACM International Conference on Multimedia, pp. 33–44, New York, NY, USA, 2003. ACM Press.
  30. J. Duffy. Video drives net traffic. PC World, August 2007. http://www.pcworld.com/article/id,136069-pg,1/article.html.
  31. S. Ebadollahi, L. Xie, S.-F. Chang, and J. R. Smith. Visual event detection using multi-dimensional concept dynamics. In International Conference on Multimedia and Expo (ICME), Toronto, Canada, July 2006.
  32. C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998.
  33. B. Gold and N. Morgan. Speech and Audio Signal Processing. Wiley, New York, 2000.
  34. G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. Technical report, Caltech, 2007.
  35. A. Hauptmann, M.-Y. Chen, M. Christel, C. Huang, W.-H. Lin, T. Ng, N. Papernick, A. Velivelli, J. Yang, R. Yan, H. Yang, and H. D. Wactlar. Confounded expectations: Informedia at TRECVID 2004. In Proceedings of NIST TREC Video Retrieval Evaluation, Gaithersburg, MD, 2004.
  36. A. G. Hauptmann, M. Christel, R. Concescu, J. Gao, Q. Jin, W.-H. Lin, J.-Y. Pan, S. M. Stevens, R. Yan, J. Yang, and Y. Zhang. CMU Informedia's TRECVID 2005 skirmishes. In Proceedings of NIST TREC Video Retrieval Evaluation, Gaithersburg, MD, 2005.
  37. A. Hesseldahl. Micron's megapixel movement. BusinessWeek, 2006.
  38. T. Hofmann. Probabilistic latent semantic indexing. In Proc. of the 22nd Intl. ACM SIGIR Conference, pp. 50–57, Berkeley, California, USA, 1999.
  39. B. Horn. Robot Vision. McGraw-Hill College, 1986.
  40. C. Hsu, C. Chang, C. Lin, et al. A practical guide to support vector classification. Technical report, National Taiwan University, July 2003.
  41. J. Huang, S. Ravi Kumar, M. Mitra, W. Zhu, and R. Zabih. Spatial color indexing and applications. International Journal of Computer Vision, 35(3):245–268, 1999.
  42. G. Iyengar, P. Duygulu, S. Feng, P. Ircing, S. P. Khudanpur, D. Klakow, M. R. Krause, R. Manmatha, H. J. Nock, D. Petkova, B. Pytlik, and P. Virga. Joint visual-text modeling for automatic retrieval of multimedia documents. In Proceedings of ACM Intl. Conf. on Multimedia, November 2005.
  43. N. Japkowicz. Learning from imbalanced data sets: a comparison of various strategies. In AAAI Workshop on Learning from Imbalanced Data Sets, Tech. Rep. WS-00-05, Menlo Park, CA: AAAI Press, 2000.
  44. J. Jeon, V. Lavrenko, and R. Manmatha. Automatic image annotation and retrieval using cross-media relevance models. In Proceedings of the 26th Annual ACM SIGIR Conference on Information Retrieval, pp. 119–126, Toronto, Canada, 2003.
  45. R. Jin, J. Y. Chai, and S. Luo. Automatic image annotation via coherent language model and active learning. In Proceedings of ACM Intl. Conf. on Multimedia, November 2004.
  46. T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods: Support Vector Learning. Springer, 1995.
  47. T. Joachims. Making large-scale support vector machine learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods: Support Vector Machines. MIT Press, Cambridge, MA, 1998.
  48. T. Joachims. Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 217–226, New York, NY, USA, 2006. ACM.
  49. M. Joshi, R. Agarwal, and V. Kumar. Predicting rare classes: Can boosting make any weak learner strong? In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Canada, July 2002.
  50. D. Jurafsky and J. Martin. Speech and Language Processing. Prentice Hall, Upper Saddle River, NJ, 2000.
  51. J. Kender. A large scale concept ontology for news stories: Empirical methods, analysis, and improvements. In IEEE International Conference on Multimedia and Expo (ICME), Beijing, China, 2007.
  52. A. Levin, P. Viola, and Y. Freund. Unsupervised improvement of visual detectors using co-training. In Proc. of the Intl. Conf. on Computer Vision, 2003.
  53. C. Lin, B. Tseng, and J. Smith. VideoAnnEx: IBM MPEG-7 annotation tool for multimedia indexing and concept learning. In IEEE International Conference on Multimedia and Expo, Baltimore, MD, 2003.
  54. K. Lin and C. Lin. A study on reduced support vector machines. IEEE Transactions on Neural Networks, 14(6), 2003.
  55. D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
  56. J. Markel. The SIFT algorithm for fundamental frequency estimation. IEEE Transactions on Audio and Electroacoustics, 20(5):367–377, 1972.
  57. I. McCowan, D. Gatica-Perez, S. Bengio, G. Lathoud, M. Barnard, and D. Zhang. Automatic analysis of multimodal group actions in meetings. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3):305–317, 2005.
  58. I. Muslea, S. Minton, and C. A. Knoblock. Active semi-supervised learning = robust multi-view learning. In Proc. of Intl. Conf. on Machine Learning, 2002.
  59. M. Naaman, S. Harada, Q. Wang, H. Garcia-Molina, and A. Paepcke. Context data in geo-referenced digital photo collections. In Proceedings of the 12th Annual ACM International Conference on Multimedia, pp. 196–203, New York, NY, USA, 2004. ACM Press.
  60. M. Naphade, L. Kennedy, J. Kender, S. Chang, J. Smith, P. Over, and A. Hauptmann. A light scale concept ontology for multimedia understanding for TRECVID 2005. Technical report, IBM Research, 2005.
  61. M. Naphade, J. R. Smith, J. Tesic, S.-F. Chang, W. Hsu, L. Kennedy, A. Hauptmann, and J. Curtis. Large-scale concept ontology for multimedia. IEEE MultiMedia, 13(3):86–91, 2006.
  62. M. R. Naphade, T. Kristjansson, B. Frey, and T. Huang. Probabilistic multimedia objects (multijects): A novel approach to video indexing and retrieval in multimedia systems. In Proc. of IEEE International Conference on Image Processing (ICIP), pp. 536–540, 1998.
  63. M. R. Naphade and J. R. Smith. Active learning for simultaneous annotation of multiple binary semantic concepts. In Proceedings of IEEE International Conference on Multimedia and Expo (ICME), pp. 77–80, Taipei, Taiwan, 2004.
  64. S.-Y. Neo, J. Zhao, M.-Y. Kan, and T.-S. Chua. Video retrieval using high level features: Exploiting query matching and confidence-based weighting. In Proceedings of the Conference on Image and Video Retrieval (CIVR), pp. 370–379, Singapore, 2006.
  65. K. Nigam and R. Ghani. Analyzing the effectiveness and applicability of co-training. In Proc. of CIKM, pp. 86–93, 2000.
  66. N. M. Oliver, B. Rosario, and A. Pentland. A Bayesian computer vision system for modeling human interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 2000.
  67. J.-Y. Pan, H.-J. Yang, C. Faloutsos, and P. Duygulu. GCap: Graph-based automatic image captioning. In Proc. of the 4th International Workshop on Multimedia Data and Document Engineering (MDDE '04), in conjunction with CVPR '04, 2004.
  68. D. Pierce and C. Cardie. Limitations of co-training for natural language learning from large datasets. In Proc. of EMNLP, 2001.
  69. J. Platt. Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. In A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers. MIT Press, 1999.
  70. F. Provost. Machine learning from imbalanced data sets 101. In AAAI Workshop on Learning from Imbalanced Data Sets, Tech. Rep. WS-00-05, Menlo Park, CA: AAAI Press, 2000.
  71. B. Pytlik, A. Ghoshal, D. Karakos, and S. Khudanpur. TRECVID 2005 experiment at Johns Hopkins University: Using hidden Markov models for video retrieval. In Proceedings of NIST TREC Video Retrieval Evaluation, Gaithersburg, MD, 2005.
  72. L. Rabiner and B. Juang. Fundamentals of Speech Recognition. Prentice-Hall, Upper Saddle River, NJ, USA, 1993.
  73. L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–285, February 1989.
  74. W. Roush. TR10: Peering into video's future. Technology Review, March 2007. http://www.technologyreview.com/Infotech/18284/?a=f.
  75. H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):23–38, 1998.
  76. Y. Rui, T. Huang, and S. Chang. Image retrieval: current techniques, promising directions and open issues. Journal of Visual Communication and Image Representation, 10(4):39–62, 1999.
  77. E. Scheirer and M. Slaney. Construction and evaluation of a robust multifeature speech/music discriminator. In Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), volume 2, p. 1331, Washington, DC, USA, 1997. IEEE Computer Society.
  78. J. Schlenzig, E. Hunter, and R. Jain. Recursive identification of gesture inputs using hidden Markov models. In Proceedings of the Second IEEE Workshop on Applications of Computer Vision, pp. 187–194. IEEE Computer Society Press, 1994.
  79. B. Shevade, H. Sundaram, and L. Xie. Modeling personal and social network context for event annotation in images. In Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries, 2007.
  80. A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content-based image retrieval: the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349–1380, 2000.
  81. C. Snoek, B. Huurnink, L. Hollink, M. de Rijke, G. Schreiber, and M. Worring. Adding semantics to detectors for video retrieval. IEEE Transactions on Multimedia, 2007.
  82. C. Snoek, M. Worring, J. Geusebroek, D. Koelma, and F. Seinstra. The MediaMill TRECVID 2004 semantic video search engine. In Proceedings of NIST TREC Video Retrieval Evaluation, Gaithersburg, MD, 2004.
  83. C. Snoek, M. Worring, and A. Smeulders. Early versus late fusion in semantic video analysis. In Proceedings of ACM Intl. Conf. on Multimedia, pp. 399–402, Singapore, November 2005.
  84. C. G. Snoek and M. Worring. Multimodal video indexing: A review of the state-of-the-art. Multimedia Tools and Applications, 25(1):5–35, 2005.
  85. M. Srikanth, M. Bowden, and D. Moldovan. LCC at TRECVID 2005. In Proceedings of NIST TREC Video Retrieval Evaluation, Gaithersburg, MD, 2005.
  86. T. Starner, A. Pentland, and J. Weaver. Real-time American Sign Language recognition using desk and wearable computer based video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1371–1375, 1998.
  87. M. Swain and D. Ballard. Color indexing. International Journal of Computer Vision, 7(1):11–32, 1991.
  88. D. Tao, X. Tang, X. Li, and X. Wu. Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(7):1088–1099, 2006.
  89. C. Teh and R. Chin. On image analysis by the methods of moments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(4):496–513, 1988.
  90. The National Institute of Standards and Technology (NIST). TREC video retrieval evaluation, 2001–2007. http://www-nlpir.nist.gov/projects/trecvid/.
  91. The National Institute of Standards and Technology (NIST). Common evaluation measures, 2002. http://trec.nist.gov/pubs/trec11/appendices/MEASURES.pdf.
  92. S. Tong and E. Chang. Support vector machine active learning for image retrieval. In Proceedings of ACM Intl. Conf. on Multimedia, pp. 107–118, 2001.
  93. J. van Gemert. Retrieving images as text. Master's thesis, University of Amsterdam, 2003.
  94. V. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
  95. P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proc. CVPR, 1:511–518, 2001.
  96. T. Volkmer and A. Natsev. Exploring automatic query refinement for text-based video retrieval. In IEEE International Conference on Multimedia and Expo (ICME), pp. 765–768, Toronto, ON, 2006.
  97. G. Weiss and F. Provost. The effect of class distribution on classifier learning. Technical report, Department of Computer Science, Rutgers University, 2001.
  98. T. Westerveld. Using generative probabilistic models for multimedia retrieval. PhD thesis, CWI, Centre for Mathematics and Computer Science, 2004.
  99. Y. Wu, E. Y. Chang, K. C.-C. Chang, and J. R. Smith. Optimal multimodal fusion for multimedia data analysis. In Proceedings of the 12th Annual ACM International Conference on Multimedia, pp. 572–579, New York, NY, USA, 2004.
  100. Y. Wu, B. L. Tseng, and J. R. Smith. Ontology-based multi-classification learning for video concept detection. In IEEE International Conference on Multimedia and Expo (ICME), Taipei, Taiwan, 2004.
  101. L. Xie and S.-F. Chang. Pattern mining in visual concept streams. In International Conference on Multimedia and Expo (ICME), Toronto, Canada, July 2006.
  102. L. Xie, S.-F. Chang, A. Divakaran, and H. Sun. Structure analysis of soccer video with hidden Markov models. In Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), Orlando, FL, 2002.
  103. L. Xie, S.-F. Chang, A. Divakaran, and H. Sun. Unsupervised Mining of Statistical Temporal Structures in Video, chapter 10. Kluwer Academic Publishers, 2003.
  104. L. Xie, D. Xu, S. Ebadollahi, K. Scheinberg, S.-F. Chang, and J. R. Smith. Pattern mining in visual concept streams. In Proc. 40th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, October 2006.
  105. E. P. Xing, R. Yan, and A. G. Hauptmann. Mining associated text and images using dual-wing harmoniums. In Uncertainty in Artificial Intelligence (UAI), 2005.
  106. D. Xu and S.-F. Chang. Visual event recognition in news video using kernel methods with multi-level temporal alignment. In IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, June 2007.
  107. C. Xu et al. Sports Video Analysis: From Semantics to Tactics. Springer, 2008.
  108. R. Yan. Probabilistic Models for Combining Diverse Knowledge Sources in Multimedia Retrieval. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 2006.
  109. R. Yan and A. G. Hauptmann. Multi-class active learning for video semantic feature extraction. In Proceedings of IEEE International Conference on Multimedia and Expo (ICME), pp. 69–72, Taipei, Taiwan, 2004.
  110. R. Yan and A. G. Hauptmann. A review of text and image retrieval approaches for broadcast news video. Information Retrieval, 10(4–5):445–484, 2007.
  111. R. Yan, Y. Liu, R. Jin, and A. Hauptmann. On predicting rare class with SVM ensemble in scene classification. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2003.
  112. R. Yan and M. R. Naphade. Co-training non-robust classifiers for video semantic concept detection. In Proc. of IEEE Intl. Conf. on Image Processing (ICIP), 2005.
  113. R. Yan and M. R. Naphade. Semi-supervised cross feature learning for semantic concept detection in video. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, 2005.
  114. R. Yan, J. Tesic, and J. R. Smith. Model-shared subspace boosting for multi-label classification. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 834–843, New York, NY, USA, 2007. ACM.
  115. R. Yan, J. Yang, and A. G. Hauptmann. Learning query-class dependent weights in automatic video retrieval. In Proceedings of the 12th Annual ACM International Conference on Multimedia, pp. 548–555, New York, NY, USA, 2004.
  116. R. Yan, M.-Y. Chen, and A. G. Hauptmann. Mining relationship between video concepts using probabilistic graphical model. In IEEE International Conference on Multimedia and Expo (ICME), Toronto, Canada, 2006.
  117. J. Yang, M.-Y. Chen, and A. G. Hauptmann. Finding Person X: Correlating names with visual appearances. In Proc. of the Intl. Conf. on Image and Video Retrieval (CIVR), pp. 270–278, Dublin, Ireland, 2004.
  118. A. Yilmaz, O. Javed, and M. Shah. Object tracking: A survey. ACM Computing Surveys, 38(4):13, 2006.
  119. Y. Zhai, X. Chao, Y. Zhang, O. Javed, A. Yilmaz, F. Rafi, S. Ali, O. Alatas, S. Khan, and M. Shah. University of Central Florida at TRECVID 2004. In Proceedings of NIST TREC Video Retrieval Evaluation, Gaithersburg, MD, 2004.

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. IBM T. J. Watson Research Center, Hawthorne, USA
