Abstract
A large amount of multimedia data (e.g., images and videos) is now available on the Web. A multimedia entity does not appear in isolation, but is accompanied by various forms of metadata, such as surrounding text, user tags, ratings, and comments. Mining this textual metadata has been found to be effective in facilitating multimedia information processing and management, and a wealth of research has been dedicated to text mining in multimedia. This chapter provides a comprehensive survey of recent research efforts. Specifically, the survey focuses on four aspects: (a) surrounding text mining; (b) tag mining; (c) joint text and visual content mining; and (d) cross text and visual content mining. Furthermore, open research issues are identified based on the current research efforts.
Copyright information
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Zha, ZJ., Wang, M., Shen, J., Chua, TS. (2012). Text Mining in Multimedia. In: Aggarwal, C., Zhai, C. (eds) Mining Text Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3223-4_11
DOI: https://doi.org/10.1007/978-1-4614-3223-4_11
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-3222-7
Online ISBN: 978-1-4614-3223-4
eBook Packages: Computer Science (R0)