Text Mining in Multimedia

Zha, Zheng-Jun; Wang, Meng; Shen, Jialie; Chua, Tat-Seng

doi:10.1007/978-1-4614-3223-4_11

Abstract

A large amount of multimedia data (e.g., images and videos) is now available on the Web. A multimedia entity does not appear in isolation but is accompanied by various forms of metadata, such as surrounding text, user tags, ratings, and comments. Mining this textual metadata has been found to be effective in facilitating multimedia information processing and management, and a wealth of research effort has been dedicated to text mining in multimedia. This chapter provides a comprehensive survey of recent research efforts. Specifically, the survey focuses on four aspects: (a) surrounding text mining; (b) tag mining; (c) joint text and visual content mining; and (d) cross text and visual content mining. Furthermore, open research issues are identified based on the current research efforts.

Author information

Authors and Affiliations

School of Computing, National University of Singapore, Singapore, Singapore
Zheng-Jun Zha, Meng Wang & Tat-Seng Chua
Singapore Management University, Singapore, Singapore
Jialie Shen

Corresponding author

Correspondence to Zheng-Jun Zha.

Editor information

Editors and Affiliations

IBM Thomas J. Watson Research Center, 19 Skyline Drive, Hawthorne, 10532, New York, USA
Charu C. Aggarwal
University of Illinois at Urbana-Champaign, Urbana, 61801, Illinois, USA
ChengXiang Zhai

About this chapter

Cite this chapter

Zha, ZJ., Wang, M., Shen, J., Chua, TS. (2012). Text Mining in Multimedia. In: Aggarwal, C., Zhai, C. (eds) Mining Text Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-3223-4_11

DOI: https://doi.org/10.1007/978-1-4614-3223-4_11
Published: 07 January 2012
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-3222-7
Online ISBN: 978-1-4614-3223-4
eBook Packages: Computer Science, Computer Science (R0)
