Abstract
The speech signal, video caption text, and video frame images are all key cues for understanding video content. Based on this observation, we propose a scheme that integrates continuous speech recognition, video caption text recognition, and object recognition. The video is first segmented into shots by shot-boundary detection. Caption text recognition and speech recognition are then carried out, and their results are treated as two paragraphs of text from which only the noun words are kept. These words are represented as a graph whose vertices stand for the words and whose edges denote the semantic relation between neighboring words. Finally, we apply a dense-subgraph mining method to extract the semantic meaning of the video. Experiments show that our video semantic mining method is effective.
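The word-graph and dense-subgraph steps of the pipeline above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes edges link adjacent noun words, and it substitutes Charikar's greedy peeling heuristic for whatever dense-subgraph method the chapter actually uses; the noun list is made-up example data.

```python
from collections import defaultdict

def build_word_graph(nouns):
    """Build an undirected graph: vertices are noun words,
    edges link words that appear next to each other in the text."""
    adj = defaultdict(set)
    for a, b in zip(nouns, nouns[1:]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def densest_subgraph(adj):
    """Greedy peeling (Charikar): repeatedly delete the minimum-degree
    vertex and return the vertex set with the highest edges/vertices ratio."""
    adj = {v: set(ns) for v, ns in adj.items()}  # local mutable copy
    best, best_density = set(adj), 0.0
    while adj:
        edges = sum(len(ns) for ns in adj.values()) / 2
        density = edges / len(adj)
        if density >= best_density:
            best_density, best = density, set(adj)
        v = min(adj, key=lambda u: len(adj[u]))   # minimum-degree vertex
        for u in adj.pop(v):                      # remove v and its edges
            adj[u].discard(v)
    return best

# Hypothetical nouns pooled from speech and caption transcripts of one shot.
nouns = ["video", "caption", "text", "video", "frame",
         "speech", "video", "semantics", "video"]
core = densest_subgraph(build_word_graph(nouns))
```

Here `core` is the densest cluster of co-occurring nouns, which the scheme would read as the dominant semantic topic of the video segment. The peeling heuristic is a 2-approximation for the densest-subgraph problem and runs in near-linear time on graphs of this size.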
© 2012 Springer-Verlag GmbH Berlin Heidelberg
Zhang, S., Li, H., Zhang, S. (2012). A Foreground Segmentation Scheme. In: Jin, D., Lin, S. (eds) Advances in Electronic Commerce, Web Application and Communication. Advances in Intelligent and Soft Computing, vol 148. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28655-1_4
DOI: https://doi.org/10.1007/978-3-642-28655-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28654-4
Online ISBN: 978-3-642-28655-1