Abstract
Text appearing in video is closely related to the video content, and it can facilitate content-based video analysis, indexing, and retrieval. Video sequences are usually compressed before storage and transmission, and a basic step of text-based applications is text detection and localization. This paper proposes an overlaid text detection and localization method for H.264/AVC compressed video that uses the integer discrete cosine transform (DCT) coefficients of intra-frames. The main contributions are twofold: 1) coarse text block detection using thresholds that adapt to block size and quantization parameter; 2) text line localization based on the characteristics of text in the intra-frames of the H.264/AVC compressed domain. The method is compared with a pixel-domain text detection method on H.264/AVC compressed video. Text detection results on five H.264/AVC video sequences at various quality levels show the effectiveness of the proposed method.
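The abstract does not state the thresholds themselves, so the following is only an illustrative sketch of what a block-size and quantization-parameter adaptive test on DCT coefficients might look like. The function names, the `base` constant, and the threshold formula are assumptions made for illustration and are not taken from the paper; the one standard fact used is that the H.264/AVC quantizer step size roughly doubles for every increase of 6 in QP.

```python
def qstep(qp):
    """Approximate H.264/AVC quantizer step size; it doubles every 6 QP steps."""
    return 0.625 * 2 ** (qp / 6)


def ac_energy(block):
    """Sum of absolute AC coefficients of a square block (DC at [0][0] excluded)."""
    total = sum(abs(c) for row in block for c in row)
    return total - abs(block[0][0])


def is_coarse_text_block(block, qp, base=2.0):
    """Hypothetical rule, not the paper's: flag a block whose AC energy exceeds
    a threshold that grows with block area and shrinks as quantization coarsens."""
    n = len(block)  # 4 for a 4x4 transform block, 8 for an 8x8 block
    threshold = base * n * n / qstep(qp)
    return ac_energy(block) > threshold


# A flat block (DC only) and a block with strong AC content, both 4x4.
flat = [[50, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
textured = [[50, 12, -9, 4], [10, -7, 5, 0], [-6, 4, 0, 0], [3, 0, 0, 0]]
flat_is_text = is_coarse_text_block(flat, qp=28)
textured_is_text = is_coarse_text_block(textured, qp=28)
```

Scaling the threshold by block area lets one constant serve both 4x4 and 8x8 transform blocks, and dividing by the step size reflects that quantized coefficient magnitudes shrink as QP rises. Both choices are design assumptions of this sketch.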
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (NSFC) Project Nos. 60903121 and 61173109, and by Foundations of Microsoft Research Asia.
Cite this article
Qian, X., Wang, H. & Hou, X. Video text detection and localization in intra-frames of H.264/AVC compressed video. Multimed Tools Appl 70, 1487–1502 (2014). https://doi.org/10.1007/s11042-012-1168-z
DOI: https://doi.org/10.1007/s11042-012-1168-z