Video Caption Detection

  • Tong Lu
  • Shivakumara Palaiahnakote
  • Chew Lim Tan
  • Wenyin Liu
Chapter
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)

Abstract

Video contains two types of text. The first is caption text: edited or graphics text that is artificially superimposed on the video and is relevant to its content. The second is scene text, which occurs naturally in the scene and is usually embedded in objects in the video. This chapter focuses on state-of-the-art methods for caption text detection in video. According to the literature, current methods fall into two broad categories: feature-based methods and machine learning-based methods. The feature-based methods described in this chapter use the following features for text detection: image edges, obtained by means of gradients and filters; textures, obtained by combining a variety of image textures; connected components, obtained by analyzing skeletons extracted from the image; and frequency-domain features, obtained by performing a Fourier transform. The machine learning methods presented in this chapter use classifiers such as support vector machines, neural networks, and Bayesian classifiers.
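
As a rough illustration of the edge-based idea mentioned in the abstract, the sketch below scores each row of a grayscale frame by its horizontal gradient energy and reports contiguous runs of high-energy rows as candidate caption bands. It is not a method taken from the chapter: the function names, the `ratio` threshold, and the toy frame are all assumptions made for illustration only.

```python
def row_gradient_energy(frame):
    """Sum of absolute horizontal differences for each row of a 2-D grayscale frame."""
    return [sum(abs(row[x + 1] - row[x]) for x in range(len(row) - 1)) for row in frame]


def candidate_caption_bands(frame, ratio=0.5):
    """Return (first_row, last_row) bands whose gradient energy is at least
    `ratio` times the maximum row energy (hypothetical threshold)."""
    energy = row_gradient_energy(frame)
    peak = max(energy, default=0)
    if peak == 0:
        return []  # no edges anywhere, so no candidate bands
    threshold = ratio * peak
    bands, start = [], None
    for y, e in enumerate(energy):
        if e >= threshold and start is None:
            start = y  # a high-energy band begins
        elif e < threshold and start is not None:
            bands.append((start, y - 1))  # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(energy) - 1))  # band runs to the last row
    return bands


# Toy 6x8 frame: flat background with two high-contrast rows (3 and 4)
# standing in for a superimposed caption.
flat = [10] * 8
stripes = [0, 255, 0, 255, 0, 255, 0, 255]
frame = [flat, flat, flat, stripes, stripes, flat]
bands = candidate_caption_bands(frame)
```

Caption text tends to produce dense, high-contrast strokes, which is why row-wise gradient energy is a plausible first cue; a practical detector would also need column-wise localization and some verification step, which this sketch omits.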

Keywords

Text Line, Bayesian Classifier, Text Region, Text Block, Text Detection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  • Tong Lu (1)
  • Shivakumara Palaiahnakote (2)
  • Chew Lim Tan (3)
  • Wenyin Liu (4)
  1. Department of Computer Science and Technology, Nanjing University, Nanjing, China
  2. Faculty of CSIT, University of Malaya, Kuala Lumpur, Malaysia
  3. National University of Singapore, Singapore, Singapore
  4. Multimedia Software Engineering Research Center, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
