A Three-Layer Approach for Overlay Text Extraction in Video Stream

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 584)

Abstract

Overlaid text is text that is externally embedded in video frames to give viewers additional information. Because it reflects the contextual content of a video, it can be used to automatically index and search the files in a video library. In this paper, we propose a novel algorithm to detect and extract overlaid text in digital video, which gives users a deeper understanding of the video content. The algorithm uses a support vector machine (SVM) as its machine learning component to filter and extract text more accurately. It also uses multi-resolution processing, which allows it to extract embedded text of different font sizes from the same video frame. Detecting text in video sequences enables automatic indexing of video based on the text embedded in its frames. Embedded text also makes video accessible to deaf and hard-of-hearing users, and it helps people who want to watch video in sound-sensitive environments.
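The multi-resolution idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2x2 averaging pyramid, the horizontal edge-density measure, the threshold of 50, and the function names `downsample`, `build_pyramid`, and `edge_density` are all assumptions chosen for illustration. The sketch only shows why examining one frame at several scales helps with different font sizes: thick strokes produce sparse edges at full resolution but dense edges after downsampling. In a full pipeline, per-level features of this kind would be passed to a classifier such as the SVM the abstract names; that step is omitted here.

```python
from typing import List

Image = List[List[float]]  # grayscale frame as rows of intensities (assumed 0..255)


def downsample(img: Image) -> Image:
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h = len(img) // 2 * 2      # drop a trailing odd row, if any
    w = len(img[0]) // 2 * 2   # drop a trailing odd column, if any
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def build_pyramid(img: Image, levels: int) -> List[Image]:
    """Return the frame at up to `levels` successively halved resolutions."""
    pyramid = [img]
    for _ in range(levels - 1):
        last = pyramid[-1]
        if len(last) < 2 or len(last[0]) < 2:
            break  # too small to halve again
        pyramid.append(downsample(last))
    return pyramid


def edge_density(img: Image, threshold: float = 50.0) -> float:
    """Fraction of horizontally adjacent pixel pairs that differ by more than
    `threshold`. The threshold is a hypothetical value, not taken from the paper."""
    strong = 0
    total = 0
    for row in img:
        for a, b in zip(row, row[1:]):
            total += 1
            if abs(a - b) > threshold:
                strong += 1
    return strong / total if total else 0.0


if __name__ == "__main__":
    # Synthetic 8x8 frame with vertical stripes two pixels wide,
    # standing in for thick character strokes.
    frame = [[255.0 if (x // 2) % 2 else 0.0 for x in range(8)] for _ in range(8)]
    for level, scaled in enumerate(build_pyramid(frame, 3)):
        print(level, len(scaled), len(scaled[0]), round(edge_density(scaled), 3))
```

On the synthetic striped frame, the edge density is highest at the middle pyramid level, where the stripe width matches the pixel grid. This is the kind of scale-dependent response that lets a detector pick up text of different sizes from the same frame.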

Author information

Corresponding author

Correspondence to Lalita Kumari.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kumari, L., Dey, V., Raheja, J.L. (2018). A Three-Layer Approach for Overlay Text Extraction in Video Stream. In: Pant, M., Ray, K., Sharma, T., Rawat, S., Bandyopadhyay, A. (eds) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol 584. Springer, Singapore. https://doi.org/10.1007/978-981-10-5699-4_9

  • DOI: https://doi.org/10.1007/978-981-10-5699-4_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-5698-7

  • Online ISBN: 978-981-10-5699-4

  • eBook Packages: Engineering, Engineering (R0)
