A Three-Layer Approach for Overlay Text Extraction in Video Stream

Kumari, Lalita; Dey, Vidyut; Raheja, J. L.

doi:10.1007/978-981-10-5699-4_9

Conference paper
First Online: 25 November 2017

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 584)

Abstract

Overlaid text is text that is externally embedded in video frames to give viewers additional information about the video sequence. Such text can be used to automatically index and search the files in a video library based on the contextual content inside each video. In this paper, we propose a novel algorithm to detect and extract overlaid text in digital video, which gives users a much deeper understanding of the video content. The proposed algorithm uses a support vector machine (SVM) as its machine learning approach to filter and extract text more accurately. It also uses multi-resolution processing, which allows it to extract embedded text of different font sizes from the same video frame. Detecting text in video sequences enables automatic indexing of video based on the text embedded in its frames. Embedded text also helps deaf and hard-of-hearing users understand the content of a video, and it helps people who want to watch video in sound-sensitive environments.
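The abstract does not describe the features, pyramid depth, or SVM configuration the authors use, so the following is only a minimal sketch of the general idea behind multi-resolution text detection, not the paper's method. It assumes a grayscale frame stored as nested lists, a fixed-size sliding window, and a simple edge-density feature. The names `downsample`, `edge_density`, `detect_multiresolution`, and the `is_text` callable (a stand-in for a trained SVM decision function) are all hypothetical.

```python
from typing import Callable, List, Tuple

Frame = List[List[float]]        # grayscale frame: rows of pixel intensities
Box = Tuple[int, int, int, int]  # (x, y, width, height) in original-frame pixels


def downsample(frame: Frame) -> Frame:
    """Halve the resolution by averaging each 2x2 block of pixels."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [
        [
            (frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
             + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def edge_density(frame: Frame, x0: int, y0: int, size: int) -> float:
    """Mean absolute horizontal gradient inside a size x size window (size >= 2)."""
    total = 0.0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size - 1):
            total += abs(frame[y][x + 1] - frame[y][x])
    return total / (size * (size - 1))


def detect_multiresolution(
    frame: Frame,
    is_text: Callable[[float], bool],  # assumption: stands in for a trained SVM
    window: int = 8,
    levels: int = 3,
) -> List[Box]:
    """Scan a fixed-size window over successively halved copies of the frame.

    A window of `window` pixels at pyramid level k covers `window * 2**k`
    pixels of the original frame, so larger fonts are picked up at coarser
    levels. Boxes are reported in original-frame coordinates.
    """
    boxes: List[Box] = []
    current, scale = frame, 1
    for _ in range(levels):
        h, w = len(current), len(current[0])
        if h < window or w < window:
            break
        for y0 in range(0, h - window + 1, window):
            for x0 in range(0, w - window + 1, window):
                if is_text(edge_density(current, x0, y0, window)):
                    boxes.append((x0 * scale, y0 * scale,
                                  window * scale, window * scale))
        current = downsample(current)
        scale *= 2
    return boxes
```

In practice the threshold-style `is_text` would be replaced by a classifier trained on labelled text and non-text windows, and the single edge-density value by a richer feature vector. The point of the sketch is only that one fixed-size detector, applied at several resolutions, can respond to text of several sizes within one frame.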

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, NIT Agartala, Jirania, India
Lalita Kumari
Department of Production Engineering, NIT Agartala, Jirania, India
Vidyut Dey
Digital System Group, CSIR/CEERI, Pilani, Rajasthan, India
J. L. Raheja

Corresponding author

Correspondence to Lalita Kumari.

Editor information

Editors and Affiliations

Department of Applied Science and Engineering, IIT Roorkee, Saharanpur, India
Millie Pant
Department of Physics, Amity School of Applied Sciences, Amity University Rajasthan, Jaipur, Rajasthan, India
Kanad Ray
Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Rajasthan, Jaipur, Rajasthan, India
Tarun K. Sharma
Department of Electronics and Communication Engineering, SEEC, Manipal University Jaipur, Jaipur, Rajasthan, India
Sanyog Rawat
Surface Characterization Group, NIMS, Nano Characterization Unit, Advanced Key Technologies Division, Tsukuba, Ibaraki, Japan
Anirban Bandyopadhyay

About this paper

Cite this paper

Kumari, L., Dey, V., Raheja, J.L. (2018). A Three-Layer Approach for Overlay Text Extraction in Video Stream. In: Pant, M., Ray, K., Sharma, T., Rawat, S., Bandyopadhyay, A. (eds) Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, vol 584. Springer, Singapore. https://doi.org/10.1007/978-981-10-5699-4_9

DOI: https://doi.org/10.1007/978-981-10-5699-4_9
Published: 25 November 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5698-7
Online ISBN: 978-981-10-5699-4
eBook Packages: Engineering, Engineering (R0)
