A Deep Convolution Neural Network Based Model for Enhancing Text Video Frames for Detection

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 736)

Abstract

Poor results in video text detection are mainly caused by low frame quality, which is degraded by factors such as blur, complex backgrounds, and illumination; these are among the challenges encountered in image enhancement. This paper proposes a technique that enhances video frames both for better human perception and for text detection. A set of effective CNN denoisers is designed and trained to denoise an image; using a variable splitting technique, these denoisers are plugged into model-based optimization methods within a half quadratic splitting (HQS) framework to handle image deblurring and super-resolution. Text is then detected in the enhanced frames with state-of-the-art methods, MSER (Maximally Stable Extremal Regions) and SWT (Stroke Width Transform). Experiments on our own database and on the ICDAR and YVT databases evaluate the proposed work in terms of precision, recall, and F-measure.
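The plug-and-play HQS idea summarized above can be illustrated with a minimal NumPy sketch, assuming the deblurring case only. For the objective 0.5·||y − k∗x||² + λ·Φ(x), HQS introduces an auxiliary variable z and alternates an x-step that solves (KᵀK + μI)x = Kᵀy + μz with a z-step that denoises x at noise level σ = sqrt(λ/μ). The sketch assumes circular (periodic) blur so the x-step has a closed-form FFT solution, and it replaces the trained CNN denoiser with `toy_denoiser`, a hypothetical 3x3 smoothing stand-in. The names `hqs_deblur`, `lam`, and the `sigmas` schedule are likewise illustrative assumptions, not the authors' code or settings.

```python
import numpy as np


def psf_to_otf(kernel, shape):
    """Zero-pad a small blur kernel to `shape` and circularly shift its centre to
    (0, 0), so that its FFT is the transfer function of circular convolution."""
    padded = np.zeros(shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)


def blur(image, kernel):
    """Circular (periodic-boundary) convolution computed with the FFT."""
    otf = psf_to_otf(kernel, image.shape)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * otf))


def toy_denoiser(image, sigma):
    """HYPOTHETICAL stand-in for a trained CNN denoiser (not the paper's network):
    blend the image with its 3x3 circular mean, more strongly for larger sigma."""
    mean3 = sum(np.roll(image, (dy, dx), axis=(0, 1))
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    weight = min(1.0, sigma / 0.1)
    return (1.0 - weight) * image + weight * mean3


def hqs_deblur(y, kernel, lam=2e-5, sigmas=None, denoiser=toy_denoiser):
    """Plug-and-play half quadratic splitting for
        min_x 0.5 * ||y - k * x||^2 + lam * Phi(x),
    where the prior Phi is accessed only through `denoiser`."""
    if sigmas is None:
        # Assumed schedule: denoiser strength decreases, so mu = lam / sigma^2 grows.
        sigmas = np.geomspace(0.02, 0.002, 8)
    otf = psf_to_otf(kernel, y.shape)
    Y = np.fft.fft2(y)
    z = y.copy()
    for sigma in sigmas:
        mu = lam / sigma ** 2
        # x-step: closed-form least squares, (K^T K + mu I) x = K^T y + mu z.
        X = (np.conj(otf) * Y + mu * np.fft.fft2(z)) / (np.abs(otf) ** 2 + mu)
        x = np.real(np.fft.ifft2(X))
        # z-step: denoise x at noise level sqrt(lam / mu), which equals sigma.
        z = denoiser(x, sigma)
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = np.zeros((32, 32))
    truth[10:22, 8:24] = 1.0  # a bright block standing in for a text stroke
    box = np.ones((5, 5)) / 25.0
    observed = blur(truth, box) + rng.normal(0.0, 0.01, truth.shape)
    restored = hqs_deblur(observed, box)
    print("MSE blurred :", np.mean((observed - truth) ** 2))
    print("MSE restored:", np.mean((restored - truth) ** 2))
```

The design choice behind the splitting is that the blur model and the image prior are handled separately: the x-step only needs the blur kernel, while the z-step only needs a denoiser, so a trained CNN denoiser can be swapped in for `toy_denoiser` without changing the loop. Super-resolution and the MSER/SWT detection stage are not shown here.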

Acknowledgment

The work carried out in this paper was supported by the High Performance Computing Lab under the UPE Grant, Department of Studies in Computer Science, University of Mysore, Mysore.

Author information

Corresponding author

Correspondence to C. Sunil.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Sunil, C., Chethan, H.K., Raghunandan, K.S., Hemantha Kumar, G. (2018). A Deep Convolution Neural Network Based Model for Enhancing Text Video Frames for Detection. In: Abraham, A., Muhuri, P., Muda, A., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2017. Advances in Intelligent Systems and Computing, vol 736. Springer, Cham. https://doi.org/10.1007/978-3-319-76348-4_42

  • DOI: https://doi.org/10.1007/978-3-319-76348-4_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76347-7

  • Online ISBN: 978-3-319-76348-4

  • eBook Packages: Engineering, Engineering (R0)
