Region-Based Caption Text Extraction

Analysis, Retrieval and Delivery of Multimedia Content

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 158)

Abstract

This chapter presents a method for caption text detection. The method is designed to be part of a generic indexing system in which other semantic concepts are also detected automatically. To keep the detection system coherent, the various object detection algorithms share a common image description: a hierarchical region-based image model. The proposed method combines texture and geometric features to detect caption text. Texture features are estimated using wavelet analysis and are mainly applied to spot text candidates. Text characteristics are then verified using geometric features, which are estimated from the region-based image model. Analysis of the region hierarchy provides the final caption text objects. As a final step, the consistency of the output is analyzed by a binarization algorithm that robustly estimates the thresholds on the area of support of the caption text.
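The candidate-spotting step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the wavelet family, the block size, or the decision rule, so everything below is an assumption made for illustration: a one-level Haar transform, fixed square blocks, and a hand-picked energy threshold. The names haar_detail_energy, spot_text_candidates, block_size and energy_threshold are hypothetical and do not come from the chapter.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def haar_detail_energy(block: Image) -> float:
    """Mean squared one-level Haar detail coefficient of a block.

    Assumes the block has even height and width. The choice of the Haar
    wavelet is an illustrative assumption, not the chapter's stated method.
    """
    rows, cols = len(block), len(block[0])
    total = 0.0
    count = 0
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            a, b = block[r][c], block[r][c + 1]
            d, e = block[r + 1][c], block[r + 1][c + 1]
            # Orthonormal 2-D Haar detail coefficients of one 2x2 cell.
            col_diff = (a - b + d - e) / 2.0  # change between columns
            row_diff = (a + b - d - e) / 2.0  # change between rows
            diag = (a - b - d + e) / 2.0      # diagonal change
            total += col_diff ** 2 + row_diff ** 2 + diag ** 2
            count += 3
    return total / count


def spot_text_candidates(image: Image, block_size: int = 8,
                         energy_threshold: float = 100.0) -> List[List[bool]]:
    """Flag each full block whose Haar detail energy exceeds the threshold.

    block_size and energy_threshold are hypothetical tuning parameters.
    Partial blocks at the right and bottom borders are ignored.
    """
    rows, cols = len(image), len(image[0])
    mask = []
    for r in range(0, rows - block_size + 1, block_size):
        mask_row = []
        for c in range(0, cols - block_size + 1, block_size):
            block = [row[c:c + block_size] for row in image[r:r + block_size]]
            mask_row.append(haar_detail_energy(block) > energy_threshold)
        mask.append(mask_row)
    return mask


if __name__ == "__main__":
    # Left block is flat; right block has high-contrast vertical strokes.
    demo = [[50.0] * 8 + [0.0, 255.0] * 4 for _ in range(8)]
    print(spot_text_candidates(demo))
```

In this sketch, flat regions yield zero detail energy while regions with dense, high-contrast strokes yield large values, which is the property that makes texture useful for spotting candidates. Flagged blocks would still need the geometric verification on the region hierarchy that the abstract describes.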

Notes

  1. http://www.code.google.com/p/tesseract-ocr/

  2. All images used in this chapter belong to TVC, Television de Catalunya, and are copyright protected. These key-frames have been provided by TVC solely for research purposes within the framework of the i3media project.

Acknowledgments

This work was partially funded by the Catalan Broadcasting Corporation (CCMA) and Mediapro through the Spanish project CENIT-2007-1012 i3media, and by the project TEC2007-66858/TCM PROVEC of the Spanish Government.

Author information

Corresponding author

Correspondence to Miriam Leon.

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Leon, M., Vilaplana, V., Gasull, A., Marques, F. (2013). Region-Based Caption Text Extraction. In: Adami, N., Cavallaro, A., Leonardi, R., Migliorati, P. (eds) Analysis, Retrieval and Delivery of Multimedia Content. Lecture Notes in Electrical Engineering, vol 158. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3831-1_2

  • DOI: https://doi.org/10.1007/978-1-4614-3831-1_2

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-3830-4

  • Online ISBN: 978-1-4614-3831-1

  • eBook Packages: Engineering, Engineering (R0)
