An Integral Image Based Text Extraction Technique from Document Images by Multilevel Thresholding Using Differential Evolution

Methodologies and Application Issues of Contemporary Computing Framework

Abstract

This paper presents a multilevel image segmentation technique for fast extraction of text from document images. A rectangular sum-table (known as the integral image) is used to find local threshold values. The integral image, evaluated over a fixed window size, is used to calculate the threshold value through the proposed probability-based objective function. Because thresholds are calculated over a small window, results are produced more quickly than with standard algorithms such as the Sauvola, Niblack, and Vu approaches. The resulting local threshold value is then optimized globally by the Differential Evolution (DE) algorithm to obtain multiple thresholds. DE converges quickly and accurately towards the optimal solution compared with other well-known optimization algorithms such as Particle Swarm Optimization (PSO) and Genetic Algorithm (GA). The proposed technique is applied to different types of degraded document images, and its outcomes are compared both quantitatively and qualitatively with state-of-the-art methods.
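The integral-image step mentioned in the abstract can be illustrated with a minimal sketch. It builds a summed-area table, reads any window sum with four lookups, and applies a simple local-mean rule in the spirit of Bradley and Roth's adaptive thresholding. This is an assumption made for illustration only: it is not the chapter's probability-based objective function, and it omits the DE stage entirely. The function names, the `window` size, and the `ratio` parameter are hypothetical.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column at the top/left."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def window_sum(sat, y0, x0, y1, x1):
    """Sum of pixels in rows y0..y1-1 and columns x0..x1-1 using four lookups."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]


def local_mean_threshold(img, window=3, ratio=0.85):
    """Mark a pixel as text (1) when it is darker than ratio * local mean.

    Illustrative rule only (assumed here), not the chapter's objective function.
    """
    h, w = len(img), len(img[0])
    sat = integral_image(img)
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the window at the image borders.
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            area = (y1 - y0) * (x1 - x0)
            mean = window_sum(sat, y0, x0, y1, x1) / area
            out[y][x] = 1 if img[y][x] < ratio * mean else 0
    return out


# Tiny synthetic "page": one dark pixel on a bright background.
page = [
    [200, 200, 200, 200],
    [200,  40, 200, 200],
    [200, 200, 200, 200],
]
sat = integral_image(page)
mask = local_mean_threshold(page, window=3, ratio=0.85)
```

Once the table is built, the cost of each window sum is independent of the window size, which is why integral images are attractive for local thresholding.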

References

  1. S. Abdel-Khalek, A.B. Ishak, O.A. Omer, A.S. Obada, A two-dimensional image segmentation method based on genetic algorithm and entropy. Optik-Int. J. Light Electron Opt. 131, 414–422 (2017)

  2. M.N. Alam, Particle swarm optimization: algorithm and its codes in MATLAB (2016)

  3. H.V.H. Ayala, F.M. dos Santos, V.C. Mariani, L. dos Santos Coelho, Image thresholding segmentation based on a novel beta differential evolution approach. Exp. Syst. Appl. 42(4), 2136–2142 (2015)

  4. D. Bradley, G. Roth, Adaptive thresholding using the integral image. J. Graph. Tools 12(2), 13–21 (2007)

  5. L. Cao, P. Bao, Z. Shi, The strongest schema learning GA and its application to multilevel thresholding. Image Vis. Comput. 26(5), 716–724 (2008)

  6. A. Chander, A. Chatterjee, P. Siarry, A new social and momentum component adaptive PSO algorithm for image segmentation. Exp. Syst. Appl. 38(5), 4998–5004 (2011)

  7. Y.L. Chen, B.F. Wu, A multi-plane approach for text segmentation of complex document images. Pattern Recognit. 42(7), 1419–1444 (2009)

  8. M. Cheriet, J.N. Said, C.Y. Suen, A recursive thresholding technique for image segmentation. IEEE Trans. Image Process. 7(6), 918–921 (1998)

  9. J.L. Fisher, S.C. Hinds, D.P. D’Amato, A rule-based system for document image segmentation, in Proceedings, 10th International Conference on Pattern Recognition, vol. 1 (IEEE, 1990), pp. 567–572

  10. L.A. Fletcher, R. Kasturi, A robust algorithm for text string separation from mixed text/graphics images. IEEE Trans. Pattern Anal. Mach. Intell. 10(6), 910–918 (1988)

  11. B. Gatos, I. Pratikakis, S.J. Perantonis, Adaptive degraded document image binarization. Pattern Recognit. 39(3), 317–327 (2006)

  12. M.H. Horng, R.J. Liou, Multilevel minimum cross entropy threshold selection based on the firefly algorithm. Exp. Syst. Appl. 38(12), 14805–14811 (2011)

  13. D. Karaboga, B. Gorkemli, C. Ozturk, N. Karaboga, A comprehensive survey: artificial bee colony (ABC) algorithm and applications. Artif. Intell. Rev. 42(1), 21–57 (2014)

  14. Y. Liu, S.N. Srihari, Document image binarization based on texture features. IEEE Trans. Pattern Anal. Mach. Intell. 19(5), 540–544 (1997)

  15. M. Ma, J. Liang, M. Guo, Y. Fan, Y. Yin, SAR image segmentation based on artificial bee colony algorithm. Appl. Soft Comput. 11(8), 5205–5214 (2011)

  16. M. Naidu, P.R. Kumar, K. Chiranjeevi, Shannon and fuzzy entropy based evolutionary image thresholding for image segmentation. Alex. Eng. J. (2017)

  17. W. Niblack, An introduction to digital image processing. Strandberg Publishing Company (1985)

  18. L. O’Gorman, R. Kasturi, Document image analysis, vol. 39 (IEEE Computer Society Press Los Alamitos, 1995)

  19. S. Pare, A.K. Bhandari, A. Kumar, G.K. Singh, S. Khare, Satellite image segmentation based on different objective functions using genetic algorithm: a comparative study, in 2015 IEEE International Conference on Digital Signal Processing (DSP) (IEEE, 2015), pp. 730–734

  20. J. Parker, Gray level thresholding in badly illuminated images. IEEE Trans. Pattern Anal. Mach. Intell. 13(8), 813–819 (1991)

  21. S. Sarkar, S. Das, S.S. Chaudhuri, A multilevel color image thresholding scheme based on minimum cross entropy and differential evolution. Pattern Recognit. Lett. 54, 27–35 (2015)

  22. S. Sarkar, S. Paul, R. Burman, S. Das, S.S. Chaudhuri, A fuzzy entropy based multi-level image thresholding using differential evolution, in International Conference on Swarm, Evolutionary, and Memetic Computing (Springer, 2014), pp. 386–395

  23. J. Sauvola, M. Pietikäinen, Adaptive document image binarization. Pattern Recognit. 33(2), 225–236 (2000)

  24. F. Shafait, D. Keysers, T.M. Breuel, Efficient implementation of local adaptive thresholding techniques using integral images. DRR 6815, 681510 (2008)

  25. F. Shafait, J. Van Beusekom, D. Keysers, T.M. Breuel, Page frame detection for marginal noise removal from scanned documents, in Scandinavian Conference on Image Analysis (Springer, 2007), pp. 651–660

  26. K. Sobottka, H. Kronenberg, T. Perroud, H. Bunke, Text extraction from colored book and journal covers. Int. J. Doc. Anal. Recognit. 2(4), 163–176 (2000)

  27. R. Storn, K. Price, Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J. Global Opt. 11(4), 341–359 (1997)

  28. C.M. Tsai, H.J. Lee, Binarization of color document images via luminance and saturation color features. IEEE Trans. Image Process. 11(4), 434–451 (2002)

  29. P. Viola, M.J. Jones, Robust real-time face detection. Int. J. Comput. Vis. 57(2), 137–154 (2004)

  30. H.N. Vu, T.A. Tran, I.S. Na, S.H. Kim, Automatic extraction of text regions from document images by multilevel thresholding and k-means clustering, in 2015 IEEE/ACIS 14th International Conference on Computer and Information Science (ICIS) (IEEE, 2015), pp. 329–334

  31. Y. Zhong, K. Karu, A.K. Jain, Locating text in complex color images. Pattern Recognit. 28(10), 1523–1535 (1995)

Author information

Correspondence to Rupak Chakraborty.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Chakraborty, R., Sushil, R., Garg, M.L. (2018). An Integral Image Based Text Extraction Technique from Document Images by Multilevel Thresholding Using Differential Evolution. In: Mandal, J., Mukhopadhyay, S., Dutta, P., Dasgupta, K. (eds) Methodologies and Application Issues of Contemporary Computing Framework. Springer, Singapore. https://doi.org/10.1007/978-981-13-2345-4_4

  • DOI: https://doi.org/10.1007/978-981-13-2345-4_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-2344-7

  • Online ISBN: 978-981-13-2345-4

  • eBook Packages: Computer Science, Computer Science (R0)
