Text and Non-text Separation in Scanned Color-Official Documents

  • Amit Vijay Nandedkar
  • Jayanta Mukherjee
  • Shamik Sural
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10481)

Abstract

Official documents consist of text and non-text elements such as logos, stamps, and signatures. Separating these elements in a scanned document plays a significant role in document image retrieval, recognition, and verification. This paper presents a novel scheme for separating the text and non-text elements of official documents using part-based features. We exploit the fact that text and non-text elements have distinctive intensity distributions in the HSV color space, and propose a new approach to computing part-based features from the S and V channels. Text and non-text components are classified using K-approximate nearest neighbors combined with a majority voting scheme. The knowledge base acquired during training is indexed with a kD-tree. The method is subsequently extended to detect logos, stamps, and signatures. Experimental results show the effectiveness of the proposed approach.
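The pipeline outlined in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the features, so concatenated S and V histograms per patch are assumed here, and an exact brute-force neighbor search stands in for the kD-tree-indexed approximate search. All function names, the bin count, and the toy patches are hypothetical.

```python
import colorsys
from collections import Counter


def sv_histogram(pixels, bins=4):
    """Assumed part feature: normalized S and V histograms of an RGB patch."""
    s_hist = [0] * bins
    v_hist = [0] * bins
    for r, g, b in pixels:
        _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        s_hist[min(int(s * bins), bins - 1)] += 1
        v_hist[min(int(v * bins), bins - 1)] += 1
    n = len(pixels)
    return [count / n for count in s_hist + v_hist]


def knn_vote(query, knowledge_base, k=3):
    """Majority label among the k nearest stored features.

    Exact search is used for brevity; the paper indexes the knowledge
    base with a kD-tree and retrieves approximate neighbors instead.
    """
    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(query, item[0]))

    nearest = sorted(knowledge_base, key=sq_dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def classify_component(patches, knowledge_base, k=3):
    """Second-level majority vote over the labels of a component's patches."""
    votes = Counter(knn_vote(sv_histogram(p), knowledge_base, k) for p in patches)
    return votes.most_common(1)[0][0]


def make_patch(ink, n_ink, n_total=16):
    """Toy patch: n_ink pixels of ink color on a white background."""
    return [ink] * n_ink + [(255, 255, 255)] * (n_total - n_ink)


# Toy knowledge base: black ink as text, saturated blue ink as non-text.
knowledge_base = (
    [(sv_histogram(make_patch((0, 0, 0), n)), "text") for n in (4, 6, 8)]
    + [(sv_histogram(make_patch((30, 60, 200), n)), "non-text") for n in (4, 6, 8)]
)

text_query = make_patch((20, 20, 20), 5)    # dark gray ink
stamp_query = make_patch((200, 40, 40), 7)  # saturated red ink
print(knn_vote(sv_histogram(text_query), knowledge_base))
print(classify_component([stamp_query, stamp_query, text_query], knowledge_base))
```

In this toy setting, saturated ink shifts mass into the high-S bins while dark ink shifts mass into the low-V bins, which is the kind of separation in the S and V channels that the abstract relies on.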

Keywords

Text/non-text separation, Graphics recognition, Document recognition, Color document image

Notes

Acknowledgments

This work is partially sponsored by the Ministry of Communications & Information Technology, Govt. of India; Ref.: MCIT 11(19)/2010-HCC (TDIL) dt. 28-12-2010.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Amit Vijay Nandedkar (1)
  • Jayanta Mukherjee (1)
  • Shamik Sural (1)
  1. Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, Kharagpur, India