Recognition System to Separate Text Graphics from Indian Newspaper

  • Shantanu Jana
  • Nibaran Das
  • Ram Sarkar
  • Mita Nasipuri
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 225)

Abstract

Identifying graphics on newspaper pages and separating them from text is a challenging task, and very few works have been reported in this field. Newspapers are generally printed on low-quality paper that tends to change color over time, and this discoloration adds noise to the document as it ages. In this work we select several features to distinguish graphics from text and also attempt to reduce the noise. First, the minimum bounding box around each object is identified by connected component analysis of the binary image. Each object is then cropped and passed through a geometric feature extraction system. Next, two different frequency analyses are performed on each object. We thus collect both spatial and frequency domain features from the objects, which are used for training and testing with different classifiers. We applied the technique to Indian newspapers written in Roman script and obtained satisfactory results.
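
The spatial half of the pipeline described above (connected component analysis, bounding boxes, geometric features) can be illustrated with a minimal sketch. The abstract does not specify the connectivity rule or the exact geometric features, so 8-connectivity, the aspect ratio, and the fill density used here are assumptions, and the function names `connected_components` and `geometric_features` are hypothetical. The frequency domain features and the classifiers are not covered.

```python
from collections import deque


def connected_components(image):
    """Find 8-connected foreground blobs in a binary image (list of 0/1 rows).

    Returns one dict per blob with its minimum bounding box
    (top, left, bottom, right) and its foreground pixel count.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            objects.append({"box": (top, left, bottom, right), "area": area})
    return objects


def geometric_features(obj):
    """Compute simple shape descriptors from a blob's bounding box and area.

    The feature choice is an assumption made for illustration only.
    """
    top, left, bottom, right = obj["box"]
    height = bottom - top + 1
    width = right - left + 1
    return {
        "width": width,
        "height": height,
        "aspect_ratio": width / height,
        "fill_density": obj["area"] / (width * height),
    }


if __name__ == "__main__":
    # Toy binary page: a 2x2 block and a 1x3 vertical stroke.
    page = [
        [1, 1, 0, 0, 0, 0],
        [1, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1, 0],
    ]
    for obj in connected_components(page):
        print(obj["box"], geometric_features(obj))
```

In a setup like this, each feature dictionary would be combined with that object's frequency domain features to form one feature vector per object for the classifiers.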

Keywords

Edge detection, Connected component analysis, Text graphics separation, FFT, 2D-DWT, Watermarking, Newspaper segmentation

Notes

Acknowledgements

The authors are thankful to the Center for Microprocessor Application for Training Education and Research (CMATER) and the Project on Storage Retrieval and Understanding of Video for Multimedia (SRUVM) of the Computer Science and Engineering Department, Jadavpur University, for providing infrastructure facilities during the progress of the work. The work reported here has been partially funded by University with Potential for Excellence (UPE), Phase-II, UGC, Government of India.

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Shantanu Jana (1)
  • Nibaran Das (1)
  • Ram Sarkar (1)
  • Mita Nasipuri (1)
  1. CMATER Laboratory, Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
