Recognition System to Separate Text Graphics from Indian Newspaper

Jana, Shantanu; Das, Nibaran; Sarkar, Ram; Nasipuri, Mita

doi:10.1007/978-981-10-7814-9_14

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 225)

Included in the following conference series:

International Conference on Frontiers in Optimization: Theory and Applications

Abstract

Identifying graphics on newspaper pages and separating them from text is a challenging task, and very few works have been reported in this field. Newspapers are generally printed on low-quality paper that tends to change color over time, and this color change introduces noise into the document that increases with age. In this work, we select several features to distinguish graphics from text and also attempt to reduce the noise. First, the minimum bounding box around each object is identified by connected component analysis of the binary image. Each object is then cropped and passed through a geometric feature extraction system. Next, two different frequency analyses are performed on each object. In this way, both spatial-domain and frequency-domain features are collected from the objects and used to train and test different classifiers. We applied the technique to Indian newspapers written in Roman script and obtained satisfactory results.
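The feature extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the geometric features or the two frequency analyses, so the choices here are assumptions: 8-connectivity for the component labelling, aspect ratio and fill ratio as geometric features, and a single frequency feature taken from an orthonormal 2-D DCT. All function names (connected_components, crop, geometric_features, dct_low_freq_energy) are hypothetical.

```python
import math
from collections import deque


def connected_components(img):
    """Find 8-connected foreground (1) regions in a binary image (list of rows).

    Returns one dict per region with its minimum bounding box
    (top, left, bottom, right) and its foreground pixel count.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right, pixels = r, c, r, c, 0
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                pixels += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append({"box": (top, left, bottom, right), "pixels": pixels})
    return comps


def crop(img, box):
    """Cut the bounding box out of the image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in img[top:bottom + 1]]


def geometric_features(comp):
    """Spatial-domain features (assumed set, not taken from the paper)."""
    top, left, bottom, right = comp["box"]
    height, width = bottom - top + 1, right - left + 1
    return {
        "height": height,
        "width": width,
        "aspect_ratio": width / height,
        "fill_ratio": comp["pixels"] / (height * width),
    }


def dct_low_freq_energy(patch, k=2):
    """Share of signal energy in the k x k lowest orthonormal DCT-II coefficients.

    With orthonormal scaling the total DCT energy equals the sum of squared
    pixel values, so only the low-frequency coefficients need computing.
    """
    h, w = len(patch), len(patch[0])
    total = sum(p * p for row in patch for p in row)
    if total == 0:
        return 0.0
    low = 0.0
    for u in range(min(k, h)):
        for v in range(min(k, w)):
            au = math.sqrt((1 if u == 0 else 2) / h)
            av = math.sqrt((1 if v == 0 else 2) / w)
            coeff = au * av * sum(
                patch[y][x]
                * math.cos(math.pi * (2 * y + 1) * u / (2 * h))
                * math.cos(math.pi * (2 * x + 1) * v / (2 * w))
                for y in range(h)
                for x in range(w)
            )
            low += coeff * coeff
    return low / total


# Toy binary page (made-up data): one solid block and one thin vertical stroke.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
]
for comp in connected_components(page):
    features = geometric_features(comp)
    features["dct_low_freq_energy"] = dct_low_freq_energy(crop(page, comp["box"]))
    print(comp["box"], features)
```

Each object yields one feature vector combining spatial and frequency descriptors, which could then be passed to any classifier. In practice a library routine for labelling and transforms would replace these pure-Python loops; they are written out here only to keep the sketch dependency-free.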

Acknowledgements

The authors are thankful to the Center for Microprocessor Application for Training Education and Research (CMATER) and the Project on Storage Retrieval and Understanding of Video for Multimedia (SRUVM) of the Computer Science and Engineering Department, Jadavpur University, for providing infrastructure facilities during the progress of the work. The work reported here has been partially funded by University with Potential for Excellence (UPE), Phase-II, UGC, Government of India.

Author information

Authors and Affiliations

CMATER Laboratory, Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
Shantanu Jana, Nibaran Das, Ram Sarkar & Mita Nasipuri

Corresponding author

Correspondence to Shantanu Jana.

Editor information

Editors and Affiliations

Department of Mathematics, National Institute of Technology, Durgapur, Durgapur, West Bengal, India
Samarjit Kar
Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
Ujjwal Maulik
School of Economics and Management, Beijing University of Chemical Technology, Beijing, China
Xiang Li

About this paper

Cite this paper

Jana, S., Das, N., Sarkar, R., Nasipuri, M. (2018). Recognition System to Separate Text Graphics from Indian Newspaper. In: Kar, S., Maulik, U., Li, X. (eds) Operations Research and Optimization. FOTA 2016. Springer Proceedings in Mathematics & Statistics, vol 225. Springer, Singapore. https://doi.org/10.1007/978-981-10-7814-9_14

DOI: https://doi.org/10.1007/978-981-10-7814-9_14
Published: 07 April 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7813-2
Online ISBN: 978-981-10-7814-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
