Offline Extraction of Indic Regional Language from Natural Scene Image Using Text Segmentation and Deep Convolutional Sequence

Nag, Sauradip; Ganguly, Pallab Kumar; Roy, Sumit; Jha, Sourab; Bose, Krishna; Jha, Abhishek; Dasgupta, Kousik

doi:10.1007/978-981-13-2345-4_5

Abstract

Extracting the regional language from a natural scene image is challenging because it depends on the text information extracted from the image. Text extraction, in turn, is sensitive to varying lighting conditions, arbitrary orientation, inadequate text information, heavy background influence over the text, and changes in text appearance. This paper presents a novel unified method for tackling these challenges. The proposed work applies an image correction and segmentation technique on top of an existing text detection pipeline, the Efficient and Accurate Scene Text Detector (EAST). EAST uses the standard PVANet architecture to select features and non-maximal suppression to detect text in the image. Text recognition is done using a combined architecture of a MaxOut Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (LSTM) network. After the text is recognized with this deep-learning-based approach, the native-language text is translated to English and tokenized using standard text tokenizers. The tokens that very likely represent a location are used to find the Global Positioning System (GPS) coordinates of that location, and subsequently the regional languages spoken there are extracted. The proposed method is tested on a self-generated dataset collected from a Government of India dataset and is also evaluated on a standard dataset. A comparative study with a few state-of-the-art methods on text detection, recognition, and extraction of regional language from images shows that the proposed method outperforms the existing methods.
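
The non-maximal suppression step mentioned in the abstract can be illustrated with a minimal sketch. This is the standard greedy variant on axis-aligned boxes, not necessarily what the chapter uses: EAST itself describes a locality-aware variant that operates on rotated boxes or quadrangles. The function names, the (x1, y1, x2, y2) box format, and the 0.5 overlap threshold are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # illustrative value, not taken from the chapter
) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    candidate_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    candidate_scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(candidate_boxes, candidate_scores))
```

In the usage example, the second box overlaps the first heavily and has a lower score, so it is suppressed, while the third box does not overlap either and is kept.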

Author information

Authors and Affiliations

Kalyani Government Engineering College, Kalyani, Nadia, 741235, India
Sauradip Nag, Pallab Kumar Ganguly, Sumit Roy, Sourab Jha, Krishna Bose & Abhishek Jha
Faculty of Computer Science and Engineering, Kalyani Government Engineering College, Kalyani, Nadia, 741235, India
Kousik Dasgupta

Corresponding authors

Correspondence to Sauradip Nag, Pallab Kumar Ganguly, Sumit Roy, Sourab Jha, Krishna Bose, Abhishek Jha, or Kousik Dasgupta.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
Jyotsna Kumar Mandal
Department of Computer Science and Engineering, Assam University, Silchar, Assam, India
Somnath Mukhopadhyay
Department of Computer and System Sciences, Visva-Bharati University, Santiniketan, West Bengal, India
Paramartha Dutta
Department of Computer Science and Engineering, Kalyani Government Engineering College, Kalyani, West Bengal, India
Kousik Dasgupta

About this chapter

Cite this chapter

Nag, S. et al. (2018). Offline Extraction of Indic Regional Language from Natural Scene Image Using Text Segmentation and Deep Convolutional Sequence. In: Mandal, J., Mukhopadhyay, S., Dutta, P., Dasgupta, K. (eds) Methodologies and Application Issues of Contemporary Computing Framework. Springer, Singapore. https://doi.org/10.1007/978-981-13-2345-4_5

DOI: https://doi.org/10.1007/978-981-13-2345-4_5
Published: 22 September 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2344-7
Online ISBN: 978-981-13-2345-4
eBook Packages: Computer Science, Computer Science (R0)
