Multimedia Tools and Applications, Volume 78, Issue 6, pp 7767–7801

Word searching in scene image and video frame in multi-script scenario using dynamic shape coding

  • Partha Pratim Roy
  • Ayan Kumar Bhunia
  • Avirup Bhattacharyya
  • Umapada Pal


Retrieving text from natural scene images and video frames is challenging because of complex character shapes, low resolution, and background noise. Available OCR systems often fail on such scene and video text. Keyword spotting, an alternative way to retrieve information, allows efficient text searching in these scenarios. However, current word spotting techniques for scene and video images are script-specific and have been developed mainly for Latin script. This paper presents a novel word spotting framework that uses dynamic shape coding for text retrieval in natural scene images and video frames. The framework searches for a query keyword across multiple scripts by generating, on the fly, the keyword in each corresponding script. We use a two-stage word spotting approach based on Hidden Markov Models (HMMs) that detects the translated keyword in a given text line after identifying the script of the line. A novel unsupervised scheme based on dynamic shape coding groups characters of similar shape to avoid confusion and to improve text alignment. Next, the hypothesised locations are verified to improve retrieval performance. To evaluate the proposed system, we consider two popular Indic scripts, Bangla (Bengali) and Devanagari, along with English. Inspired by the zone-wise recognition approach for Indic scripts [37], zone-wise text information is used to improve on traditional word spotting performance in Indic scripts. The experiments use a dataset of scene images and video frames containing English, Bangla and Devanagari text. The results show the effectiveness of the proposed word spotting approach.
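
The coarse-then-verify idea described above (search over groups of similar-looking characters, then check the hypothesised locations) can be illustrated with a toy sketch. This is not the authors' method: the paper's grouping is unsupervised and works with HMMs on image features, whereas the example below operates on plain strings. The similarity table, the threshold, and the function names `group_similar`, `encode` and `spot` are all hypothetical and chosen only for illustration.

```python
# Toy sketch of shape-code grouping followed by two-stage keyword search.
# All scores below are made-up values, not results from the paper.
SIMILARITY = {
    ("c", "e"): 0.82,
    ("e", "o"): 0.78,
    ("i", "l"): 0.90,
    ("n", "h"): 0.74,
    ("a", "t"): 0.10,
}


def group_similar(chars, similarity, threshold=0.7):
    """Merge characters whose pairwise similarity reaches the threshold.

    Returns a dict mapping every character to a representative (its code).
    """
    parent = {c: c for c in chars}

    def find(c):
        # Follow parent links to the group representative (with path halving).
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), score in similarity.items():
        if score >= threshold and a in parent and b in parent:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Keep the alphabetically smaller character as representative.
                parent[max(ra, rb)] = min(ra, rb)
    return {c: find(c) for c in chars}


def encode(word, code_of):
    """Replace each character by its shape code; unknown characters are kept."""
    return "".join(code_of.get(ch, ch) for ch in word)


def spot(keyword, line_words, code_of):
    """Stage 1: match in shape-code space. Stage 2: verify exact characters."""
    target = encode(keyword, code_of)
    hypotheses = [i for i, w in enumerate(line_words)
                  if encode(w, code_of) == target]
    verified = [i for i in hypotheses if line_words[i] == keyword]
    return hypotheses, verified


code_of = group_similar("acehilnot", SIMILARITY)
hypotheses, verified = spot("not", ["hot", "not", "net", "tea"], code_of)
print(hypotheses, verified)
```

In this toy setting the first stage deliberately over-generates: "hot", "not" and "net" share one shape code, so all three are proposed, and the second stage keeps only the exact match. In a real system the second stage would be a stronger verifier over image features rather than a string comparison.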


Keywords: Scene and video text retrieval; Indic word spotting; Hidden Markov model; Dynamic shape code; Word spotting in multiple scripts



References

  1. Banerjee P, Chaudhuri BB (2013) An approach for Bangla and Devanagari video text recognition, in Proceedings of the 4th International Workshop on Multilingual OCR, p. 8
  2. Bhunia AK, Das A, Roy PP, Pal U (2015) A comparative study of features for handwritten Bangla text recognition, in 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 636–640
  3. Bhunia AK, Kumar G, Roy PP, Balasubramanian R, Pal U (2018) Text recognition in scene image and video frame using Color Channel selection. Multimed Tools Appl 77(7):8551–8578
  4. Bhunia AK, Roy PP, Mohata A, Pal U (2018) Cross-language framework for word recognition and spotting of Indic scripts. Pattern Recogn 79:12–31
  5. Bianne-Bernard AL, Menasri F, Al-Hajj Mohamad R, Mokbel C, Kermorvant C, Likforman-Sulem L (2011) Dynamic and contextual information in HMM modeling for handwritten word recognition. IEEE Trans Pattern Anal Mach Intell 33(10):2066–2080
  6. Cao H, Prasad R, Natarajan P (2011) Handwritten and typewritten text identification and recognition using hidden Markov models, in Document Analysis and Recognition (ICDAR), 2011 International Conference on, pp. 744–748
  7. Chaudhuri BB, Pal U (1998) A complete printed Bangla OCR system. Pattern Recogn 31(5):531–549
  8. Chen D, Odobez J-M (2005) Video text recognition using sequential Monte Carlo and error voting methods. Pattern Recogn Lett 26(9):1386–1403
  9. Chen X, Yuille AL (2004) Detecting and reading text in natural scenes, in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, vol. 2, pp. II--II
  10. Chen H, Tsai SS, Schroth G, Chen DM, Grzeszczuk R, Girod B (2011) Robust text detection in natural images with edge-enhanced maximally stable extremal regions, in Proceedings - International Conference on Image Processing, ICIP, pp. 2609–2612
  11. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection, in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1, pp. 886–893
  12. Fischer A, Keller A, Frinken V, Bunke H (2012) Lexicon-free handwritten word spotting using character HMMs. Pattern Recogn Lett 33(7):934–942
  13. Frinken V, Fischer A, Manmatha R, Bunke H (2012) A novel word spotting method based on recurrent neural networks. IEEE Trans Pattern Anal Mach Intell 34(2):211–224
  14. Gatos B et al. (2015) GRPOLY-DB: An old Greek polytonic document image database, in Document Analysis and Recognition (ICDAR), 2015 13th International Conference on, pp. 646–650
  15. Giotis AP, Sfikas G, Gatos B, Nikou C (2017) A survey of document image word spotting techniques. Pattern Recogn 68:310–332
  16. Guo JK, Ma MY (2001) Separating handwritten material from machine printed text using hidden markov models, in Document Analysis and Recognition, Proceedings. Sixth International Conference on, 2001, pp. 439–443
  17. He P, Huang W, Qiao Y, Loy CC, Tang X (2016) Reading Scene Text in Deep Convolutional Sequences, in AAAI, pp. 3501–3508
  18. Huang W, Lin Z, Yang J, Wang J (2013) Text localization in natural images using stroke feature transform and text covariance descriptors, in Proceedings of the IEEE International Conference on Computer Vision, pp. 1241–1248
  19. Jaderberg M, Vedaldi A, Zisserman A (2014) Deep features for text spotting, in European conference on computer vision, pp. 512–528
  20. Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2016) Reading text in the wild with convolutional neural networks. Int J Comput Vis 116(1):1–20
  21. Khotanzad A, Hong YH (1990) Invariant image recognition by Zernike moments. IEEE Trans Pattern Anal Mach Intell 12(5):489–497
  22. Krishnan P, Dutta K, Jawahar CV (2016) Deep feature embedding for accurate recognition and retrieval of handwritten text, in International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 289–294
  23. Kumar G, Govindaraju V (2017) Bayesian background models for keyword spotting in handwritten documents. Pattern Recogn 64:84–91
  24. Li K, He FZ, Yu HP, Chen X (2017) A correlative classifiers approach based on particle filter and sample set for tracking occluded target. Applied Mathematics-A Journal of Chinese Universities 32(3):294–312
  25. Li K, He FZ, Yu HP (2018) Robust visual tracking based on convolutional features with illumination and occlusion handing. J Comput Sci Technol 33(1):223–236
  26. Lu S, Li L, Tan CL (2008) Document image retrieval through word shape coding. IEEE Trans Pattern Anal Mach Intell 30(11):1913–1918
  27. Marti U-V, Bunke H (2001) Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition system. Int J Pattern Recognit Artif Intell 15(1):65–90
  28. Nakayama T (1994) Modeling content identification from document images, in Proceedings of the fourth conference on Applied natural language processing, pp. 22–27
  29. Neumann L, Matas J (2011) A method for text localization and recognition in real-world images. Comput Vision--ACCV 2010:770–783
  30. Neumann L, Matas J (2012) Real-time scene text localization and recognition, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3538–3545
  31. Quy Phan T, Shivakumara P, Tian S, Lim Tan C (2013) Recognizing text with perspective distortion in natural scenes, in Proceedings of the IEEE International Conference on Computer Vision, pp. 569–576
  32. Rath TM, Manmatha R (2006) Word spotting for historical documents. Int J Doc Anal Recognit 9(2–4):139–152
  33. Rodriguez-Serrano JA, Perronnin F (2009) Handwritten word-spotting using hidden Markov models and universal vocabularies. Pattern Recogn 42(9):2106–2116
  34. Roy S, Shivakumara P, Roy PP, Tan CL (2012) Wavelet-Gradient-Fusion for Video Text Binarization, Int. Conf. Pattern Recognit., no. Icpr, pp. 3300–3303
  35. Roy PP, Rayar F, Ramel J-Y (2015) Word spotting in historical documents using primitive codebook and dynamic programming. Image Vis Comput 44:15–28
  36. Roy S, Shivakumara P, Roy PP, Pal U, Tan CL, Lu T (2015) Bayesian classifier for multi-oriented video text recognition system. Expert Syst Appl 42(13):5554–5566
  37. Roy PP, Bhunia AK, Das A, Dey P, Pal U (2016) HMM-based Indic handwritten word recognition using zone segmentation. Pattern Recogn 60:1057–1075
  38. Roy PP, Bhunia AK, Pal U (2017) Date-field retrieval in scene image and video frames using text enhancement and shape coding, Neurocomputing
  39. Roy PP, Bhunia AK, Das A, Dhar P, Pal U (2017) Keyword spotting in doctor’s handwriting on medical prescriptions. Expert Syst Appl 76:113–128
  40. Rusiñol M, Aldavert D, Toledo R, Lladós J (2015) Efficient segmentation-free keyword spotting in historical document collections. Pattern Recogn 48(2):545–555
  41. Saidane Z, Garcia C (2007) Robust binarization for video text recognition, in Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, vol. 2, pp. 874–878
  42. Sain A, Bhunia AK, Roy PP, Pal U (2018) Multi-oriented text detection and verification in video frames and scene images. Neurocomputing 275:1531–1549
  43. Sharma N, Shivakumara P, Pal U, Blumenstein M, Tan CL (2012) A new method for arbitrarily-oriented text detection in video, in Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on, pp. 74–78
  44. Shivakumara P, Liang G, Roy S, Pal U, Lu T (2015) New texture-spatial features for keyword spotting in video images, in Pattern Recognition (ACPR), 2015 3rd IAPR Asian Conference on, pp. 391–395
  45. Srihari SN, Srinivasan H, Huang C, Shetty S (2006) Spotting words in Latin, Devanagari and Arabic scripts. Vivek-Bombay 16(3):2
  46. Sudholt S, Fink GA (2016) PHOCNet: A deep convolutional neural network for word spotting in handwritten documents, in International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 277–282
  47. Sun L, Huo Q, Jia W, Chen K (2015) A robust approach for text detection from natural scene images. Pattern Recogn 48(9):2906–2920
  48. Sun J, He FZ, Chen YL, Chen X (2016) A multiple template approach for robust tracking of fast motion target. Applied Mathematics-A Journal of Chinese Universities 31(2):177–197
  49. Tarafdar A, Mondal R, Pal S, Pal U, Kimura F (2010) Shape code based word-image matching for retrieval of Indian multi-lingual documents, in Pattern Recognition (ICPR), 2010 20th International Conference on, pp. 1989–1992
  50. Thomas S, Chatelain CC, Heutte L, Paquet T, Kessentini Y (2015) A deep HMM model for multiple keywords spotting in handwritten documents. Pattern Anal Appl 18(4):1003–1015
  51. Toselli AH, Vidal E, Romero V, Frinken V (2016) HMM word graph based keyword spotting in handwritten document images. Inf Sci (Ny) 370–371:497–518
  52. Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition, in Computer Vision (ICCV), 2011 IEEE International Conference on, pp. 1457–1464
  53. Wang T, Wu DJ, Coates A, Ng AY (2012) End-to-end text recognition with convolutional neural networks, in Pattern Recognition (ICPR), 2012 21st International Conference on, pp. 3304–3308
  54. Wang R, Sang N, Gao C (2015) Text detection approach based on confidence map and context information. Neurocomputing 157:153–165
  55. Wilkinson T, Lindström J, Brun A (2017) Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections, in International Conference Computer Vision (ICCV), pp. 4443–4452
  56. Wshah S, Kumar G, Govindaraju V (2014) Statistical script independent word spotting in offline handwritten documents. Pattern Recogn 47(3):1039–1050
  57. Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images, in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp. 1083–1090
  58. Ye Q, Doermann D (2015) Text detection and recognition in imagery: a survey. IEEE Trans Pattern Anal Mach Intell 37(7):1480–1500
  59. Yin X-C, Yin X, Huang K, Hao H-W (2014) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983
  60. Yin X-C, Pei W-Y, Zhang J, Hao H-W (2015) Multi-orientation scene text detection with adaptive clustering. IEEE Trans Pattern Anal Mach Intell 37(9):1930–1937
  61. Young SJ et al. (2009) The HTK Book (for HTK Version 3.4), Construction, no. July 2000, p. 384
  62. Yu C, Song Y, Zhang Y (2016) Scene text localization using edge analysis and feature pool. Neurocomputing 175:652–661
  63. Yu H, He F, Pan Y (2018) A novel region-based active contour model via local patch similarity measure for image segmentation. Multimedia Tools and Applications:1–23
  64. Zagoris K, Pratikakis I, Gatos B (2017) Unsupervised Word Spotting in Historical Handwritten Document Images using Document-oriented Local Features, IEEE Trans. Image Process
  65. Zhang X, Pal U, Tan CL (2014) Segmentation-free Keyword spotting for Bangla handwritten documents, in Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on, pp. 381–386
  66. Zhang Z, Shen W, Yao C, Bai X (2015) Symmetry-based text line detection in natural scenes, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2558–2567
  67. Zhou Z, Li L, Tan CL (2010) Edge based binarization for video text images, in Proceedings - International Conference on Pattern Recognition, pp. 133–136
  68. Zhou Y, He F, Qiu Y (2017) Dynamic strategy based parallel ant colony optimization on GPUs for TSPs. SCIENCE CHINA Inf Sci 60(6):068102

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of CSE, Indian Institute of Technology Roorkee, Roorkee, India
  2. Department of ECE, Institute of Engineering & Management, Kolkata, India
  3. CVPR Unit, Indian Statistical Institute, Kolkata, India
