Word Retrieval from Kannada Document Images Using HOG and Morphological Features

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 709)

Abstract

This paper presents a method for retrieving words from Kannada document images, based on Histogram of Oriented Gradients (HOG) features and morphological filters. A dataset of 50,000 words is created from 250 document pages belonging to different categories. Each preprocessed document image is segmented into words using simple morphological filters. The gradients of each word image are described by histogram channels computed over rectangular cells (R-HOG). In parallel, morphological erosion, opening, top-hat and bottom-hat transformations are applied to each word, and the density of each resulting image is estimated. The HOG and morphological features are then fused. The cosine distance is used to measure the similarity between two words, i.e., a query word and a candidate word; based on this distance, the candidates are ranked to estimate their relevance. Correctly matched words are then selected at a threshold of 98%. The experimental results confirm the effectiveness of the proposed method, with an average precision of 91.23%, an average recall of 84.78% and an average F-measure of 89.47%.
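The abstract says only that the preprocessed page is segmented with "simple morphological filters". One common way to do this, sketched here under stated assumptions, is to dilate the binarized page with a horizontal line element so that the characters of a word merge, then treat each connected component as one word. The page format (a list of rows, 1 = ink), the `gap` parameter, 8-connectivity and the helper names are illustrative assumptions, not details from the paper; the sketch also skips text-line separation.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]          # binary page: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def dilate_horizontal(img: Image, gap: int) -> Image:
    """Binary dilation with a 1 x (2*gap + 1) horizontal line element.

    Horizontal runs of up to 2*gap background pixels between ink pixels are
    bridged, which is meant to join the characters of one word (assumed)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for xx in range(max(0, x - gap), min(w, x + gap + 1)):
                    out[y][xx] = 1
    return out


def segment_words(img: Image, gap: int = 2) -> List[Box]:
    """Label 8-connected components of the dilated page and return one
    bounding box per component, measured on the original ink pixels."""
    merged = dilate_horizontal(img, gap)
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for sy in range(h):
        for sx in range(w):
            if not merged[sy][sx] or seen[sy][sx]:
                continue
            # breadth-first flood fill of one component
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            top, left, bottom, right = h, w, -1, -1
            while queue:
                y, x = queue.popleft()
                if img[y][x]:  # ignore pixels added only by the dilation
                    top, left = min(top, y), min(left, x)
                    bottom, right = max(bottom, y), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and merged[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return sorted(boxes)  # sorted by top edge, then left edge
```

For example, on a one-line page with ink at columns 0, 2, 8 and 9, `segment_words(page, gap=1)` returns two boxes: the one-pixel gap is bridged, the five-pixel gap is not.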
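The morphological part of the feature vector can be sketched as follows. It assumes a binary word image (1 = ink) and a 3 x 3 square structuring element, neither of which is specified in the abstract, and it reads "density" as the fraction of foreground pixels in each transformed image. Top-hat is taken as the ink removed by opening and bottom-hat as the background filled by closing, the usual binary definitions. The helper names (`erode`, `morph_features`, and so on) are hypothetical.

```python
from typing import Callable, Iterable, List

Image = List[List[int]]  # binary word image: 1 = ink, 0 = background


def _morph(img: Image, k: int, combine: Callable[[Iterable[bool]], bool]) -> Image:
    """k x k square structuring element; combine=all gives erosion,
    combine=any gives dilation. Pixels outside the image count as background."""
    h, w, r = len(img), len(img[0]), k // 2
    return [[int(combine(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx] == 1
                         for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
             for x in range(w)]
            for y in range(h)]


def erode(img: Image, k: int = 3) -> Image:
    return _morph(img, k, all)


def dilate(img: Image, k: int = 3) -> Image:
    return _morph(img, k, any)


def opening(img: Image, k: int = 3) -> Image:
    return dilate(erode(img, k), k)


def closing(img: Image, k: int = 3) -> Image:
    return erode(dilate(img, k), k)


def top_hat(img: Image, k: int = 3) -> Image:
    """Ink removed by opening: thin strokes, dots and other small details."""
    o = opening(img, k)
    return [[int(p == 1 and q == 0) for p, q in zip(ri, ro)] for ri, ro in zip(img, o)]


def bottom_hat(img: Image, k: int = 3) -> Image:
    """Background filled by closing: small holes and narrow gaps."""
    c = closing(img, k)
    return [[int(q == 1 and p == 0) for p, q in zip(ri, rc)] for ri, rc in zip(img, c)]


def density(img: Image) -> float:
    """Fraction of foreground pixels."""
    return sum(map(sum, img)) / (len(img) * len(img[0]))


def morph_features(img: Image, k: int = 3) -> List[float]:
    """[erosion, opening, top-hat, bottom-hat] densities of one word image."""
    return [density(op(img, k)) for op in (erode, opening, top_hat, bottom_hat)]
```

On a 7 x 7 image holding a 5 x 5 ring with a one-pixel hole, every 3 x 3 window inside the ring touches the hole, so erosion and opening are empty, the top-hat density equals the ring's own density, and the bottom-hat captures exactly the filled hole.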
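The R-HOG step histograms gradient orientations over rectangular cells, weighted by gradient magnitude, as in the HOG descriptor of Dalal and Triggs. The sketch below is a simplified version under assumptions that are not taken from the paper: a fixed 2 x 4 grid of cells per word (so every word yields a vector of the same length regardless of its size), 9 unsigned orientation bins over 0 to 180 degrees, hard bin assignment, and a single L2 normalization of the whole vector instead of the overlapping block normalization of the original HOG. The function name `r_hog` is hypothetical.

```python
import math
from typing import List


def r_hog(img: List[List[float]], cells_y: int = 2, cells_x: int = 4,
          bins: int = 9) -> List[float]:
    """Simplified R-HOG descriptor of a grayscale or binary word image.

    Returns cells_y * cells_x * bins values, L2-normalized (all zeros for a
    flat image). Grid size, bin count and normalization are assumptions."""
    h, w = len(img), len(img[0])
    hist = [0.0] * (cells_y * cells_x * bins)
    for y in range(h):
        for x in range(w):
            # central differences [-1, 0, 1], clamped at the image border
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            # unsigned orientation in [0, 180); min() guards float rounding to 180
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / (180.0 / bins)), bins - 1)
            # rectangular cell that contains pixel (y, x)
            cy, cx = y * cells_y // h, x * cells_x // w
            hist[(cy * cells_x + cx) * bins + b] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm else hist
```

A vertical edge, for instance, produces only horizontal gradients, so all of its energy lands in orientation bin 0 of the cells that straddle the edge.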
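Fusion, cosine ranking and the 98% threshold can be sketched as below. The abstract does not say how the HOG and morphological vectors are fused or how the threshold is applied, so this sketch assumes plain concatenation with no feature scaling and treats the threshold as a minimum cosine similarity of 0.98 (equivalently, a cosine distance of at most 0.02). The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def fuse(hog: List[float], morph: List[float]) -> List[float]:
    """Feature-level fusion by concatenation (assumed, not from the paper)."""
    return list(hog) + list(morph)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: List[float], candidates: Dict[str, List[float]],
             threshold: float = 0.98) -> List[Tuple[str, float]]:
    """Rank candidate words by cosine distance to the query (smallest first)
    and keep those whose cosine similarity is at least `threshold`.
    Returns (word_id, cosine_distance) pairs."""
    scored = [(wid, cosine_similarity(query, vec)) for wid, vec in candidates.items()]
    scored.sort(key=lambda t: t[1], reverse=True)  # high similarity = low distance
    return [(wid, 1.0 - s) for wid, s in scored if s >= threshold]
```

Sorting by descending similarity is the same as sorting by ascending cosine distance, because the distance is defined here as 1 minus the similarity.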
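The reported figures are averages of precision, recall and F-measure. The standard per-query definitions are sketched below, where a retrieved word counts as a true positive if it is relevant to the query. Averaging the per-query values with equal weight (macro-averaging) is an assumption, since the abstract does not say how the averages were taken; the function names are hypothetical.

```python
from typing import List, Set, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F-measure)


def precision_recall_f(retrieved: Set[str], relevant: Set[str]) -> Scores:
    tp = len(retrieved & relevant)  # correctly retrieved words
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of p and r
    return p, r, f


def macro_average(per_query: List[Scores]) -> Scores:
    """Equal-weight mean of per-query scores (assumed averaging scheme)."""
    n = len(per_query)
    p = sum(s[0] for s in per_query) / n
    r = sum(s[1] for s in per_query) / n
    f = sum(s[2] for s in per_query) / n
    return p, r, f
```

Because the F-measure is not linear in precision and recall, the mean of per-query F-measures generally differs from the F-measure computed from the mean precision and mean recall.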

Author information

Corresponding author

Correspondence to C. Veershetty.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Hangarge, M., Veershetty, C., Rajmohan, P., Mukarambi, G. (2017). Word Retrieval from Kannada Document Images Using HOG and Morphological Features. In: Santosh, K., Hangarge, M., Bevilacqua, V., Negi, A. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2016. Communications in Computer and Information Science, vol 709. Springer, Singapore. https://doi.org/10.1007/978-981-10-4859-3_7

  • DOI: https://doi.org/10.1007/978-981-10-4859-3_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-4858-6

  • Online ISBN: 978-981-10-4859-3

  • eBook Packages: Computer Science, Computer Science (R0)
