A Novel Approach for Word Spotting Using Merge-Split Edit Distance

  • Conference paper
Computer Analysis of Images and Patterns (CAIP 2009)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5702)

Abstract

Edit distance matching has been used in the literature for word spotting, with characters taken as primitives. The recognition rate, however, is limited by inconsistent character segmentation (broken or merged characters) caused by noisy images or distorted characters. In this paper, we propose a Merge-Split edit distance that overcomes these segmentation problems by incorporating a multi-purpose merge cost function. The system extracts the words and characters in the text and then attributes each character with a set of features. Characters are matched by comparing their extracted feature sets using Dynamic Time Warping (DTW), while words are matched by comparing their strings of characters using the proposed Merge-Split edit distance algorithm. Evaluation of the method on 19th-century historical document images yields promising results.
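
The two-level matching described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the feature set or the merge cost function, so the sketch assumes each character is a short 1-D feature sequence (for example, one value per pixel column), takes the merge cost to be the DTW cost between the concatenation of two adjacent characters and a single character, and uses a hypothetical constant `gap` cost for insertion and deletion. The names `dtw` and `merge_split_distance` are illustrative only.

```python
from typing import List, Sequence

Char = Sequence[float]  # assumed: one feature value per pixel column


def dtw(a: Char, b: Char) -> float:
    """Classic DTW cost between two non-empty 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def merge_split_distance(query: List[Char], cand: List[Char],
                         gap: float = 1.0) -> float:
    """Edit distance over characters with extra merge and split moves.

    Besides insert, delete and substitute, two adjacent characters of one
    word may be matched against a single character of the other word.
    """
    n, m = len(query), len(cand)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                D[i - 1][j] + gap,                              # delete
                D[i][j - 1] + gap,                              # insert
                D[i - 1][j - 1] + dtw(query[i - 1], cand[j - 1]),  # substitute
            )
            if i >= 2:  # two query characters match one candidate character
                merged = list(query[i - 2]) + list(query[i - 1])
                best = min(best, D[i - 2][j - 1] + dtw(merged, cand[j - 1]))
            if j >= 2:  # one query character matches two candidate characters
                merged = list(cand[j - 2]) + list(cand[j - 1])
                best = min(best, D[i - 1][j - 2] + dtw(query[i - 1], merged))
            D[i][j] = best
    return D[n][m]


# A query word and a candidate whose first character was broken in two.
query = [[1.0, 2.0, 1.0], [3.0, 3.0]]
broken = [[1.0, 2.0], [1.0], [3.0, 3.0]]
print(merge_split_distance(query, broken))
```

In this toy case the broken character is absorbed by the split move, so the two words match at zero cost, whereas a plain edit distance would have to pay for an extra insertion. A real system would likely normalise the DTW cost and tune the gap and merge costs on data.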

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Khurshid, K., Faure, C., Vincent, N. (2009). A Novel Approach for Word Spotting Using Merge-Split Edit Distance. In: Jiang, X., Petkov, N. (eds) Computer Analysis of Images and Patterns. CAIP 2009. Lecture Notes in Computer Science, vol 5702. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03767-2_26

  • DOI: https://doi.org/10.1007/978-3-642-03767-2_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03766-5

  • Online ISBN: 978-3-642-03767-2

  • eBook Packages: Computer Science, Computer Science (R0)
