Discrete Point Based Signatures and Applications to Document Matching

  • Nemanja Spasojevic
  • Guillaume Poncin
  • Dan Bloomberg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6978)


Document analysis often starts with robust signatures, for example to look up a document from a low-quality photograph or to analyze similarity between scanned books. Signatures based on OCR typically work well but require good-quality OCR, which is not always available and can be very costly. In this paper we describe a novel scheme for extracting discrete signatures from document images. It operates on points that describe the position of words, typically their centroids. Each point is extracted using one of several techniques and assigned a signature based on its relation to its nearest neighbors. We discuss the benefits of this approach and demonstrate its application to multiple problems, including fast image similarity calculation and document lookup.
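
The abstract does not specify how a signature is computed, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that word centroids are already available as (x, y) pairs, and it uses a hypothetical encoding: the direction from a point to each of its k nearest neighbors is quantized into one of a fixed number of angular sectors. The names `point_signatures` and `similarity`, the parameters `k` and `bins`, and the Jaccard overlap used for comparison are illustrative choices, not the scheme described in the paper.

```python
import math


def point_signatures(points, k=4, bins=8):
    """Assign each point a discrete signature from its k nearest neighbors.

    Hypothetical encoding (an assumption, not the paper's method): the
    direction to each neighbor, taken in order of increasing distance,
    is quantized into one of `bins` equal angular sectors.
    """
    signatures = []
    for i, (x, y) in enumerate(points):
        # Sort all other points by squared distance to the current point.
        others = sorted(
            (p for j, p in enumerate(points) if j != i),
            key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2,
        )
        sig = []
        for nx, ny in others[:k]:
            # Angle to the neighbor, mapped into [0, 2*pi).
            angle = math.atan2(ny - y, nx - x) % (2 * math.pi)
            sig.append(int(angle / (2 * math.pi) * bins) % bins)
        signatures.append(tuple(sig))
    return signatures


def similarity(sigs_a, sigs_b):
    """Jaccard overlap between two collections of signatures."""
    a, b = set(sigs_a), set(sigs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Illustrative centroids for one page, and the same layout shifted.
page = [(0, 0), (10, 0), (0, 10), (10, 10), (25, 5)]
shifted = [(x + 100, y + 50) for x, y in page]
score = similarity(point_signatures(page), point_signatures(shifted))
```

Because this encoding depends only on relative positions, a pure translation of the page leaves the signatures unchanged. It is not invariant to rotation or to missing points, which a practical scheme would need to address.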


Keywords: image processing, feature extraction, image lookup



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Nemanja Spasojevic (1)
  • Guillaume Poncin (1)
  • Dan Bloomberg (1)
  1. Google Inc., Mountain View, USA
