Partition Min-Hash for Partial Duplicate Image Discovery

  • David C. Lee
  • Qifa Ke
  • Michael Isard
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6311)


In this paper, we propose Partition min-Hash (PmH), a novel hashing scheme for discovering partial duplicate images in a large database. Unlike the standard min-Hash algorithm, which assumes a bag-of-words image representation, our approach exploits the fact that duplicate regions among images are often localized. Through theoretical analysis, simulation, and empirical study, we show that PmH outperforms standard min-Hash in terms of precision and recall, while being orders of magnitude faster. When combined with the state-of-the-art Geometric min-Hash algorithm, our approach speeds up hashing by 10 times without losing precision or recall. Given a fixed time budget, our method achieves much higher recall than the state of the art.
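The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way the localization idea could be realized, not the authors' implementation. It assumes that each image is a list of `(x, y, word_id)` visual-word features, that the image is split into a regular `grid × grid` set of partitions, that a min-Hash sketch is computed per partition, and that two images become a candidate pair when any of their partition sketches collide. The function names, the grid layout, and the linear hash family are all hypothetical choices made for illustration.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime; word IDs are assumed to be smaller than this


def make_hash_functions(num_hashes, seed=0):
    """Return num_hashes random linear hash functions over integer word IDs."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
              for _ in range(num_hashes)]
    # Bind a and b as default arguments so each lambda keeps its own parameters.
    return [lambda w, a=a, b=b: (a * w + b) % PRIME for a, b in params]


def sketch(words, hash_functions):
    """Standard min-Hash sketch: the minimum hash value under each function."""
    return tuple(min(h(w) for w in words) for h in hash_functions)


def partition_sketches(features, width, height, grid, hash_functions):
    """Compute one sketch per non-empty grid cell (assumed partition layout).

    features is a list of (x, y, word_id) tuples in pixel coordinates.
    """
    cells = {}
    for x, y, word in features:
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        cells.setdefault((row, col), set()).add(word)
    return {sketch(words, hash_functions) for words in cells.values()}


def is_candidate_pair(sketches_a, sketches_b):
    """Two images are candidates if any partition sketch is shared."""
    return not sketches_a.isdisjoint(sketches_b)


if __name__ == "__main__":
    hs = make_hash_functions(num_hashes=2, seed=1)
    # Image b contains the same region (words 1, 2, 3) as image a,
    # but at a different location and surrounded by different content.
    a = [(10, 10, 1), (20, 20, 2), (30, 30, 3), (80, 80, 100), (90, 70, 101)]
    b = [(60, 60, 1), (70, 70, 2), (80, 80, 3), (10, 10, 200), (20, 30, 201)]
    sa = partition_sketches(a, 100, 100, 2, hs)
    sb = partition_sketches(b, 100, 100, 2, hs)
    print(is_candidate_pair(sa, sb))
```

In this toy setup the shared region falls entirely inside one partition of each image, so its sketch is identical in both and the pair is flagged even though the rest of each image differs. A whole-image sketch would instead depend on which word attains the minimum across the entire image, which is why localized duplicates are easier to miss under the plain bag-of-words assumption.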


Keywords (machine-generated, not supplied by the authors): Hash Function, Image Retrieval, Visual Word, Hash Table, Query Image



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • David C. Lee (1)
  • Qifa Ke (2)
  • Michael Isard (2)

  1. Carnegie Mellon University
  2. Microsoft Research Silicon Valley
