Aggregation-Based Probing for Large-Scale Duplicate Image Detection

  • Ziming Feng
  • Jia Chen
  • Xian Wu
  • Yong Yu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7808)


Identifying visually duplicate images is a prerequisite for a broad range of tasks in image retrieval and mining, and it has therefore attracted considerable research interest. Many efficient and precise algorithms have been proposed. However, compared to duplicate text detection, the recall of duplicate image detection is relatively low, which means that many duplicate images are left undetected. In this paper, we focus on improving recall while preserving high precision. We exploit the hash code representation of images and present a probing-based algorithm to increase recall. Unlike state-of-the-art probing methods in image search, the duplicate image detection task involves multiple probing sequences. To merge them, we design an unsupervised score-based aggregation algorithm. Experimental results on a large-scale data set show that precision is preserved and recall is increased. Furthermore, our algorithm for aggregating multiple probing sequences is proved to be stable.
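The abstract does not spell out the probing or aggregation procedure, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that images are represented by short binary hash codes, that a probing sequence lists nearby codes in order of Hamming distance, and that several such sequences are merged with a simple Borda-style position score. The function names `probing_sequence` and `aggregate`, and the scoring rule itself, are hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations


def probing_sequence(code: int, n_bits: int, max_flips: int) -> list[int]:
    """List hash codes near `code`, fewest flipped bits first.

    Assumption: the exact bucket `code` has already been checked,
    so the sequence starts at Hamming distance 1.
    """
    sequence = []
    for flips in range(1, max_flips + 1):
        for positions in combinations(range(n_bits), flips):
            neighbor = code
            for p in positions:
                neighbor ^= 1 << p  # flip bit p
            sequence.append(neighbor)
    return sequence


def aggregate(sequences: list[list[int]]) -> list[int]:
    """Merge ranked sequences with a Borda-style score (an assumed rule).

    Each sequence gives a bucket a score in (0, 1] that decreases with
    its position; scores are summed across sequences.
    """
    scores = defaultdict(float)
    for seq in sequences:
        n = len(seq)
        for rank, bucket in enumerate(seq):
            scores[bucket] += (n - rank) / n
    # Highest total score first; ties broken by bucket id for determinism.
    return sorted(scores, key=lambda b: (-scores[b], b))


# Two 4-bit codes that differ in two bits, each probed up to 2 bit flips.
sequences = [probing_sequence(0b0000, 4, 2), probing_sequence(0b0011, 4, 2)]
merged = aggregate(sequences)
print([format(b, "04b") for b in merged[:4]])
```

Buckets that rank highly in both sequences accumulate the largest scores and are probed first. A score sum of this kind needs no labeled data, which matches the unsupervised setting the abstract describes, though the paper's actual scoring function may differ.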





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ziming Feng (1)
  • Jia Chen (1)
  • Xian Wu (1)
  • Yong Yu (1)
  1. Computer Science Department, Shanghai Jiao Tong University, Shanghai, China
