Mining near duplicate image groups

Multimedia Tools and Applications

Abstract

Social media sharing websites such as Flickr, Facebook, and Picasa allow users to share their personal photos with friends. People also like to follow and forward their favorite images, which is one of the main sources of near duplicate images. In addition, well-known places of interest such as Rome, the Statue of Liberty, and Tower Bridge in London attract visitors from all over the world. At these places, travelers take photos, write travelogues, and share them with their social friends. Photos taken at the same place, with or without viewpoint variations, are near duplicate images. How to detect them remains an open problem in image analysis and multimedia processing. Existing near duplicate image processing approaches mainly focus on finding the near duplicates of a given input image, so a query image is required. Finding near duplicate image groups (NDIG) automatically in web-scale social image collections, without any query, is much more challenging. In this paper, instead of searching for the near duplicates of a particular input image, we propose an automatic NDIG mining approach that combines adaptive global feature clustering with local feature refinement. The approach uses a two-layer hierarchical model: the first layer detects candidate NDIGs by adaptive global feature clustering, and the second layer verifies each candidate NDIG by local feature refinement. The global clustering mainly reduces the computational cost of processing a large-scale image set, while the local refinement improves NDIG detection performance. Experiments on four datasets show the effectiveness of our approach.
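The two-layer structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, the clustering rule, or the verification test, so everything below is an assumption made for illustration. Global features are plain numeric vectors, local features are sets of quantized descriptor IDs, the first layer is a simple leader clustering with a fixed `radius` (the paper describes an adaptive scheme instead), and the second layer links images by Jaccard overlap with a hypothetical `min_overlap` threshold.

```python
from itertools import combinations
from math import dist


def candidate_groups(global_feats, radius):
    """Layer 1: cheap leader clustering on global features.

    Each image joins the first cluster whose leader lies within `radius`;
    otherwise it starts a new cluster. `radius` is a hypothetical fixed
    threshold standing in for the paper's adaptive clustering.
    """
    leaders, clusters = [], []
    for idx, feat in enumerate(global_feats):
        for c, leader in enumerate(leaders):
            if dist(feat, leader) <= radius:
                clusters[c].append(idx)
                break
        else:
            leaders.append(feat)
            clusters.append([idx])
    return clusters


def verify_group(members, local_feats, min_overlap):
    """Layer 2: refine one candidate group using local features.

    Two images are linked when the Jaccard overlap of their local
    descriptor sets reaches `min_overlap` (an assumed criterion).
    Linked images are merged with a small union-find, and images
    left without any partner are dropped.
    """
    parent = {m: m for m in members}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(members, 2):
        sa, sb = local_feats[a], local_feats[b]
        union = len(sa | sb)
        if union and len(sa & sb) / union >= min_overlap:
            parent[find(a)] = find(b)

    groups = {}
    for m in members:
        groups.setdefault(find(m), []).append(m)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def mine_ndig(global_feats, local_feats, radius=0.2, min_overlap=0.5):
    """Run both layers and return the verified groups as index lists."""
    result = []
    for members in candidate_groups(global_feats, radius):
        result.extend(verify_group(members, local_feats, min_overlap))
    return sorted(result)


# Toy data: five images with 2-D global features and small local descriptor sets.
global_feats = [(0.10, 0.10), (0.12, 0.11), (0.11, 0.09), (0.90, 0.90), (0.88, 0.91)]
local_feats = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}, {10, 11, 12}, {10, 11, 12, 13}]

print(mine_ndig(global_feats, local_feats))  # [[0, 1], [3, 4]]
```

In this toy run, image 2 falls into the same candidate group as images 0 and 1 because its global feature is close to theirs, but the local check rejects it since it shares no local descriptors with them. The expensive pairwise comparison only runs inside each candidate group, which is the cost saving the abstract attributes to the global clustering layer.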

Author information

Corresponding authors

Correspondence to Xueming Qian or Liejun Wang.

Additional information

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Projects No. 61173109, No. 60903121, and No. 61261036, and by Microsoft Research Asia.

About this article

Cite this article

Li, J., Qian, X., Li, Q. et al. Mining near duplicate image groups. Multimed Tools Appl 74, 655–669 (2015). https://doi.org/10.1007/s11042-014-2008-0

  • DOI: https://doi.org/10.1007/s11042-014-2008-0
