Integrating bilingual search results for automatic junk image filtering

Multimedia Tools and Applications

Abstract

Keyword-based image search engines are now widely used to access the large volume of images on the Web. Most existing engines, however, may return many junk images (images irrelevant to the given query word), because text terms that are only loosely associated with a Web image are also used to index it. The objective of this work is to filter junk images out of image search results effectively. To this end, bilingual image search results for the same keyword-based query are integrated to identify the clusters of junk images and the clusters of relevant images. Within the relevant image clusters, the results are further refined by removing duplicates under a coarse-to-fine structure. Experiments on a large number of bilingual keyword-based queries (5,000 query words) were performed simultaneously on two keyword-based image search engines (Google Images in English and Baidu Images in Chinese), and the results show that integrating bilingual image search results can filter out junk images effectively.
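The two ideas named in the abstract, cross-language cluster agreement and coarse-to-fine duplicate removal, can be illustrated with a minimal sketch. The abstract does not specify the features, the clustering algorithm, or the matching rule, so everything below is an assumption made for illustration: images are taken to be already clustered and represented as small feature vectors, a cluster is called relevant when its centroid is close (cosine similarity) to some cluster from the other language, and duplicates are found by a coarse grid bucket followed by a fine distance check. The function names and thresholds are hypothetical and are not taken from the paper.

```python
from math import dist, sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def split_clusters(clusters_a, clusters_b, threshold=0.9):
    """Label each cluster in clusters_a as relevant or junk.

    Assumption (not from the paper): a cluster is relevant when its centroid
    has cosine similarity >= threshold with some centroid in clusters_b,
    i.e. both language-specific result sets agree on it.
    """
    centroids_b = [centroid(v) for v in clusters_b.values()]
    relevant, junk = [], []
    for name, vectors in clusters_a.items():
        c = centroid(vectors)
        best = max((cosine(c, cb) for cb in centroids_b), default=0.0)
        (relevant if best >= threshold else junk).append(name)
    return relevant, junk


def dedupe(vectors, cell=0.5, eps=0.05):
    """Coarse-to-fine duplicate removal (illustrative only).

    Coarse step: bucket vectors by a quantized grid cell.
    Fine step: within a bucket, drop a vector that lies within eps
    (Euclidean distance) of one already kept.
    Near-duplicates that straddle a cell boundary are missed by this sketch.
    """
    buckets = {}
    kept = []
    for v in vectors:
        key = tuple(int(x // cell) for x in v)
        bucket = buckets.setdefault(key, [])
        if any(dist(v, u) <= eps for u in bucket):
            continue
        bucket.append(v)
        kept.append(v)
    return kept


# Toy data: hypothetical 2-D features for clusters from each engine.
english = {"e0": [[1.0, 0.0], [0.9, 0.1]], "e1": [[0.0, 1.0], [0.1, 0.9]]}
chinese = {"c0": [[0.95, 0.05], [1.0, 0.1]]}

relevant, junk = split_clusters(english, chinese)
unique = dedupe([[0.10, 0.10], [0.11, 0.10], [0.40, 0.40]])
```

In this toy run, cluster `e0` agrees with the Chinese-side cluster and is kept, while `e1` has no counterpart and is flagged as junk. The coarse bucket keeps the fine comparison local, which is the general motivation for a coarse-to-fine design; the paper's actual structure may differ.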

Acknowledgements

This work is partly supported by NSFC-61075014 and NSFC-60875016, by the Program for New Century Excellent Talents in University under Grants NCET-07-0693, NCET-08-0458 and NCET-10-0071, and by the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20096102110025).

Author information

Corresponding author

Correspondence to Chunlei Yang.

About this article

Cite this article

Yang, C., Peng, J., Feng, X. et al. Integrating bilingual search results for automatic junk image filtering. Multimed Tools Appl 70, 661–688 (2014). https://doi.org/10.1007/s11042-012-1051-y

  • DOI: https://doi.org/10.1007/s11042-012-1051-y
