Journal of Computer Science and Technology, Volume 34, Issue 2, pp 287–304

CATIRI: An Efficient Method for Content-and-Text Based Image Retrieval

  • Mengqi Zeng
  • Bin Yao
  • Zhi-Jie Wang
  • Yanyan Shen
  • Feifei Li
  • Jianfeng Zhang
  • Hao Lin
  • Minyi Guo
Regular Paper

Abstract

Combining visual and textual information in image retrieval substantially narrows the semantic gap of traditional image retrieval methods, and it has therefore attracted much attention recently. Image retrieval based on such a combination is usually called content-and-text based image retrieval (CTBIR). Existing studies of CTBIR, however, mainly focus on improving retrieval quality; to the best of our knowledge, little attention has been paid to retrieval efficiency. Because image data is now widespread and growing rapidly, retrieval efficiency is an important problem to investigate. To this end, this paper presents an efficient image retrieval method named CATIRI (content-and-text based image retrieval using indexing). CATIRI follows a three-phase solution framework in which we develop a new indexing structure called the MHIM-tree. The MHIM-tree seamlessly integrates several elements, including Manhattan hashing, an inverted index, and the M-tree. To exploit the MHIM-tree effectively during query processing, we present a set of important metrics and reveal their inherent properties. Based on them, we develop a top-k query algorithm for CTBIR. Experimental results on benchmark image datasets demonstrate that CATIRI outperforms its competitors by an order of magnitude.
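
To make the query type concrete, the following is a minimal sketch of what a top-k CTBIR query computes, written as a naive linear scan. It is not CATIRI and does not use the MHIM-tree; it is only the kind of unindexed baseline that an indexing method would aim to beat. The `Image` record, the Jaccard text distance, the weight `alpha`, and the linear score combination are all assumptions made for illustration and are not taken from the paper.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    # Hypothetical record: the field names are illustrative only.
    image_id: str
    features: tuple       # visual feature vector
    keywords: frozenset   # textual annotation


def manhattan(a, b):
    """L1 distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def top_k(images, query_features, query_keywords, k, alpha=0.5):
    """Return the k images with the smallest combined score (linear scan).

    The score is an assumed weighted sum of visual and textual distance;
    smaller means more relevant.
    """
    def score(img):
        visual = manhattan(img.features, query_features)
        textual = jaccard_distance(img.keywords, query_keywords)
        return alpha * visual + (1.0 - alpha) * textual

    return heapq.nsmallest(k, images, key=score)


images = [
    Image("a", (0.0, 0.0), frozenset({"beach", "sea"})),
    Image("b", (1.0, 1.0), frozenset({"beach"})),
    Image("c", (0.1, 0.0), frozenset({"city"})),
]
result = top_k(images, (0.0, 0.0), frozenset({"beach", "sea"}), k=2)
print([img.image_id for img in result])
```

Every query in this sketch touches every image, so its cost grows linearly with the collection size; avoiding that full scan is the motivation for an index-based approach.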

Keywords

image retrieval, text-and-visual feature, indexing, top-k

Supplementary material

11390_2019_1911_MOESM1_ESM.pdf (593 kb)
ESM 1 (PDF 592 kb)

Copyright information

© Springer Science+Business Media, LLC & Science Press, China 2019

Authors and Affiliations

  • Mengqi Zeng (1)
  • Bin Yao (1), corresponding author
  • Zhi-Jie Wang (2, 3, 4)
  • Yanyan Shen (1)
  • Feifei Li (5)
  • Jianfeng Zhang (6)
  • Hao Lin (6)
  • Minyi Guo (1)

  1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
  2. School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
  3. Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, China
  4. National Engineering Laboratory for Big Data Analysis and Applications, Beijing, China
  5. School of Computing, University of Utah, Salt Lake City, U.S.A.
  6. Alibaba Group, Hangzhou, China
