CATIRI: An Efficient Method for Content-and-Text Based Image Retrieval

Zeng, Mengqi; Yao, Bin; Wang, Zhi-Jie; Shen, Yanyan; Li, Feifei; Zhang, Jianfeng; Lin, Hao; Guo, Minyi

doi:10.1007/s11390-019-1911-2

Regular Paper
Published: 22 March 2019

Journal of Computer Science and Technology, Volume 34, pages 287–304 (2019)

Mengqi Zeng¹, Bin Yao¹, Zhi-Jie Wang²˒³˒⁴, Yanyan Shen¹, Feifei Li⁵, Jianfeng Zhang⁶, Hao Lin⁶ & Minyi Guo¹

Abstract

Combining visual and textual information in image retrieval substantially narrows the semantic gap of traditional image retrieval methods, and it has therefore attracted much attention recently. Image retrieval based on such a combination is usually called content-and-text based image retrieval (CTBIR). Existing studies in CTBIR, however, focus mainly on improving retrieval quality. To the best of our knowledge, little attention has been paid to retrieval efficiency. Because image data is now widespread and growing rapidly, retrieval efficiency is an important problem to investigate. To this end, this paper presents an efficient image retrieval method named CATIRI (content-and-text based image retrieval using indexing). CATIRI follows a three-phase solution framework and develops a new indexing structure called the MHIM-tree, which integrates Manhattan Hashing, an inverted index, and the M-tree. To use the MHIM-tree effectively at query time, we present a set of metrics and reveal their inherent properties. Based on them, we develop a top-k query algorithm for CTBIR. Experimental results on benchmark image datasets demonstrate that CATIRI outperforms the competitors by an order of magnitude.
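
To make the CTBIR query setting concrete, the sketch below shows a naive linear-scan top-k query that mixes a visual distance with a textual distance. It is not the MHIM-tree or the CATIRI algorithm, whose details are not given in the abstract; it is only the kind of brute-force baseline that an index is meant to avoid. The `Image` record, the Jaccard text distance, the weight `alpha`, and the normaliser `max_visual` are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Image:
    """Hypothetical record: an id, an integer visual code, and keywords."""
    image_id: str
    code: Tuple[int, ...]
    keywords: FrozenSet[str]


def manhattan(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Manhattan (L1) distance between two equal-length integer codes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def text_distance(query_words: FrozenSet[str], image_words: FrozenSet[str]) -> float:
    """Jaccard distance between keyword sets (an assumed text measure)."""
    union = query_words | image_words
    if not union:
        return 0.0
    return 1.0 - len(query_words & image_words) / len(union)


def top_k(images: Iterable[Image], query_code: Tuple[int, ...],
          query_words: FrozenSet[str], k: int,
          alpha: float = 0.5, max_visual: float = 1.0) -> List[Image]:
    """Score every image and return the k with the smallest combined distance."""
    def score(img: Image) -> float:
        visual = manhattan(query_code, img.code) / max_visual
        textual = text_distance(query_words, img.keywords)
        # Assumed combination: a weighted sum of the two distances.
        return alpha * visual + (1.0 - alpha) * textual

    return heapq.nsmallest(k, images, key=score)


# Toy data, invented for this example only.
images = [
    Image("a", (0, 1, 3), frozenset({"beach", "sea"})),
    Image("b", (3, 3, 0), frozenset({"city"})),
    Image("c", (0, 1, 2), frozenset({"beach"})),
]
result = top_k(images, (0, 1, 3), frozenset({"beach", "sea"}), k=2, max_visual=9.0)
```

The scan touches every image for every query, so its cost grows linearly with the collection size; this is the efficiency problem that an indexing structure targets.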

Author information

Authors and Affiliations

¹ Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Mengqi Zeng, Bin Yao, Yanyan Shen & Minyi Guo
² School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510006, China
Zhi-Jie Wang
³ Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, 510006, China
Zhi-Jie Wang
⁴ National Engineering Laboratory for Big Data Analysis and Applications, Beijing, 100871, China
Zhi-Jie Wang
⁵ School of Computing, University of Utah, Salt Lake City, 84112, U.S.A.
Feifei Li
⁶ Alibaba Group, Hangzhou, 311121, China
Jianfeng Zhang & Hao Lin

Corresponding author

Correspondence to Bin Yao.

Electronic supplementary material

ESM 1 (PDF, 592 KB)

About this article

Cite this article

Zeng, M., Yao, B., Wang, ZJ. et al. CATIRI: An Efficient Method for Content-and-Text Based Image Retrieval. J. Comput. Sci. Technol. 34, 287–304 (2019). https://doi.org/10.1007/s11390-019-1911-2

Received: 09 July 2018
Revised: 24 January 2019
Published: 22 March 2019
Issue Date: March 2019
DOI: https://doi.org/10.1007/s11390-019-1911-2
