Abstract
High-dimensional indexing is fundamental to multimedia research. Compact binary code indexing has achieved significant success in recent years because it approximates high-dimensional data effectively. However, most existing binary code methods use a linear scan to find near neighbors, which involves unnecessary computation and degrades search efficiency, especially in large-scale applications. To avoid searching codes that, with high probability, are not near neighbors, we propose a framework that indexes binary codes in clusters and scans only the codes in relevant clusters. Within this framework, we propose Pivot Based Locality Sensitive Clustering (PLSC) and present a Density Adaptive Binary coding (DAB) method for PLSC clusters. PLSC uses pivots to estimate similarities between data points and generates clusters based on the Locality Sensitive Hashing scheme. DAB adopts different binary code generation methods according to cluster densities. Experiments on open datasets show that offline indexing based on PLSC is efficient and that DAB codes in PLSC clusters significantly improve search efficiency compared with state-of-the-art binary codes.
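The general idea of scanning binary codes only inside relevant clusters, rather than across the whole database, can be illustrated with a minimal sketch. This is not the authors' PLSC or DAB: random-hyperplane hashing stands in for both the clustering step and the code generation step, and every name here (`ClusteredBinaryIndex`, `sign_bits`, `cluster_bits`, `code_bits`) is a hypothetical choice made for illustration only.

```python
import random
from collections import defaultdict


def sign_bits(vec, planes):
    """Pack one bit per hyperplane: 1 if the dot product is non-negative."""
    bits = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def hamming(a, b):
    """Hamming distance between two binary codes stored as integers."""
    return bin(a ^ b).count("1")


class ClusteredBinaryIndex:
    """Toy index: a coarse hash picks the cluster, a longer code ranks within it.

    Assumption: random hyperplanes replace the pivot-based clustering and the
    density-adaptive coding described in the abstract.
    """

    def __init__(self, dim, cluster_bits=4, code_bits=32, seed=0):
        rng = random.Random(seed)

        def random_plane():
            return [rng.gauss(0.0, 1.0) for _ in range(dim)]

        self.cluster_planes = [random_plane() for _ in range(cluster_bits)]
        self.code_planes = [random_plane() for _ in range(code_bits)]
        self.clusters = defaultdict(list)  # cluster key -> [(item id, code)]

    def add(self, item_id, vec):
        key = sign_bits(vec, self.cluster_planes)
        code = sign_bits(vec, self.code_planes)
        self.clusters[key].append((item_id, code))

    def search(self, vec, k=1):
        """Scan only the query's cluster and rank its codes by Hamming distance."""
        key = sign_bits(vec, self.cluster_planes)
        code = sign_bits(vec, self.code_planes)
        candidates = self.clusters.get(key, [])
        ranked = sorted(candidates, key=lambda item: hamming(code, item[1]))
        return [item_id for item_id, _ in ranked[:k]]
```

In this sketch a query touches a single bucket, so true neighbors that fall into a different bucket are missed; probing several buckets would trade search speed for recall.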
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61271428, 61273247), the National Key Technology Research and Development Program of China (2012BAH39B02), the National Basic Research Program of China (973 Program, 2013CB329502) and the Co-building Program of Beijing Municipal Education Commission.
Cite this article
Zhang, W., Gao, K., Zhang, Y. et al. Efficient binary code indexing with pivot based locality sensitive clustering. Multimed Tools Appl 69, 491–512 (2014). https://doi.org/10.1007/s11042-012-1354-z
DOI: https://doi.org/10.1007/s11042-012-1354-z