On Semantic Solutions for Efficient Approximate Similarity Search on Large-Scale Datasets

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10657)


Approximate similarity search algorithms based on hashing have been proposed for querying high-dimensional datasets because of their fast retrieval speed and low storage cost. Recent studies promote the use of Convolutional Neural Network (CNN) features together with hashing techniques to improve search accuracy. However, several challenges must be solved before CNN features can be indexed practically and efficiently, such as the heavy training process required to achieve accurate query results and the critical dependency on data-dependent parameters. To overcome these issues, we propose a new method for scalable similarity search, Deep frActal based Hashing (DAsH), which computes the best data-parameter values for an optimal sub-space projection by exploring the correlations among CNN feature attributes using fractal theory. Moreover, inspired by recent advances in CNNs, we use not only activations of lower layers, which are more general-purpose, but also prior knowledge of the semantic data in the last CNN layer to improve search accuracy. Our method thus produces a better representation of the data space at lower computational cost, yielding higher accuracy. This gain in speed and accuracy allows us to evaluate the framework on a large, realistic, and challenging set of datasets.
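The core fractal-theory idea the abstract relies on, i.e. estimating the intrinsic (fractal) dimension of a dataset to choose good projection parameters, can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation; it is a standard box-counting estimator (the function name and the toy dataset are illustrative assumptions), showing how a point set embedded in 3-D can be recognized as intrinsically 1-dimensional, which would justify a much smaller sub-space projection.

```python
import math
import random

def box_counting_dimension(points, radii):
    """Estimate the fractal (box-counting) dimension of a point set.

    For each grid-cell size r, count the occupied cells N(r); the
    dimension is the least-squares slope of log N(r) vs. log(1/r).
    """
    logs = []
    for r in radii:
        # Snap every point to its grid cell at resolution r and
        # count distinct occupied cells.
        cells = {tuple(int(x // r) for x in p) for p in points}
        logs.append((math.log(1.0 / r), math.log(len(cells))))
    # Ordinary least-squares slope of log N(r) against log(1/r).
    n = len(logs)
    mx = sum(x for x, _ in logs) / n
    my = sum(y for _, y in logs) / n
    num = sum((x - mx) * (y - my) for x, y in logs)
    den = sum((x - mx) ** 2 for x, _ in logs)
    return num / den

# Toy data: points on a straight line embedded in 3-D. The embedding
# dimension is 3, but the intrinsic dimension is ~1, so a ~1-D
# projection would preserve the structure.
random.seed(0)
line = [(t, 2 * t, -t) for t in (random.random() for _ in range(5000))]
d = box_counting_dimension(line, radii=[0.2, 0.1, 0.05, 0.025])
```

In a DAsH-like pipeline, an estimate such as `d` would drive the choice of sub-space dimensionality instead of hand-tuned parameters; real CNN features would of course require the log-log fit to be restricted to the linear region of the plot.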


Multidimensional index · Approximate similarity search · Fractal theory · Deep learning



This project has been partially funded by CIENCIA-ACTIVA (Perú) through the Doctoral Scholarship at UNSA University, and FONDECYT (Perú) Project 148-2015.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Universidad Nacional de San Agustin, Arequipa, Peru
  2. Universidad La Salle, Arequipa, Peru
