Query Filtering with Low-Dimensional Local Embeddings

  • Conference paper
Similarity Search and Applications (SISAP 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11807)

Abstract

The concept of local pivoting is to partition a metric space so that each element in the space is associated with precisely one of a fixed set of reference objects or pivots. The idea is that each object of the data set is associated with the reference object that is best suited to filter that particular object if it is not relevant to a query, maximising the probability of excluding it from a search. The notion does not in itself lead to a scalable search mechanism, but instead gives a good chance of exclusion based on a tiny memory footprint and a fast calculation. It is therefore most useful in contexts where main memory is at a premium, or in conjunction with another, scalable, mechanism.
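The exclusion test behind local pivoting can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, Euclidean distance stands in for an arbitrary metric, and the rule used to choose each object's pivot (the farthest one) is only an assumed stand-in for "best suited". Each object stores one pivot index and one distance, and the triangle inequality gives \(d(q,o) \ge |d(q,p) - d(o,p)|\), so an object can be discarded without computing \(d(q,o)\) whenever that bound exceeds the query threshold.

```python
import math

def euclidean(a, b):
    """Euclidean distance, used here as a stand-in for any metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_local_pivot_table(data, pivots, dist=euclidean):
    """Store, per object, one pivot index and the distance to that pivot.

    ASSUMPTION: the farthest pivot is chosen purely for illustration;
    the source does not prescribe this selection rule.
    """
    table = []
    for obj in data:
        dists = [dist(obj, p) for p in pivots]
        best = max(range(len(pivots)), key=lambda i: dists[i])
        table.append((best, dists[best]))
    return table

def range_search(query, threshold, data, pivots, table, dist=euclidean):
    """Return objects within `threshold` of `query`, plus the number of
    object distances that actually had to be computed."""
    q_dists = [dist(query, p) for p in pivots]  # one distance per pivot
    results, computed = [], 0
    for obj, (i, d_op) in zip(data, table):
        # Triangle inequality: d(q, o) >= |d(q, p) - d(o, p)|.
        if abs(q_dists[i] - d_op) > threshold:
            continue  # safely excluded without computing d(q, o)
        computed += 1
        if dist(query, obj) <= threshold:
            results.append(obj)
    return results, computed
```

The per-object footprint in this sketch is one small integer and one float, which is the sense in which the memory cost is tiny; the scan itself remains linear in the size of the data set.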

In this paper we apply similar reasoning to metric spaces that possess the four-point property, notably including spaces under the Euclidean, Cosine, Triangular, Jensen-Shannon, and Quadratic Form distances. In this case, each element of the space can be associated with two reference objects, and a four-point lower-bound property is used instead of the simple triangle inequality. The probability of exclusion is strictly greater than with simple local pivoting; the space required per object and the calculation are again tiny in relative terms.
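One common way to realise a four-point lower bound is to project each object onto a plane defined by its two reference objects; the sketch below follows that construction under stated assumptions. The names `project` and `planar_lower_bound` are hypothetical, the two reference objects are assumed to be distinct, and Euclidean distance is used because it is known to have the four-point property. With \(p_1\) placed at the origin and \(p_2\) at \((\delta, 0)\), where \(\delta = d(p_1, p_2)\), an object at distances \(d_1, d_2\) from the pivots maps to \(x = (d_1^2 - d_2^2 + \delta^2)/(2\delta)\), \(y = \sqrt{d_1^2 - x^2}\), and the planar distance between two projected points does not exceed their true distance.

```python
import math

def euclidean(a, b):
    """Euclidean distance; a metric with the four-point property."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def project(d1, d2, delta):
    """Map distances (d1, d2) to two pivots that are `delta` apart
    onto 2D coordinates, with pivot 1 at (0, 0) and pivot 2 at (delta, 0).
    ASSUMPTION: delta > 0, i.e. the two pivots are distinct."""
    x = (d1 * d1 - d2 * d2 + delta * delta) / (2 * delta)
    # Clamp at zero to absorb floating-point rounding.
    y = math.sqrt(max(d1 * d1 - x * x, 0.0))
    return x, y

def planar_lower_bound(a, b):
    """Distance between two projected points; a lower bound on the true
    distance when the underlying space has the four-point property."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Illustrative values only.
p1, p2 = (0.0, 0.0, 0.0), (4.0, 0.0, 0.0)
obj, query = (1.0, 2.0, 2.0), (3.0, -1.0, 0.5)
delta = euclidean(p1, p2)
obj_xy = project(euclidean(obj, p1), euclidean(obj, p2), delta)
query_xy = project(euclidean(query, p1), euclidean(query, p2), delta)
lower_bound = planar_lower_bound(obj_xy, query_xy)
```

Because each projected point keeps its distance to \(p_1\) and to \(p_2\), the planar bound is never weaker than the triangle-inequality bound taken with either pivot alone. Storing the pair \((x, y)\) per object costs the same two floats as storing the two pivot distances.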

We show that the resulting mechanism can be very effective. A consequence of using the four-point property is that, for \(m\) reference points, there are \(\binom{m}{2}\) pivot pairs to choose from, giving a very good chance of a good selection being available from a small number of distance calculations. Finding the best pair has a cost that is quadratic in the number of reference points; however, we provide experimental evidence that good heuristics exist. Finally, we show how the resulting mechanism can be integrated with a more scalable technique to provide a very significant performance improvement, for a very small overhead in build-time and memory cost.
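To make the quadratic cost concrete, the sketch below selects a pivot pair for one object by exhaustive search over all \(\binom{m}{2} = m(m-1)/2\) pairs. It is an illustration under assumptions, not the selection method of the paper: the function `best_pair`, the use of a sample of queries, and the score (how many sample queries the pair would exclude at a fixed threshold) are all hypothetical choices, and the reference points are assumed to be pairwise distinct. Note that only \(m\) distances are computed for the object itself; the quadratic part is the arithmetic over pairs.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance; a metric with the four-point property."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def project(d1, d2, delta):
    """2D projection relative to two pivots that are `delta` apart."""
    x = (d1 * d1 - d2 * d2 + delta * delta) / (2 * delta)
    return x, math.sqrt(max(d1 * d1 - x * x, 0.0))

def best_pair(obj, refs, sample_queries, threshold, dist=euclidean):
    """Exhaustively pick the pivot pair that excludes `obj` for the most
    sample queries. ASSUMPTION: this scoring rule is illustrative only."""
    d_obj = [dist(obj, r) for r in refs]  # m distance calculations
    d_qs = [[dist(q, r) for r in refs] for q in sample_queries]
    best, best_score = None, -1
    for i, j in combinations(range(len(refs)), 2):  # m(m-1)/2 pairs
        delta = dist(refs[i], refs[j])
        ox, oy = project(d_obj[i], d_obj[j], delta)
        score = 0
        for dq in d_qs:
            qx, qy = project(dq[i], dq[j], delta)
            # Count queries for which the planar bound excludes obj.
            if math.hypot(ox - qx, oy - qy) > threshold:
                score += 1
        if score > best_score:
            best, best_score = (i, j), score
    return best, best_score
```

For example, \(m = 20\) reference points give 190 candidate pairs from only 20 object-to-reference distances. A heuristic would replace the loop over all pairs with a cheaper rule that examines only a few candidates.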

Notes

  1. for the correct formulation, see [10].

  2. as this is marginally more efficient than storing the distances to \(p_i\) and \(p_j\).

  3. colors: 0.052, 0.083, 0.131; nasa: 0.12, 0.285, 0.53.

References

  1. Amato, G., Gennaro, C., Savino, P.: Mi-file: using inverted files for scalable approximate similarity search. Multimedia Tools Appl. 71(3), 1333–1362 (2014)

  2. Baeza-Yates, R., Cunto, W., Manber, U., Wu, S.: Proximity matching using fixed-queries trees. In: Crochemore, M., Gusfield, D. (eds.) CPM 1994. LNCS, vol. 807, pp. 198–212. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-58094-8_18

  3. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)

  4. Blumenthal, L.M.: A note on the four-point property. Bull. Am. Math. Soc. 39(6), 423–426 (1933)

  5. Burkhard, W.A., Keller, R.M.: Some approaches to best-match file searching. Commun. ACM 16(4), 230–236 (1973)

  6. Celik, C.: Priority vantage points structures for similarity queries in metric spaces. In: Shafazand, H., Tjoa, A.M. (eds.) EurAsia-ICT 2002. LNCS, vol. 2510, pp. 256–263. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-36087-5_30

  7. Chávez, E., Marroquín, J.L., Baeza-Yates, R.: Spaghettis: an array based algorithm for similarity queries in metric spaces. In: String Processing and Information Retrieval Symposium, 1999 and International Workshop on Groupware, pp. 38–46. IEEE (1999)

  8. Chávez, E., Navarro, G.: A compact space decomposition for effective metric indexing. Pattern Recogn. Lett. 26(9), 1363–1376 (2005)

  9. Chávez, E., Ruiz, U., Tellez, E.: CDA: succinct spaghetti. In: Amato, G., Connor, R., Falchi, F., Gennaro, C. (eds.) SISAP 2015. LNCS, vol. 9371, pp. 54–64. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25087-8_5

  10. Connor, R., Cardillo, F.A., Vadicamo, L., Rabitti, F.: Hilbert exclusion: improved metric search through finite isometric embeddings. ACM Trans. Inf. Syst. 35(3), 17:1–17:27 (2016)

  11. Connor, R., Vadicamo, L., Cardillo, F.A., Rabitti, F.: Supermetric search with the four-point property. In: Amsaleg, L., Houle, M.E., Schubert, E. (eds.) SISAP 2016. LNCS, vol. 9939, pp. 51–64. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46759-7_4

  12. Connor, R., Vadicamo, L., Cardillo, F.A., Rabitti, F.: Supermetric search. Inf. Syst. 80, 108–123 (2018)

  13. Connor, R., Vadicamo, L., Rabitti, F.: High-dimensional simplexes for supermetric search. In: Beecks, C., Borutta, F., Kröger, P., Seidl, T. (eds.) SISAP 2017. LNCS, vol. 10609, pp. 96–109. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-319-68474-1_7

  14. Figueroa, K., Chávez, E., Navarro, G., Paredes, R.: Speeding up spatial approximation search in metric spaces. J. Exp. Algorithmics (JEA) 14, 6 (2009)

  15. Figueroa, K., Navarro, G., Chávez, E.: Metric spaces library (2007). http://www.sisap.org

  16. Menger, K.: Untersuchungen über allgemeine Metrik. Math. Ann. 100, 75–163 (1928)

  17. Micó, M.L., Oncina, J., Vidal, E.: A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Pattern Recogn. Lett. 15(1), 9–17 (1994)

  18. Rubinstein, A.: Hardness of approximate nearest neighbor search. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1260–1268. ACM (2018)

  19. Ruiz, G., Santoyo, F., Chávez, E., Figueroa, K., Tellez, E.S.: Extreme pivots for faster metric indexes. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds.) SISAP 2013. LNCS, vol. 8199, pp. 115–126. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41062-8_12

  20. Skopal, T., Pokorný, J., Snášel, V.: Nearest neighbours search using the PM-tree. In: Zhou, L., Ooi, B.C., Meng, X. (eds.) DASFAA 2005. LNCS, vol. 3453, pp. 803–815. Springer, Heidelberg (2005). https://doi.org/10.1007/11408079_73

  21. Vidal, E.: New formulation and improvements of the nearest-neighbour approximating and eliminating search algorithm (AESA). Pattern Recogn. Lett. 15(1), 1–7 (1994)

  22. Wilson, W.A.: A relation between metric and Euclidean spaces. Am. J. Math. 54(3), 505–517 (1932)

Author information

Corresponding author

Correspondence to Richard Connor.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Chávez, E., Connor, R., Vadicamo, L. (2019). Query Filtering with Low-Dimensional Local Embeddings. In: Amato, G., Gennaro, C., Oria, V., Radovanović, M. (eds) Similarity Search and Applications. SISAP 2019. Lecture Notes in Computer Science, vol 11807. Springer, Cham. https://doi.org/10.1007/978-3-030-32047-8_21

  • DOI: https://doi.org/10.1007/978-3-030-32047-8_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32046-1

  • Online ISBN: 978-3-030-32047-8

  • eBook Packages: Computer Science, Computer Science (R0)
