Indexing for Similarity Search Operators

Part of the book series: SpringerBriefs in Computer Science

Abstract

This chapter departs from the semantics-oriented discussion in the rest of this book and considers the motivation for indexes and efficient algorithms for evaluating similarity queries. We start with the computationally simplest case, processing similarity queries over data objects in a Euclidean space, and illustrate how the k-d Tree enables efficient search. We then relax the constraints and move to general metric spaces, where the triangle inequality becomes the most important property; for this case we describe a commonly used index, the VP Tree. Finally, we consider the most general scenario, in which the distance measure can be arbitrary and the only property assumed is the monotonicity of the distance function. Here we consider two approaches, the family of Threshold algorithms and the AL Tree based indexing method, and show how they can be used for Top-k search in such situations.
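To make the role of monotonicity concrete, the following is a minimal sketch of a threshold-style top-k search in the spirit of the Threshold algorithms mentioned above. It is an illustration under stated assumptions, not the chapter's own formulation: the function name `threshold_topk`, the in-memory `scores` dictionary, and the use of similarity scores (higher is better) rather than distances are all hypothetical choices made for the example.

```python
import heapq

def threshold_topk(scores, k, agg=sum):
    """Return the k best (aggregate, object_id) pairs, best first.

    Assumptions (illustrative only): `scores` maps each object id to a
    tuple of per-attribute scores where higher is better, and `agg` is
    monotone, i.e. raising any single score never lowers the aggregate.
    """
    if not scores or k <= 0:
        return []
    m = len(next(iter(scores.values())))
    # One list of object ids per attribute, sorted by that attribute, best first.
    lists = [
        sorted(scores, key=lambda o, i=i: scores[o][i], reverse=True)
        for i in range(m)
    ]
    top = []      # min-heap holding the best k (aggregate, object_id) pairs so far
    seen = set()
    for depth in range(len(scores)):
        last = []  # score last read under sorted access on each list
        for i in range(m):
            obj = lists[i][depth]
            last.append(scores[obj][i])
            if obj not in seen:
                seen.add(obj)
                total = agg(scores[obj])  # "random access" to all attributes
                if len(top) < k:
                    heapq.heappush(top, (total, obj))
                elif total > top[0][0]:
                    heapq.heapreplace(top, (total, obj))
        # By monotonicity, no unseen object can score above this threshold.
        threshold = agg(last)
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True)
```

The early exit is safe only because `agg` is monotone: every unseen object has, on each list, a score no better than the last one read, so its aggregate cannot exceed the threshold. With a non-monotone aggregate this stopping rule would not be valid.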


Author information

Correspondence to Deepak P.

Copyright information

© 2015 The Author(s)

About this chapter

Cite this chapter

P, D., Deshpande, P.M. (2015). Indexing for Similarity Search Operators. In: Operators for Similarity Search. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-21257-9_6

  • DOI: https://doi.org/10.1007/978-3-319-21257-9_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21256-2

  • Online ISBN: 978-3-319-21257-9

  • eBook Packages: Computer Science (R0)
