Indexing for Similarity Search Operators

P, Deepak; Deshpande, Prasad M.

doi:10.1007/978-3-319-21257-9_6

Chapter
First Online: 01 January 2015

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

This chapter departs from the semantics-oriented discussion in the rest of this book and considers the motivation for indexes and efficient algorithms for evaluating similarity queries. We start with the computationally simplest case, processing similarity queries over data objects in a Euclidean space, where we illustrate how the k-d Tree enables efficient search. We then relax the constraints and move to general metric spaces, where the triangle inequality becomes the most important property; for this case we describe a commonly used index, the VP Tree. Finally, we consider the most general scenario, where the distance measure can be arbitrary and the only property assumed is the monotonicity of the distance function. Here we consider two approaches, the family of Threshold algorithms and the AL Tree based indexing method, and show how they can be used for Top-k search in such situations.
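
To make the role of monotonicity concrete, the following is a minimal sketch of a Threshold-style Top-k search, loosely following the classic TA formulation of Fagin, Lotem, and Naor rather than any code from the chapter itself. It assumes per-attribute similarity scores where higher is better (the chapter may phrase this in terms of distances), an in-memory dictionary standing in for the sorted and random accesses, and a monotone aggregate such as sum or min. The names threshold_top_k, scores, and aggregate are hypothetical and chosen only for illustration.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple


def threshold_top_k(
    scores: Dict[str, Sequence[float]],
    k: int,
    aggregate: Callable[[Sequence[float]], float] = sum,
) -> List[Tuple[str, float]]:
    """Return the k object ids with the highest aggregate score, best first.

    scores maps each object id to its per-attribute scores (higher is better).
    aggregate is assumed to be monotone: raising any single attribute score
    never lowers the aggregate. This assumption is what makes early stopping safe.
    """
    if not scores or k <= 0:
        return []
    num_attrs = len(next(iter(scores.values())))

    # One list of object ids per attribute, sorted by that attribute, best first.
    sorted_lists = [
        sorted(scores, key=lambda oid, a=a: scores[oid][a], reverse=True)
        for a in range(num_attrs)
    ]

    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap of (aggregate, id), size <= k

    for depth in range(len(scores)):
        last_seen = []
        for a in range(num_attrs):
            # Sorted access: next best object on attribute a.
            oid = sorted_lists[a][depth]
            last_seen.append(scores[oid][a])
            if oid not in seen:
                seen.add(oid)
                # Random access: fetch all attribute scores of this object.
                total = aggregate(scores[oid])
                if len(top) < k:
                    heapq.heappush(top, (total, oid))
                elif total > top[0][0]:
                    heapq.heapreplace(top, (total, oid))

        # No unseen object can score above the aggregate of the last seen scores.
        threshold = aggregate(last_seen)
        if len(top) == k and top[0][0] >= threshold:
            break

    return [(oid, total) for total, oid in sorted(top, reverse=True)]
```

The early exit relies only on monotonicity: every unseen object is no better than the last seen score on each attribute, so its aggregate cannot exceed the threshold. No metric property such as the triangle inequality is needed.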

Author information

Authors and Affiliations

IBM Research, Bangalore, India
Deepak P & Prasad M. Deshpande

Corresponding author

Correspondence to Deepak P.

About this chapter

Cite this chapter

P, D., Deshpande, P.M. (2015). Indexing for Similarity Search Operators. In: Operators for Similarity Search. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-21257-9_6

DOI: https://doi.org/10.1007/978-3-319-21257-9_6
Published: 08 July 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21256-2
Online ISBN: 978-3-319-21257-9
eBook Packages: Computer Science, Computer Science (R0)
