Abstract
This chapter departs from the semantics-oriented discussion in the rest of this book and considers the motivation for indexes and efficient algorithms for evaluating similarity queries. We start with the computationally simplest case, processing similarity queries over data objects in a Euclidean space, where we illustrate how the k-d Tree enables efficient search. We then relax these constraints and move to general metric spaces, where the triangle inequality becomes the most important property, and describe a commonly used index for this case, the VP Tree. Finally, we consider the most general scenario, where the distance measure can be arbitrary and the only property assumed is the monotonicity of the distance function. Here we consider two approaches, the family of Threshold algorithms and the AL-Tree-based indexing method, and show how they can be used for top-k search in such situations.
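To make the most general scenario concrete, the following is a minimal Python sketch of the classic Threshold Algorithm (TA) of Fagin, Lotem, and Naor. It is an illustration under stated assumptions, not the chapter's own implementation: each object is scored on several attributes, each attribute's scores are available both as a list sorted in descending order (higher means more similar; a distance can be negated to fit this convention) and by random access, every list covers the same objects, and the overall score is a monotone aggregate of the per-attribute scores (sum by default). The function name threshold_topk and its parameters are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical helper: a sketch of the Threshold Algorithm (TA), not the
# chapter's code. Assumes every list ranks the same objects by descending
# score and that `agg` is monotone (never decreases when any input grows).
def threshold_topk(
    lists: Sequence[List[Tuple[str, float]]],
    k: int,
    agg: Callable[[Sequence[float]], float] = sum,
) -> Tuple[List[Tuple[float, str]], int]:
    # Random access: per-attribute lookup tables from object id to score.
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]
    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap holding the best k so far
    rows_read = 0
    for depth in range(len(lists[0])):
        rows_read = depth + 1
        frontier = []  # last score seen under sorted access in each list
        for lst in lists:
            obj, score = lst[depth]
            frontier.append(score)
            if obj in seen:
                continue
            seen.add(obj)
            # Fetch obj's score from every list and aggregate it.
            total = agg([table[obj] for table in lookup])
            if len(top) < k:
                heapq.heappush(top, (total, obj))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, obj))
        # No unseen object can beat agg(frontier) because agg is monotone.
        threshold = agg(frontier)
        if len(top) == k and top[0][0] >= threshold:
            break
    # Best-first list of (aggregate score, object id) plus rows scanned.
    return sorted(top, reverse=True), rows_read
```

For example, with a = [("x", 0.9), ("y", 0.8), ("z", 0.3), ("w", 0.1)] and b = [("y", 0.9), ("x", 0.7), ("w", 0.4), ("z", 0.2)], threshold_topk([a, b], k=2) returns y and then x (aggregate scores of about 1.7 and 1.6) after reading only two of the four rows: at that depth the threshold has dropped to 0.8 + 0.7 = 1.5, below the second-best score found so far. Monotonicity is the only property this stopping rule relies on, which is why it applies even when the distance measure is otherwise arbitrary.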
Copyright information
© 2015 The Author(s)
About this chapter
Cite this chapter
P, D., Deshpande, P.M. (2015). Indexing for Similarity Search Operators. In: Operators for Similarity Search. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-21257-9_6
DOI: https://doi.org/10.1007/978-3-319-21257-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21256-2
Online ISBN: 978-3-319-21257-9
eBook Packages: Computer Science, Computer Science (R0)