Abstract
In this introductory chapter, we consider the operation of common similarity search systems from a semantic point of view rather than the efficiency-oriented view typical of the database literature. We illustrate that the full specification of a similarity search system involves the schema definition as well as details of two phases: pairwise similarity estimation and result set identification. We will see how variations in the specification of these two phases give rise to various similarity operators. In addition to reviewing the most common similarity operator, the top-k operator, we survey the landscape of similarity operators proposed over the last two decades. We then consider the notion of similarity from a cognitive/psychological perspective and outline some assumptions about similarity measures that form conventional wisdom in that literature. In particular, we focus on those aspects that have implications for building computer-based similarity search systems, and outline some disconnects between the psychology and computing literature in the assumptions made about similarity measures.
© 2015 The Author(s)
About this chapter
Cite this chapter
P, D., Deshpande, P.M. (2015). Introduction. In: Operators for Similarity Search. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-21257-9_1
DOI: https://doi.org/10.1007/978-3-319-21257-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21256-2
Online ISBN: 978-3-319-21257-9
eBook Packages: Computer Science (R0)