Abstract
Metric spaces are a very active research field that offers efficient methods for indexing and searching by similarity in large data sets. In this paper we present SSSTree, a new clustering-based method for similarity search. Its main characteristic is that the centers of each cluster are selected using Sparse Spatial Selection (SSS), a technique initially developed for the selection of pivots. SSS adapts the set of selected points (pivots or cluster centers) to the intrinsic dimensionality of the space. With SSS, the number of clusters in each node of the tree depends on the complexity of the subspace that the node represents. The space partition in each node therefore follows that complexity, which improves the performance of the search operation. We describe the method and provide experimental results showing that SSSTree performs better than previously proposed indexes.
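The abstract does not spell out the selection rule, so the following is only a minimal sketch of how SSS-style center selection might look. It assumes the commonly described rule in which an object becomes a new center when its distance to every center chosen so far is at least alpha times the maximum distance between objects. The names `sss_select` and `assign_to_clusters`, the default `alpha = 0.4`, and the nearest-center assignment are illustrative assumptions, not details taken from the paper.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def sss_select(
    objects: Sequence[T],
    dist: Callable[[T, T], float],
    max_dist: float,
    alpha: float = 0.4,  # assumed value; the abstract does not state one
) -> List[T]:
    """Pick centers that are at least alpha * max_dist apart from each other.

    Assumption: max_dist is the maximum distance between two objects in the
    (sub)space, supplied or estimated by the caller.
    """
    threshold = alpha * max_dist
    centers: List[T] = []
    for obj in objects:
        # The first object always qualifies, because `centers` is still empty.
        if all(dist(obj, c) >= threshold for c in centers):
            centers.append(obj)
    return centers


def assign_to_clusters(
    objects: Sequence[T],
    centers: Sequence[T],
    dist: Callable[[T, T], float],
) -> List[List[T]]:
    """Assign every object to its nearest center (hypothetical partition step)."""
    clusters: List[List[T]] = [[] for _ in centers]
    for obj in objects:
        nearest = min(range(len(centers)), key=lambda i: dist(obj, centers[i]))
        clusters[nearest].append(obj)
    return clusters


def abs_dist(a: float, b: float) -> float:
    """Toy metric on the real line, used only for the example below."""
    return abs(a - b)


points = [0.0, 1.0, 2.0, 6.0, 7.0, 11.0]
centers = sss_select(points, abs_dist, max_dist=11.0, alpha=0.4)
clusters = assign_to_clusters(points, centers, abs_dist)
```

In this sketch the number of centers is not fixed in advance: it falls out of the threshold and of how the objects are spread, which is the sense in which the number of clusters per node can depend on the subspace. Applying the same two steps recursively to each cluster would give a tree, although the actual SSSTree construction may differ in its details.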
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brisaboa, N., Pedreira, O., Seco, D., Solar, R., Uribe, R. (2008). Clustering-Based Similarity Search in Metric Spaces with Sparse Spatial Centers. In: Geffert, V., Karhumäki, J., Bertoni, A., Preneel, B., Návrat, P., Bieliková, M. (eds) SOFSEM 2008: Theory and Practice of Computer Science. SOFSEM 2008. Lecture Notes in Computer Science, vol 4910. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77566-9_16
DOI: https://doi.org/10.1007/978-3-540-77566-9_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77565-2
Online ISBN: 978-3-540-77566-9
eBook Packages: Computer Science, Computer Science (R0)