Abstract
Indexing and querying high-dimensional data is challenging because such data are inherently sparse, and fast algorithms are urgently needed in this field. In this paper, a clustering algorithm based on automatic subspace dimension selection (ASDS) is derived from the well-known projection-based clustering algorithm ORCLUS. A two-level architecture for high-dimensional data indexing and query is also proposed, which integrates projected clusters and principal axis trees (PAT) to generate efficient high-dimensional data indexes. The similarity-search query performance of ASDS+PAT, ORCLUS+PAT, PAT alone, and Clindex is compared on two high-dimensional data sets. The results show that the integration of ASDS and PAT is an efficient indexing architecture and considerably reduces query time.
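To illustrate the general idea of a two-level index (clusters at the top level, a per-cluster search structure below), here is a minimal sketch in plain Python. It is not the authors' method: plain k-means stands in for ASDS/ORCLUS projected clustering, a linear scan stands in for the principal axis tree, and no subspace projection is performed. All function names (`kmeans`, `build_index`, `nearest`) are hypothetical and chosen only for this illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=10, seed=0):
    """Plain k-means; an assumed stand-in for projected clustering (NOT ASDS/ORCLUS)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            closest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[closest].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [
            [sum(c) / len(g) for c in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups


def build_index(points, k):
    """Level 1: one (centroid, radius, members) entry per non-empty cluster."""
    centroids, groups = kmeans(points, k)
    index = []
    for c, g in zip(centroids, groups):
        if g:
            radius = max(dist(c, p) for p in g)
            index.append((c, radius, g))
    return index


def nearest(index, q):
    """Exact nearest-neighbour query over the two-level index."""
    best, best_d = None, math.inf
    # Visit clusters from the closest centroid to the farthest.
    for c, radius, members in sorted(index, key=lambda e: dist(q, e[0])):
        # Triangle inequality: no member is closer than dist(q, c) - radius.
        if dist(q, c) - radius >= best_d:
            continue
        # Level 2: linear scan, an assumed stand-in for a principal axis tree.
        for p in members:
            d = dist(q, p)
            if d < best_d:
                best, best_d = p, d
    return best, best_d


# Usage on synthetic data (illustrative only).
rng = random.Random(1)
pts = [[rng.random() for _ in range(8)] for _ in range(200)]
idx = build_index(pts, 5)
query = [rng.random() for _ in range(8)]
neighbour, neighbour_dist = nearest(idx, query)
```

The cluster-level bound lets the query skip whole clusters that cannot contain a closer point, which is the role the top level plays in such an architecture; a tree-based second level would further reduce the work inside each visited cluster.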
References
Aggarwal, C.C., Procopiuc, C., Wolf, J.L., Yu, P.S., Park, J.S.: Fast algorithms for projected clustering. In: Proc. of the ACM SIGMOD Conf., Philadelphia, PA, pp. 61–72 (1999)
Aggarwal, C.C., Yu, P.S.: Finding generalized projected clusters in high dimensional spaces. SIGMOD Record 29, 70–92 (2000)
Castelli, V., Thomasian, A., Li, C.-S.: CSVD: clustering and singular value decomposition for approximate similarity searches in high dimensional space. IEEE Trans. on Knowledge and Data Engineering 15, 671–685 (2003)
Grabmeier, J., Rudolph, A.: Techniques of Cluster Algorithms in Data Mining. Data Mining and Knowledge Discovery 6, 303–360 (2002)
Li, C., Chang, E., Garcia-Molina, H., Wang, J., Wiederhold, G.: Clindex: Clustering for similarity queries in high-dimensional spaces. IEEE Trans. on Knowledge and Data Engineering 14, 792–808 (2002)
Lu, G.: Techniques and data structures for efficient multimedia retrieval based on similarity. IEEE Trans. on Multimedia 4, 372–384 (2002)
Jain, A., Dubes, R.: Algorithms for Clustering Data. Prentice-Hall, Englewood Cliffs (1988)
McNames, J.: A fast nearest neighbor algorithm based on a principal axis search tree. IEEE Trans. on Pattern Analysis and Machine Intelligence 23, 964–976 (2001)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, B., Gan, J.Q. (2004). Integration of Projected Clusters and Principal Axis Trees for High-Dimensional Data Indexing and Query. In: Yang, Z.R., Yin, H., Everson, R.M. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2004. IDEAL 2004. Lecture Notes in Computer Science, vol 3177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28651-6_28
DOI: https://doi.org/10.1007/978-3-540-28651-6_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22881-3
Online ISBN: 978-3-540-28651-6
eBook Packages: Springer Book Archive