Encyclopedia of Database Systems

2018 Edition | Editors: Ling Liu, M. Tamer Özsu

Dimensionality Reduction Techniques for Nearest-Neighbor Computations

  • Alexander Thomasian
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80771

Synonyms

Clustering; Karhunen-Loève transform (KLT); Multi-dimensional indexing; Nearest neighbors query; Principal component analysis (PCA); Singular value decomposition (SVD)

Definition

Representing objects such as images by feature vectors, and searching for similar objects via k-nearest-neighbor (k-NN) queries that rank the points in high-dimensional space by their distance to a target image, is a popular paradigm. Applying singular value decomposition (SVD) to individual clusters of a dataset yields a greater reduction in dimensionality for the same normalized mean square error (NMSE) than applying SVD to the whole dataset. The cost of processing k-NN queries is further reduced by suitable indexing structures such as the ordered partition (OP)-tree and the stepwise dimensionality increasing (SDI)-tree.
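The claim about clustered versus global SVD can be illustrated with a minimal sketch. It is not the procedure of any cited paper: the synthetic data, the function name dims_for_nmse, and the 0.05 NMSE target are all hypothetical. NMSE is assumed here to be the squared reconstruction error divided by the total squared distance of the points from their centroid, which for an SVD projection equals the share of squared singular values that is discarded.

```python
import numpy as np

def dims_for_nmse(X, target_nmse):
    """Smallest number of SVD dimensions whose NMSE is at most target_nmse.

    Assumption: NMSE = discarded squared singular values / all squared
    singular values of the centroid-centered data.
    """
    centered = X - X.mean(axis=0)
    # Singular values are returned in descending order.
    energy = np.linalg.svd(centered, compute_uv=False) ** 2
    total = energy.sum()
    for n_dims in range(len(energy) + 1):
        if energy[n_dims:].sum() / total <= target_nmse:
            return n_dims
    return len(energy)

# Hypothetical data: two clusters in 4-D, each elongated along a different axis.
rng = np.random.default_rng(0)
cluster_a = rng.normal(size=(200, 4)) * np.array([5.0, 0.1, 0.1, 0.1])
cluster_b = (rng.normal(size=(200, 4)) * np.array([0.1, 5.0, 0.1, 0.1])
             + np.array([8.0, 8.0, 0.0, 0.0]))
whole = np.vstack([cluster_a, cluster_b])

target = 0.05  # illustrative NMSE budget
print("whole dataset:", dims_for_nmse(whole, target))
print("cluster A:    ", dims_for_nmse(cluster_a, target))
print("cluster B:    ", dims_for_nmse(cluster_b, target))
```

On data of this shape, each cluster should meet the NMSE budget with fewer retained dimensions than the whole dataset, because each cluster's variance is concentrated along a single direction while the combined data spreads over two. The cluster memberships are given by construction here; in practice they would come from a clustering step.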

Historical Background

IBM’s Query by Image Content (QBIC) project, which utilized content-based image retrieval...



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Thomasian and Associates, Pleasantville, USA