Dimensionality reduction is a technique for taking a high-dimensional data set (data objects with many features/attributes) and replacing it with a much lower-dimensional data set while still preserving similarities between data objects. Dimensionality reduction is useful for reducing the memory required to store a data set, as well as for speeding up algorithms that run on it.
In modern algorithm design and data analysis, we often face very high-dimensional data, and such data comes in many forms. For example, we can think of a 10-megapixel image as a point with ten million coordinates, one per pixel; in this way, an image becomes a ten-million-dimensional point. Another frequently encountered example arises when we wish to compare documents based on similarity. One way to do this is simply to count how many times each dictionary word occurs in a document and compare documents based on these counts. This yields a representation of a document as a point with one coordinate per dictionary word.
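A minimal sketch of one standard dimensionality-reduction approach, random Gaussian projection in the spirit of the Johnson-Lindenstrauss lemma. The dimensions, the scaling by 1/sqrt(k), and the random seed below are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# n data objects in d dimensions; k is the (much smaller) target dimension.
n, d, k = 100, 10_000, 400
X = rng.normal(size=(n, d))

# Random Gaussian projection matrix, scaled by 1/sqrt(k) so that squared
# Euclidean norms are preserved in expectation.
P = rng.normal(size=(d, k)) / np.sqrt(k)

# The reduced data set: the same n points, now in only k dimensions.
Y = X @ P

# Pairwise distances are approximately preserved under the projection.
orig = np.linalg.norm(X[0] - X[1])
red = np.linalg.norm(Y[0] - Y[1])
ratio = red / orig  # close to 1 with high probability
```

Each point shrinks from 10,000 coordinates to 400, while distances between points are distorted only slightly; this is the sense in which similarities between data objects survive the reduction.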