Definition
Dimensionality reduction is a technique for taking a high-dimensional data set (data objects with many features/attributes) and replacing it with a much lower-dimensional one while still preserving similarities between data objects. It is useful both for reducing the memory required to store a data set and for speeding up algorithms that process it.
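One way to make this concrete is a random projection in the spirit of the Johnson–Lindenstrauss lemma. The sketch below is illustrative only (the dimensions and seed are arbitrary choices, not from the entry): it projects points from 10,000 dimensions down to 400 and checks that a pairwise distance is approximately preserved.

```python
import numpy as np

rng = np.random.default_rng(0)

# n = 100 data points in d = 10,000 dimensions
d, k, n = 10_000, 400, 100
X = rng.normal(size=(n, d))

# Random Gaussian projection matrix; scaling by 1/sqrt(k)
# keeps expected squared distances unchanged
A = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ A  # each point now lives in k = 400 dimensions

# A pairwise distance before and after projection
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
ratio = proj / orig  # close to 1 when k is large enough
```

The projected points occupy a factor d/k = 25 less memory, yet distance computations on them approximate the originals, which is what lets downstream algorithms run faster.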
Overview
In modern algorithm design and data analysis, we often face very high-dimensional data, and it comes in many forms. For example, we can think of a 10-megapixel image as a point with ten million coordinates, one for each pixel; in this way, an image becomes a ten-million-dimensional point. Another frequently encountered example arises when we wish to compare documents based on similarity. One way to do this is to simply count how many times each dictionary word occurs in a document and compare documents based on these counts. This yields a representation of a document as a point with one...
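The word-counting representation just described can be sketched as follows, using a hypothetical five-word vocabulary (any real application would use a full dictionary):

```python
from collections import Counter

# Toy vocabulary; a real system would use every dictionary word
VOCABULARY = ["data", "reduction", "image", "pixel", "document"]

def bag_of_words(text, vocab=VOCABULARY):
    """Represent a document as a point with one coordinate per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

v1 = bag_of_words("data reduction speeds up data analysis")
v2 = bag_of_words("every pixel of the image is a coordinate")
# v1 == [2, 1, 0, 0, 0] and v2 == [0, 0, 1, 1, 0]
```

With a full dictionary each document becomes a point with hundreds of thousands of coordinates, which is exactly the setting where dimensionality reduction pays off.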
© 2018 Springer International Publishing AG, part of Springer Nature
Larsen, K.G. (2018). Dimension Reduction. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_60-1