Learning in High Dimensions

  • Bertrand Clarke
  • Ernest Fokoué
  • Hao Helen Zhang
Part of the Springer Series in Statistics book series (SSS)

A typical data set can be represented as a collection of n vectors x = (x_1, ..., x_p), each of length p. They are usually modeled as IID outcomes of a single random variable X = (X_1, ..., X_p). Classical data sets had small values of p and small to medium values of n, with p < n. Currently emerging data sets are much more complicated and diverse: the sample size may be so large that a mean cannot be calculated in real time; the dimension p may be so large that no realistic sample size will ever be obtained; X may summarize a waveform, a graph with many edges and vertices, an image, or a document. Often data sets are multitype, meaning they combine qualitatively different classes of data. In all these cases, and many others, the complexity of the data – to say nothing of the model – is so great that inference becomes effectively impossible.
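The remark that a mean may be infeasible to compute in real time for very large n motivates one-pass (streaming) estimators, which update a running value per observation instead of storing the whole sample. The sketch below is illustrative only and not taken from the chapter; the function name is hypothetical.

```python
def streaming_mean(xs):
    """One-pass mean of an iterable: O(1) memory, one update per observation."""
    mean, n = 0.0, 0
    for x in xs:
        n += 1
        # Incremental update: new mean = old mean + (x - old mean)/n,
        # algebraically equal to the batch mean, without storing the data.
        mean += (x - mean) / n
    return mean
```

Because each observation is consumed once, the same idea applies to data streams that never fit in memory; higher moments can be updated similarly (e.g. Welford's algorithm for the variance).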


Keywords: Partial Least Squares · Independent Component Analysis · Principal Curve · Canonical Correlation Analysis



Copyright information

© Springer-Verlag New York 2009

Authors and Affiliations

  • Bertrand Clarke (1)
  • Ernest Fokoué (2)
  • Hao Helen Zhang (3)

  1. University of Miami, Miami, Canada
  2. Department of Science & Mathematics, Kettering University, Flint, USA
  3. Department of Statistics, North Carolina State University Program in Statistical Genetics, Raleigh, USA
