Learning in High Dimensions
A typical data set can be represented as a collection of n vectors x = (x_1, …, x_p), each of length p. They are usually modeled as IID outcomes of a single random variable X = (X_1, …, X_p). Classical data sets had small values of p and small to medium values of n, with p < n. Currently emerging data sets are much more complicated and diverse: the sample size may be so large that a mean cannot be computed in real time; the dimension p may be so large that no realistic sample size will ever be obtained; the X may summarize a waveform, a graph with many edges and vertices, an image, or a document. Often data sets are multitype, meaning they combine qualitatively different classes of data. In all these cases, and many others, the complexity of the data, to say nothing of the model, is so great that inference becomes effectively impossible.
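The setup above can be made concrete with a minimal sketch (illustrative names and sizes, not from the source): n rows of a matrix play the role of the IID observations x = (x_1, …, x_p), and taking p > n shows one way classical inference breaks down, since the p × p sample covariance is then rank-deficient and cannot be inverted.

```python
import numpy as np

# Illustrative high-dimensional regime: p > n (sizes chosen for the sketch).
rng = np.random.default_rng(0)
n, p = 20, 500

# Each row is one IID observation x = (x_1, ..., x_p).
X = rng.standard_normal((n, p))

xbar = X.mean(axis=0)              # p-dimensional sample mean
S = np.cov(X, rowvar=False)        # p x p sample covariance

# With p > n the sample covariance has rank at most n - 1, so any
# classical procedure that inverts S (e.g. Mahalanobis distance,
# Gaussian discriminant analysis) is undefined without regularization.
rank = int(np.linalg.matrix_rank(S))
print(X.shape, xbar.shape, S.shape, rank)
```

The same obstruction motivates the dimension-reduction methods listed in the keywords, which replace the p original coordinates with a small number of derived directions before any inversion is attempted.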
Keywords: Partial Least Squares, Independent Component Analysis, Principal Curves, Canonical Correlation Analysis