Learning in High Dimensions

  • Bertrand Clarke
  • Ernest Fokoué
  • Hao Helen Zhang
Part of the Springer Series in Statistics book series (SSS)

A typical data set can be represented as a collection of n vectors x = (x_1, ..., x_p), each of length p. They are usually modeled as IID outcomes of a single random variable X = (X_1, ..., X_p). Classical data sets had small values of p and small to medium values of n, with p < n. Currently emerging data sets are much more complicated and diverse: the sample size may be so large that a mean cannot be calculated in real time; the dimension p may be so large that no realistic sample size will ever be obtained; X may summarize a waveform, a graph with many edges and vertices, an image, or a document. Often data sets are multitype, meaning they combine qualitatively different classes of data. In all these cases, and many others, the complexity of the data – to say nothing of the model – is so great that inference becomes effectively impossible.
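The remark that a mean may be infeasible to compute in real time for very large n motivates one-pass (streaming) estimators, which update a running value per observation instead of storing the whole sample. The sketch below is illustrative only and not taken from the chapter; the function name is hypothetical.

```python
def streaming_mean(xs):
    """One-pass mean of an iterable: O(1) memory, one update per observation."""
    mean, n = 0.0, 0
    for x in xs:
        n += 1
        # Incremental update: new mean = old mean + (x - old mean)/n,
        # algebraically equal to the batch mean, without storing the data.
        mean += (x - mean) / n
    return mean
```

Because each observation is consumed once, the same idea applies to data streams that never fit in memory; higher moments can be updated similarly (e.g. Welford's algorithm for the variance).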


Keywords: Partial Least Squares · Independent Component Analysis · Principal Curve · Canonical Correlation Analysis



Copyright information

© Springer-Verlag New York 2009

Authors and Affiliations

  • Bertrand Clarke (1)
  • Ernest Fokoué (2)
  • Hao Helen Zhang (3)

  1. University of Miami, Miami, Canada
  2. Department of Science & Mathematics, Kettering University, Flint, USA
  3. Department of Statistics, North Carolina State University Program in Statistical Genetics, Raleigh, USA
