High Dimensional Data

  • David Forsyth


We have a dataset that is a collection of d-dimensional vectors. This chapter introduces the nasty tricks that such data can play. A dataset like this is hard to plot, though Sect. 4.1 suggests some helpful tricks. Most readers will already know the mean as a summary (it’s an easy generalization of the 1D mean). The covariance matrix may be less familiar: it collects the covariances between all pairs of components. We use covariances, rather than correlations, because covariances can be represented in a matrix easily. High dimensional data has some awkward properties, usually lumped under the name “the curse of dimension”. The data isn’t where you think it is, and this can be a serious nuisance, making it difficult to fit complex probability models.
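
As a rough illustration of the summaries mentioned above, the following sketch computes a mean vector and a covariance matrix for a tiny made-up dataset, then estimates how much of a uniformly filled cube lies near its centre as the dimension grows. The function names, the toy data, the 1/N normalisation of the covariance, and the choice of a cube on [-1, 1]^d are all assumptions made for this example; they are not taken from the chapter.

```python
import random

def mean_vector(data):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(data)
    d = len(data[0])
    return [sum(x[j] for x in data) / n for j in range(d)]

def covariance_matrix(data):
    """d x d matrix whose (j, k) entry is the covariance of components j and k.

    Uses the 1/N normalisation (an assumption; some texts use 1/(N-1)).
    """
    n = len(data)
    d = len(data[0])
    m = mean_vector(data)
    return [
        [sum((x[j] - m[j]) * (x[k] - m[k]) for x in data) / n for k in range(d)]
        for j in range(d)
    ]

def fraction_near_centre(d, n_samples=2000, half_width=0.5, seed=0):
    """Fraction of points drawn uniformly from the cube [-1, 1]^d that land
    inside the smaller cube [-half_width, half_width]^d."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        point = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        if all(abs(c) <= half_width for c in point):
            inside += 1
    return inside / n_samples

# Hypothetical toy dataset: three 2-dimensional vectors.
data = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
print("mean:", mean_vector(data))
print("covariance:", covariance_matrix(data))

# The inner cube holds a fraction half_width**d of the volume, so the
# estimate should shrink quickly as d grows.
for d in (1, 2, 10):
    print("d =", d, "fraction near centre:", fraction_near_centre(d))
```

The covariance matrix is symmetric because the covariance of components j and k equals that of k and j. The last loop gives one concrete reading of the remark that the data isn’t where you think it is: with the default half-width of 0.5, the central cube contains about half of the points in one dimension but only a tiny fraction in ten.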

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • David Forsyth (1)
  1. Computer Science Department, University of Illinois Urbana Champaign, Urbana, USA
