Visualizing Univariate Distributions
Visualizing the distribution of a single continuous variable is a common graphical task for which several specialized methods have evolved. The distribution of a random variable X is defined by the corresponding cumulative distribution function (CDF) F(x) = P(X ≤ x). For continuous random variables, or more precisely, random variables with an absolutely continuous CDF, an equivalent representation is the density f(x) = F'(x). One is often also interested in the inverse of F, the quantile function. R provides these functions for many standard distributions; for example, pnorm(), dnorm(), and qnorm() give the distribution, density, and quantile functions, respectively, for the normal distribution. Most of the visualization methods discussed in this chapter involve estimating these functions from data. In particular, density plots and histograms display estimates of the density f, and quantile plots and box-and-whisker plots are based on (partial) estimates of F or its inverse.
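As a minimal sketch of the relationships just described, the following R code evaluates the three theoretical functions for the standard normal distribution and then computes data-based counterparts from a simulated sample. The evaluation points, sample size, seed, and step size h are arbitrary choices made for illustration, and ecdf(), density(), and quantile() from base R stand in for the estimates; they are not necessarily the functions used by the plotting methods in this chapter.

```r
# Theoretical functions for the standard normal distribution
x <- c(-1, 0, 1.5)   # illustrative evaluation points (assumption)
p <- pnorm(x)        # CDF: F(x) = P(X <= x)
d <- dnorm(x)        # density: f(x) = F'(x)
q <- qnorm(p)        # quantile function inverts F, so q recovers x

# Central finite difference approximating F'(x), to compare with dnorm()
h <- 1e-5
d_numeric <- (pnorm(x + h) - pnorm(x - h)) / (2 * h)

# Data-based estimates from a simulated sample (size and seed are arbitrary)
set.seed(1)
sample_x <- rnorm(500)
F_hat <- ecdf(sample_x)      # empirical CDF, returned as a function
f_hat <- density(sample_x)   # kernel density estimate evaluated on a grid
q_hat <- quantile(sample_x, probs = c(0.25, 0.5, 0.75))   # sample quartiles
```

Here q should agree with x and d_numeric with d up to numerical error, which mirrors the statements that the quantile function is the inverse of F and that f is its derivative. The objects F_hat, f_hat, and q_hat are estimates of the same three functions computed from data.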
Although the mathematical relationships between the theoretical constructs are well defined, there are no natural relationships between their standard estimates. Furthermore, the task of visualization comes with its own special rules; two plots with exactly the same information can put visual emphasis on entirely different aspects of that information. Thus, the appropriateness of a particular visualization depends to a large extent on the purpose of the analysis. We discuss the merits of different visualizations as we encounter them, but it is helpful to keep this background in mind when reading about them.
Keywords: Kernel Density Estimate, Density Plot, Quantile Function, Univariate Distribution, Panel Function