Tools for Identification of Structure in Data
In recent years, with our increased ability to collect and store data, have come enormous datasets. These datasets may consist of billions of observations and millions of variables. Some of the classical methods of statistical inference, in which a parametric model is studied, are neither feasible nor relevant for analysis of these datasets. The objective is to identify interesting structures in the data, such as clusters of observations, or relationships among the variables. Sometimes, the structures allow a reduction in the dimensionality of the data. Many of the classical methods of multivariate analysis, such as principal components analysis, factor analysis, canonical correlations analysis, and multidimensional scaling, are useful in identifying interesting structures. These methods generally attempt to combine variables in such a way as to preserve information yet reduce the dimension of the dataset. Dimension reduction generally carries a loss of some information. Whether the lost information is important is the major concern in dimension reduction. Another set of methods for reducing the complexity of a dataset attempts to group observations together, combining observations, as it were.
KeywordsDistance Measure Data Matrix Canonical Correlation Analysis Orthogonal Transformation Dissimilarity Measure
Unable to display preview. Download preview PDF.