Data Mining in Agriculture pp 23-45

# Statistical Based Approaches

## Abstract

Principal component analysis (PCA) is a method used to reduce the dimension of a given set of data while retaining the variability present in the set. Each set of data contains samples represented as vectors of single variables (which usually take real, integer or binary values). For instance, a geometric point in three-dimensional space can be represented by a vector of three variables, each associated with one of the three coordinate axes *x*, *y* and *z*. In general, a sample can be represented by a vector formed by a certain number of variables. This number of variables defines the length of the vectors contained in the set, and hence the dimension of the set. Moreover, a range of variability can be defined for each variable, which determines the interval of values that the variable can take. For instance, if the set of data contains three-dimensional points confined to a cube of side 1 centered at (0, 0, 0), then the three variables representing the Cartesian coordinates are bounded to the interval \(\left[-\frac{1}{2}, \frac{1}{2}\right]\). This interval defines the range of variability of the three variables. The aim of PCA is to find hidden patterns in the data and to transform the original data in a way that emphasizes their similarities and differences. Once the patterns are found, the data can be represented as components ordered by their relevance, and components of low relevance can then be discarded without loss of important information.
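The idea can be illustrated with a minimal sketch, which is not taken from the chapter itself. It assumes NumPy and computes the components from the eigendecomposition of the covariance matrix, one common way of carrying out PCA. The function name `pca`, the synthetic points and the noise level are all hypothetical choices made for illustration: the points lie in the cube of side 1 centered at the origin, and the *z* coordinate is made almost redundant so that one component carries very little variability.

```python
import numpy as np


def pca(samples, n_components):
    """Project samples (one per row) onto their leading principal components.

    Returns the projected data and the fraction of the total variance
    carried by every component, ordered from most to least relevant.
    """
    centered = samples - samples.mean(axis=0)
    cov = np.cov(centered, rowvar=False)       # variables are the columns
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    projected = centered @ eigvecs[:, :n_components]
    return projected, explained


# Hypothetical data: 200 points inside the cube [-1/2, 1/2]^3.
rng = np.random.default_rng(0)
xy = rng.uniform(-0.5, 0.5, size=(200, 2))
# Assumption for illustration: z is nearly determined by x and y.
z = 0.5 * (xy[:, 0] + xy[:, 1]) + rng.normal(0.0, 0.01, size=200)
z = np.clip(z, -0.5, 0.5)                      # stay inside the cube
points = np.column_stack([xy, z])

# Keep the two most relevant components and discard the third.
reduced, explained = pca(points, n_components=2)
print("fraction of variance per component:", explained)
```

Under these assumptions the last entry of `explained` is very small, so dropping the third component reduces the dimension from three to two while keeping almost all of the variability.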

## Keywords

Principal Component Analysis; Covariance Matrix; Regression Function; Meat Quality; Principal Component Analysis Method