Geometry of Data Sets
An entity-to-variable data table can be represented geometrically in three different settings: row-points, which pertains to conventional clustering; column-vectors, which pertains to conceptual clustering; and matrix space, which pertains to approximation clustering.
Two principles for standardizing conditional data tables are suggested in relation to the data scatter.
Standardization of aggregable data is suggested on the basis of the flow index concept, which is introduced here.
Graph-theoretic concepts related to clustering are considered.
Low-rank approximation of data, including the popular principal component and correspondence analysis techniques, is discussed and extended into a general sequential fitting procedure, SEFIT, which will be employed for approximation clustering.
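The idea of fitting a low-rank model one term at a time can be illustrated with a minimal sketch. It is a generic greedy rank-one fitting loop using alternating least squares, written under the assumption that each step fits a single product term to the current residual; it is not the SEFIT procedure as defined in the chapter. The function name `sequential_rank_one_fit` and its parameters are hypothetical.

```python
import numpy as np

def sequential_rank_one_fit(Y, n_terms, n_iter=200, tol=1e-10):
    """Greedily fit rank-one terms z c^T to the residual of Y, one at a time.

    Illustrative sketch only (hypothetical name and parameters).
    Returns the list of (z, c) pairs and the final residual matrix.
    """
    residual = np.asarray(Y, dtype=float).copy()
    terms = []
    for _ in range(n_terms):
        # Initialize z with the residual column of largest squared norm.
        j = np.argmax((residual ** 2).sum(axis=0))
        z = residual[:, j].copy()
        if not np.any(z):
            break  # residual is exactly zero; nothing left to fit
        for _ in range(n_iter):
            # Least-squares update of c given z, then of z given c.
            c = residual.T @ z / (z @ z)
            z_new = residual @ c / (c @ c)
            converged = np.linalg.norm(z_new - z) <= tol * np.linalg.norm(z_new)
            z = z_new
            if converged:
                break
        # Final least-squares c for the current z, so that the new residual
        # is orthogonal to z and the data scatter splits additively.
        c = residual.T @ z / (z @ z)
        terms.append((z, c))
        residual = residual - np.outer(z, c)
    return terms, residual

rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 4))
terms, residual = sequential_rank_one_fit(Y, n_terms=2)
explained = sum((z @ z) * (c @ c) for z, c in terms)
print(np.isclose((Y ** 2).sum(), explained + (residual ** 2).sum()))
```

Because each `c` is the least-squares solution for its `z`, the sum of squared entries of `Y` splits into the contributions of the fitted terms plus the squared residual, which is the kind of scatter decomposition such sequential schemes rely on.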
Keywords: Span Tree, Singular Value Decomposition, Correspondence Analysis, Maximum Clique, Boolean Variable