Data Mining Technologies Inspired from Visual Principle
In this talk we review the recent work done by our group on data mining (DM) technologies deduced from simulating visual principle. Through viewing a DM problem as a cognition problems and treading a data set as an image with each light point located at a datum position, we developed a series of high efficient algorithms for clustering, classification and regression via mimicking visual principles. In pattern recognition, human eyes seem to possess a singular aptitude to group objects and find important structure in an efficient way. Thus, a DM algorithm simulating visual system may solve some basic problems in DM research. From this point of view, we proposed a new approach for data clustering by modeling the blurring effect of lateral retinal interconnections based on scale space theory. In this approach, as the data image blurs, smaller light blobs merge into large ones until the whole image becomes one light blob at a low enough level of resolution. By identifying each blob with a cluster, the blurring process then generates a family of clustering along the hierarchy. The proposed approach provides unique solutions to many long standing problems, such as the cluster validity and the sensitivity to initialization problems, in clustering. We extended such an approach to classification and regression problems, through combatively employing the Weber’s law in physiology and the cell response classification facts. The resultant classification and regression algorithms are proven to be very efficient and solve the problems of model selection and applicability to huge size of data set in DM technologies. We finally applied the similar idea to the difficult parameter setting problem in support vector machine (SVM). Viewing the parameter setting problem as a recognition problem of choosing a visual scale at which the global and local structures of a data set can be preserved, and the difference between the two structures be maximized in the feature space, we derived a direct parameter setting formula for the Gaussian SVM. The simulations and applications show that the suggested formula significantly outperforms the known model selection methods in terms of efficiency and precision.
The advantages of the proposed approaches are: 1) The derived algorithms are computational stable and insensitive to initialization and they are totally free from solving difficult global optimization problems. 2) They facilitate the construction of new checks on DM validity and provide the final DM result a significant degree of robustness to noise in data and change in scale. 3) They are free from model selection in application. 4) The DM results are highly consistent with those perceived by our human eyes. 5) They provide unified frameworks for scale-related DM algorithms recently derived from many other fields such as estimation theory, recurrent signal processing, information theory and statistical mechanics, and artificial neural networks.