Abstract
This paper presents a universal method of dimension and sample size reduction, designed for exploratory data analysis procedures. The dimension is reduced by a linear transformation, subject to the requirement that it has the least possible influence on the relative locations of sample elements. For this purpose an original version of the heuristic Parallel Fast Simulated Annealing method is used. In addition, elements whose locations change significantly as a result of the transformation may be eliminated, or assigned smaller weights in further analysis. Besides reducing the sample size, this also improves the quality of the applied knowledge-extraction methodology. Experimental research confirmed the usefulness of the worked-out procedure in a broad range of exploratory data analysis problems, such as clustering, classification, and identification of outliers.
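As an illustration of the idea, the search for a distance-preserving linear map can be sketched as a simulated annealing run over transformation matrices. The sketch below is a minimal Python illustration, not the authors' parallel implementation: it assumes a raw-stress criterion (sum of squared differences of pairwise distances) and uses Cauchy-distributed moves with a T(k) = T0/k cooling schedule, as in Fast Simulated Annealing; all function names are illustrative.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix, via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.sqrt(d2)

def stress(D_orig, Y):
    """Raw stress: squared distance distortion between original and reduced space."""
    D_red = pairwise_distances(Y)
    iu = np.triu_indices_from(D_orig, k=1)  # each pair counted once
    return np.sum((D_orig[iu] - D_red[iu]) ** 2)

def anneal_projection(X, m, iters=2000, t0=1.0, seed=0):
    """Search for a linear map A (n x m) minimizing distance distortion,
    using Cauchy neighbour generation and a fast-annealing schedule."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    D = pairwise_distances(X)
    A = rng.standard_normal((n, m))          # random initial transformation
    e = stress(D, X @ A)
    best_A, best_e = A.copy(), e
    for k in range(1, iters + 1):
        t = t0 / k                           # cooling schedule T(k) = T0 / k
        cand = A + t * rng.standard_cauchy((n, m))  # heavy-tailed move
        e_new = stress(D, X @ cand)
        # Metropolis acceptance: always downhill, uphill with prob exp(-dE/T)
        if e_new < e or rng.random() < np.exp(-(e_new - e) / max(t, 1e-12)):
            A, e = cand, e_new
            if e < best_e:
                best_A, best_e = A.copy(), e
    return best_A, best_e
```

After the run, elements whose pairwise distances were distorted the most by `X @ best_A` are natural candidates for elimination or down-weighting, in the spirit of the sample size reduction described above.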
Notes
1. Particular coordinates of a random variable naturally constitute one-dimensional random variables; when the probabilistic aspects are not the subject of research, these variables are referred to in data analysis as "features" or "attributes".
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this chapter
Kulczycki, P., Łukasik, S. (2014). Reduction of Dimension and Size of Data Set by Parallel Fast Simulated Annealing. In: Kóczy, L., Pozna, C., Kacprzyk, J. (eds) Issues and Challenges of Intelligent Systems and Computational Intelligence. Studies in Computational Intelligence, vol 530. Springer, Cham. https://doi.org/10.1007/978-3-319-03206-1_19
Print ISBN: 978-3-319-03205-4
Online ISBN: 978-3-319-03206-1