Abstract
In this paper we propose a generalization of the Principal Curves and Surfaces method, proposed by Hastie in [6], to symbolic interval-valued variables. Given a data set X with n observations and m continuous variables, the main idea of the Principal Curves and Surfaces method is to generalize the principal component line, providing a smooth one-dimensional curved approximation to a set of data points in \(\mathbb {R}^m\). A principal surface is more general, providing a curved manifold approximation of dimension 2 or more. In our case we are interested in finding the main principal curve that best approximates symbolic interval-valued data. In [3, 4], the authors proposed the Centers Method and the Vertices Method to extend the well-known principal component analysis method to a particular kind of symbolic objects characterized by multi-valued variables of interval type. In this paper we generalize both the Centers Method and the Vertices Method by finding a smooth curve that passes through the middle of the data X in an orthogonal sense. The proposed method is compared with the Centers and Vertices Methods using the RSDA package and the Ichino data set, see [1, 10]. The correlation index is used to make these comparisons.
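For orientation, the linear baseline that the abstract refers to as the Centers Method can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the authors' code and not the RSDA implementation: it runs an ordinary PCA on the interval midpoints and then projects each interval hyperrectangle onto the resulting axes. The function name `centers_pca` and the toy intervals are hypothetical, and the sketch assumes that no variable has constant midpoints (nonzero standard deviation). The principal-curve generalization proposed in the paper, which replaces the linear axes with a smooth curve, is not shown.

```python
import numpy as np

def centers_pca(lower, upper, n_components=2):
    """Illustrative Centers-Method-style PCA for interval data.

    lower, upper: arrays of shape (n, m) holding the interval bounds.
    Returns (pc_lower, pc_upper), each of shape (n, n_components).
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    centers = (lower + upper) / 2.0

    # Standardize with the mean and standard deviation of the midpoints
    # (assumption: every column of midpoints has nonzero spread).
    mean = centers.mean(axis=0)
    std = centers.std(axis=0)
    z_lower = (lower - mean) / std
    z_upper = (upper - mean) / std
    z_centers = (centers - mean) / std

    # Principal axes: leading eigenvectors of the midpoint correlation matrix.
    corr = z_centers.T @ z_centers / len(z_centers)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    axes = eigvecs[:, order]

    # Project each hyperrectangle onto the axes. A positive loading attains
    # its minimum at the lower bound, a negative loading at the upper bound.
    pos = np.clip(axes, 0.0, None)
    neg = np.clip(axes, None, 0.0)
    pc_lower = z_lower @ pos + z_upper @ neg
    pc_upper = z_upper @ pos + z_lower @ neg
    return pc_lower, pc_upper

# Hypothetical toy data: 4 observations, 2 interval-valued variables.
lower = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 5.0], [6.0, 4.0]])
upper = np.array([[2.0, 4.0], [3.0, 3.0], [6.0, 6.0], [7.0, 7.0]])
pc_lower, pc_upper = centers_pca(lower, upper)
```

Each row of `pc_lower` and `pc_upper` then bounds the interval-valued principal component scores of one observation.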
References
Billard, L., Diday, E.: Symbolic Data Analysis: Conceptual Statistics and Data Mining. Wiley, Hoboken (2006)
Bickel, P.J., Doksum, K.A.: Mathematical Statistics. Prentice Hall, Upper Saddle River (1977)
Cazes, P., Chouakria, A., Diday, E., Schektman, Y.: Extension de l’analyse en composantes principales à des données de type intervalle. Rev. Statistique Appliquée XLV(3), 5–24 (1997)
Douzal-Chouakria, A., Billard, L., Diday, E.: Principal component analysis for interval-valued observations. Stat. Anal. Data Min. 4(2), 229–246 (2011)
Ichino, M.: General metrics for mixed features - the Cartesian space theory for pattern recognition. In: Conference on Systems, Man, and Cybernetics, pp. 494–497. Pergamon, Oxford (1988)
Hastie, T.: Principal curves and surfaces. Ph.D. thesis, Stanford University (1984)
Hastie, T., Weingessel, A.: Princurve - fits a principal curve in arbitrary dimension (2014). R package version 1.1-12 http://cran.r-project.org/web/packages/princurve/index.html
Hastie, T., Stuetzle, W.: Principal curves. J. Am. Stat. Assoc. 84(406), 502–516 (1989)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2008)
Rodríguez, O. with contributions from Olger Calderon, Roberto Zuñiga and Jorge Arce. RSDA - R to Symbolic Data Analysis (2015). R package version 1.3 http://CRAN.R-project.org/package=RSDA
Rodríguez, O.: Classification et Modèles Linéaires en Analyse des Données Symboliques. Ph.D. thesis, Paris IX-Dauphine University (2000)
Diday, E.: Introduction à l’approche symbolique en analyse des données. Premières Journées Symbolique-Numérique, CEREMADE, Université Paris, pp. 21–56 (1987)
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Arce G., J., Rodríguez R., O. (2016). Principal Curves and Surfaces to Interval Valued Variables. In: Montes y Gómez, M., Escalante, H., Segura, A., Murillo, J. (eds) Advances in Artificial Intelligence - IBERAMIA 2016. IBERAMIA 2016. Lecture Notes in Computer Science, vol 10022. Springer, Cham. https://doi.org/10.1007/978-3-319-47955-2_25
DOI: https://doi.org/10.1007/978-3-319-47955-2_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47954-5
Online ISBN: 978-3-319-47955-2
eBook Packages: Computer Science, Computer Science (R0)