Abstract
This paper proposes a new distribution-free technique, based on the work of Atilgan and Bozdogan (1990, 1992), for exploring the modality of a given multivariate data set using an unsupervised mixture of cubic B-splines for density estimation in one-dimensional space. In the multivariate case, this is achieved by utilizing the Mahalanobis (1936) distance of each point from the multivariate mean (centroid), Mahalanobis distance data depth (MDD), jackknife Mahalanobis distance data depth (JMDD), principal components (PC) transformation, and common PC transformation. The analysis is carried out under the assumption that the classification or grouping of the data is not known a priori. These dimension reduction techniques result in a better description of the shape of the underlying distribution, which has many attractive properties. The EM algorithm is developed for the mixture of cubic B-splines in an interactive symbolic computational environment to obtain the maximum likelihood estimators of the parameters and to evaluate model selection criteria such as AIC (Akaike, 1973), CAIC (Bozdogan, 1987), and ICOMP (Bozdogan, 1988, 1990, 1993, 1994). These criteria are used to detect the modality of the data objectively and to determine the most parsimonious fit in density estimation. A real numerical example, using a multivariate data set with a known number of mixture clusters and configurations, illustrates the versatility and efficiency of the proposed approach in detecting, in univariate space and with a high degree of accuracy, the multimodality and the density concentration points (clusters) of multivariate data.
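The reduction to one dimension described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes NumPy, the function names `mahalanobis_distances`, `aic`, and `caic` are hypothetical, and it covers only the Mahalanobis distance from the centroid together with the standard forms of AIC and CAIC. The B-spline mixture, the EM steps, MDD, JMDD, the PC transformations, and ICOMP are not reproduced here.

```python
import numpy as np


def mahalanobis_distances(X):
    """Mahalanobis distance of each row of X from the sample centroid.

    Assumes X has shape (n, p) with p >= 2 and a nonsingular sample
    covariance matrix. Returns a 1-D array of n distances.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)          # subtract the centroid
    cov = np.cov(X, rowvar=False)          # p x p sample covariance (ddof=1)
    # Solve cov @ z = centered.T instead of forming the explicit inverse.
    z = np.linalg.solve(cov, centered.T)   # shape (p, n)
    d2 = np.einsum("ij,ji->i", centered, z)  # squared distance per point
    return np.sqrt(d2)


def aic(log_lik, k):
    """AIC = -2 log L + 2k, for k free parameters."""
    return -2.0 * log_lik + 2.0 * k


def caic(log_lik, k, n):
    """CAIC = -2 log L + k (log n + 1), for n observations."""
    return -2.0 * log_lik + k * (np.log(n) + 1.0)


# Illustrative synthetic data (an assumption, not the chapter's data set):
# two well-separated groups in three dimensions.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=(60, 3))
group_b = rng.normal(loc=6.0, scale=1.0, size=(40, 3))
X = np.vstack([group_a, group_b])

d = mahalanobis_distances(X)  # one univariate value per observation
print(d.shape)
```

The resulting vector `d` is the kind of univariate summary on which a one-dimensional density estimate could then be fitted, with candidate fits compared by the smallest value of a criterion such as `aic` or `caic`.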
References
AKAIKE, H. (1973): Information theory and an extension of the maximum likelihood principle, In: Second International Symposium on Information Theory, Petrov, B.N. and Csaki, F. (eds.), 267–281, Akademiai Kiado, Budapest.
ANDERSON, E. (1935): The irises of the Gaspe Peninsula, Bulletin of the American Iris Society, 59, 2–5.
ATILGAN, T. and BOZDOGAN, H. (1990): Selecting the number of knots in fitting cardinal B-splines for density estimation using AIC, Journal of Japan Statistical Society, 179–190.
ATILGAN, T. and BOZDOGAN, H. (1992): Convergence properties of MLE’s and asymptotic simultaneous confidence intervals in fitting cardinal B-splines for density estimation, Statistics and Probability Letters, 13, 89–98.
BIRKHOFF, G. and de BOOR, C. (1965): Piecewise polynomial interpolation and approximation, In: Approximation of Functions, Garabedian, H.L. (ed.), 164–190, Elsevier Publishing Company, Amsterdam.
BOCK, H.H. (1987): On the interface between cluster analysis, principal component analysis, and multidimensional scaling, In: Multivariate Statistical Modeling and Data Analysis, Bozdogan, H. and Gupta, A.K. (eds.), 17–34, D. Reidel Publishing Company, Dordrecht.
BOZDOGAN, H. (1987): Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions, Psychometrika, 52, No. 3, 345–370. Special Section (invited paper).
BOZDOGAN, H. (1988): ICOMP: A new model selection criterion, In: Classification and related methods of data analysis, Bock, H.H. (ed.), 599–608, North-Holland, Amsterdam.
BOZDOGAN, H. (1990): On the information-based measure of covariance complexity and its application on the evaluation of multivariate linear models, Communications in Statistics, Theory and Methods, 19 (1), 221–278.
BOZDOGAN, H. (1993): Choosing the number of component clusters in the mixture model using a new informational complexity criterion of the inverse Fisher information matrix, In: Studies in Classification, Data Analysis, and Knowledge Organization, Opitz, O. et al. (eds.), 40–54, Springer-Verlag, Heidelberg.
BOZDOGAN, H. (1994): Mixture-model cluster analysis using a new informational complexity and model selection criteria, In: Multivariate Statistical Modeling, Vol. 2, Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, Bozdogan, H. (ed.), 69–113, Kluwer Academic Publishers, Dordrecht.
BOZDOGAN, H. and HAUGHTON, D. (1996): Informational complexity criteria for regression models, Submitted to Communications in Statistics, Theory and Methods.
CURRY, H.B. and SCHOENBERG, I.J. (1966): On Pólya frequency functions IV: The fundamental spline functions and their limits, J. Analyse Math., 17, 71–107.
DEMPSTER, A.P., et al. (1977): Maximum likelihood from incomplete data via the EM algorithm, Journal of Royal Statistical Society, Series B, 39, 1–38.
FISHER, R.A. (1936): The use of multiple measurements in taxonomic problems, Annals of Eugenics, 7, 179–188.
FLURY, B. (1984): Common principal components in k groups, J. of the American Statistical Association, 79, 892–898.
FLURY, B. (1988): Common principal components and related multivariate models, John Wiley and Sons, New York.
LIU, R.Y. (1995): Control charts for multivariate processes, J. of the American Statistical Association, 90, 1380–1387.
MAHALANOBIS, P.C. (1936): On the generalized distance in statistics, Proc. Nat. Inst. Sci. India, 12, 49–55.
POSKITT, D.S. (1987): Precision, complexity and Bayesian model determination, Journal of Royal Statistical Society, Series B, 49, 199–208.
SCHUMAKER, L.L. (1969): Approximation by splines, In: Theory and Applications of Spline Functions, Greville, T.N.E. (ed.), 65–85, Academic Press, New York.
SCLOVE, S.L. (1987): Metric considerations in clustering: implications for algorithms, In: Multivariate Statistical Modeling and Data Analysis, Bozdogan, H. and Gupta, A.K. (eds.), 163–186, D. Reidel Publishing Company, Dordrecht.
SCOTT, D.W. (1992): Multivariate density estimation, John Wiley and Sons, New York.
WU, C.F. (1983): On the convergence of the EM algorithm, Annals of Statistics, 11, 95–103.
Copyright information
© 2000 Springer-Verlag Berlin · Heidelberg
About this chapter
Cite this chapter
Bozdogan, H. (2000). Exploring Multivariate Modality by Unsupervised Mixture of Cubic B-Splines in 1-D Using Model Selection Criteria. In: Gaul, W., Opitz, O., Schader, M. (eds) Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-58250-9_9
DOI: https://doi.org/10.1007/978-3-642-58250-9_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67731-4
Online ISBN: 978-3-642-58250-9
eBook Packages: Springer Book Archive