A New Framework of Smoothed Location Model with Multiple Correspondence Analysis

  • Conference paper
Proceedings of the International Conference on Computing, Mathematics and Statistics (iCMS 2015)

Abstract

Including a large number of binary variables in the smoothed location model creates a very large number of multinomial cells and, more worryingly, leaves most of them empty. We refer to this situation as the large sparsity problem. When the multinomial cells are highly sparse, the smoothed estimators of the location model become heavily biased, which degrades classification performance. At worst, the classification rules cannot be constructed at all. This paper investigates the issue and proposes a new approach to the smoothed location model for data affected by the large sparsity problem.
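To make the scale of the problem concrete, the following minimal sketch counts the multinomial cells generated by b binary variables (2^b cells) and gives a lower bound on how many must be empty for a given sample size, since each observation falls into exactly one cell. The function names and the sample size of 100 are illustrative assumptions, not taken from the paper.

```python
def cell_count(num_binary: int) -> int:
    """Number of multinomial cells formed by `num_binary` binary variables."""
    return 2 ** num_binary


def min_empty_cells(num_binary: int, num_samples: int) -> int:
    """Lower bound on empty cells: each observation occupies at most one cell."""
    return max(0, cell_count(num_binary) - num_samples)


def min_empty_fraction(num_binary: int, num_samples: int) -> float:
    """Lower bound on the fraction of cells that must be empty."""
    return min_empty_cells(num_binary, num_samples) / cell_count(num_binary)


if __name__ == "__main__":
    n = 100  # hypothetical number of observations in one group
    for b in (3, 5, 10, 15, 20):
        print(
            f"b={b:2d}  cells={cell_count(b):8d}  "
            f"empty>={min_empty_cells(b, n):8d}  "
            f"fraction>={min_empty_fraction(b, n):.4f}"
        )
```

With 10 binary variables and 100 observations, for instance, there are 1024 cells and at least 924 of them are empty. This is only a lower bound: observations that share a cell leave even more cells unoccupied.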

Acknowledgment

The author would like to thank Universiti Utara Malaysia, Malaysia, for financial support.

Author information

Corresponding author

Correspondence to Hashibah binti Hamid.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Hamid, H.b. (2017). A New Framework of Smoothed Location Model with Multiple Correspondence Analysis. In: Ahmad, AR., Kor, L., Ahmad, I., Idrus, Z. (eds) Proceedings of the International Conference on Computing, Mathematics and Statistics (iCMS 2015). Springer, Singapore. https://doi.org/10.1007/978-981-10-2772-7_12

  • DOI: https://doi.org/10.1007/978-981-10-2772-7_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2770-3

  • Online ISBN: 978-981-10-2772-7

  • eBook Packages: Education, Education (R0)
