Abstract
In practice, the statistician often has to work with data that have already been collected, and these data are frequently of mixed type. The task is then to draw the best possible statistical conclusions, typically with the most sophisticated methods available. But are the variables scaled optimally? And what about missing data? Without loss of generality, we restrict ourselves here to binary classification/clustering. A very simple but general approach is outlined that applies to such data for both classification and clustering. It is based on data preparation (i.e., a down-grading step such as binning each quantitative variable) followed by dual scaling (the up-grading step: scoring). As a byproduct, the quantitative scores can be used for multivariate visualisation of both the data and the classes/clusters. For illustration, a real-data application to optical character recognition (OCR) is considered throughout the paper. Moreover, the proposed approach is compared with other multivariate methods such as the simple Bayesian classifier.
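The two steps named in the abstract, binning (down-grading) followed by scoring (up-grading), can be illustrated with a minimal sketch. It is not the implementation used in the paper, and it rests on several assumptions that do not come from the source: quartile bin edges, a separate extra bin for missing values, additive combination of the per-variable scores, and a cut-off at zero. The function names `bin_variable`, `fit` and `total_score` and the synthetic data are hypothetical. The scoring uses the fact that, for a categories-by-two-classes contingency table, correspondence analysis has a single nontrivial dimension whose category scores are proportional to the class-1 share within the category minus the overall class-1 share.

```python
import numpy as np

def bin_variable(x, edges):
    """Down-grading: map each value to a bin index; NaN goes to an extra last bin."""
    codes = np.searchsorted(edges, x, side="right")  # indices 0 .. len(edges)
    codes[np.isnan(x)] = len(edges) + 1              # assumed handling of missing data
    return codes

def fit(X, y, quantiles=(0.25, 0.5, 0.75)):
    """Up-grading: learn one score per bin and variable from binary labels y."""
    overall = y.mean()                               # overall share of class 1
    model = []
    for j in range(X.shape[1]):
        edges = np.nanquantile(X[:, j], quantiles)   # assumed quartile binning
        codes = bin_variable(X[:, j], edges)
        scores = np.zeros(len(edges) + 2)            # regular bins plus the NaN bin
        for b in np.unique(codes):
            # score = class-1 share in this bin minus the overall class-1 share
            scores[b] = y[codes == b].mean() - overall
        model.append((edges, scores))
    return model

def total_score(model, X):
    """Sum the per-variable bin scores of each observation (assumed additive rule)."""
    return sum(scores[bin_variable(X[:, j], edges)]
               for j, (edges, scores) in enumerate(model))

# Synthetic two-class data: column 0 is informative, column 1 is noise with gaps.
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 2))
X[:, 0] += 3 * y
X[::17, 1] = np.nan

model = fit(X, y)
predicted = (total_score(model, X) > 0).astype(int)  # assumed cut-off at zero
accuracy = (predicted == y).mean()                   # in-sample accuracy only
print(accuracy)
```

The per-variable scores returned by `fit` are quantitative, so in this sketch each observation could also be plotted using its scores on two variables as coordinates, in the spirit of the visualisation mentioned in the abstract.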
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Mucha, HJ. (2014). Classification, Clustering, and Visualisation Based on Dual Scaling. In: Gaul, W., Geyer-Schulz, A., Baba, Y., Okada, A. (eds) German-Japanese Interchange of Data Analysis Results. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01264-3_5
DOI: https://doi.org/10.1007/978-3-319-01264-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01263-6
Online ISBN: 978-3-319-01264-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)