Classification, Clustering, and Visualisation Based on Dual Scaling

  • Conference paper

Abstract

In practice, the statistician is often faced with data that have already been collected, and these data are frequently of mixed type. The statistician must then try to draw the best possible statistical conclusions using sophisticated methods. But are the variables scaled optimally? And what about missing data? Without loss of generality, we restrict ourselves here to binary classification/clustering. We outline a very simple but general approach that is applicable to such data for both classification and clustering. It is based on data preparation (i.e., a down-grading step such as binning each quantitative variable) followed by dual scaling (the up-grading step: scoring). As a byproduct, the quantitative scores can be used for multivariate visualisation of both the data and the classes/clusters. For illustration, a real-data application to optical character recognition (OCR) is considered throughout the paper. Moreover, the proposed approach is compared with other multivariate methods such as the simple Bayesian classifier.
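The two steps named in the abstract (down-grading by binning, then up-grading by scoring) can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes quantile binning, takes the first axis of a textbook correspondence analysis of each bin-by-class table as the dual scaling score, and classifies by the sign of the summed scores. The function names `bin_variable` and `dual_scaling_scores`, the choice of four bins, and the synthetic data are all hypothetical.

```python
import numpy as np


def bin_variable(x, n_bins=4):
    """Down-grading step (assumed here: quantile binning into ordinal codes)."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")  # codes 0 .. n_bins-1


def dual_scaling_scores(codes, labels, n_categories):
    """Up-grading step: first-axis correspondence-analysis row scores
    of the (category x class) contingency table, for two classes."""
    table = np.zeros((n_categories, 2))
    np.add.at(table, (codes, labels), 1.0)       # count co-occurrences
    P = table / table.sum()                      # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)          # row and column masses
    r_safe = np.where(r > 0, r, 1.0)             # guard against empty bins
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r_safe, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    scores = U[:, 0] * s[0] / np.sqrt(r_safe)    # row principal coordinates
    if Vt[0, 1] < 0:                             # orient: class 1 scores high
        scores = -scores
    return scores


# Synthetic two-class data (purely illustrative, not the OCR data).
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 3)) + 1.5 * y[:, None]

# Replace every binned value by its score and sum over the variables.
total = np.zeros(n)
for j in range(X.shape[1]):
    codes = bin_variable(X[:, j])
    scores = dual_scaling_scores(codes, y, n_categories=4)
    total += scores[codes]

# The scores of each variable have weighted mean zero, so zero is a
# natural (assumed) cut point. Accuracy is in-sample, for illustration only.
predicted = (total > 0).astype(int)
accuracy = (predicted == y).mean()
```

In this sketch the per-variable scores are on a common quantitative scale, so the summed score (or the individual score columns) could also serve as coordinates for a plot of observations and classes.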



Corresponding author

Correspondence to Hans-Joachim Mucha.


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Mucha, HJ. (2014). Classification, Clustering, and Visualisation Based on Dual Scaling. In: Gaul, W., Geyer-Schulz, A., Baba, Y., Okada, A. (eds) German-Japanese Interchange of Data Analysis Results. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01264-3_5
