Distances in Classification

Weihs, Claus; Szepannek, Gero

doi:10.1007/978-3-642-03067-3_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5633)

Included in the following conference series:

Industrial Conference on Data Mining

Abstract

The notion of distance is the most important basis for classification. This holds especially for unsupervised learning, i.e. clustering, where no objects of known group membership are available for validation. But in supervised learning, too, standard distances often fail to give appropriate results. An adequate distance has to be chosen for each individual problem. This is demonstrated with three practical examples from very different application areas: social science, music science, and production economics. In social science, clustering is applied to spatial regions with very irregular borders, so adequate spatial distances may have to be taken into account. In statistical musicology, the main problem is often to find a transformation of the input time series that provides an adequate basis for defining a distance. In addition, local modelling is proposed to account for different subpopulations, e.g. instruments. In production economics, many quality criteria with very different scaling often have to be taken into account. To find a compromise optimum classification, the criteria are first transformed onto a common scale, called desirability.
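The effect of transforming differently scaled criteria onto a common desirability scale can be illustrated with a minimal sketch. It assumes Harrington's one-sided desirability form d = exp(-exp(-(b0 + b1*y))); the two criteria (strength, roughness), their anchor values, and the anchor desirabilities 0.2 and 0.8 are hypothetical and are not taken from the chapter.

```python
import math

def harrington_one_sided(y, y_low, y_high, d_low=0.2, d_high=0.8):
    """One-sided Harrington desirability d = exp(-exp(-(b0 + b1*y))).

    b0 and b1 are chosen so that d(y_low) = d_low and d(y_high) = d_high.
    If y_low > y_high, smaller raw values are rated as more desirable.
    The defaults 0.2 and 0.8 are illustrative assumptions.
    """
    # Invert d = exp(-exp(-z)) to get z = -ln(-ln d) at both anchors.
    z_low = -math.log(-math.log(d_low))
    z_high = -math.log(-math.log(d_high))
    b1 = (z_high - z_low) / (y_high - y_low)
    b0 = z_low - b1 * y_low
    return math.exp(-math.exp(-(b0 + b1 * y)))

def euclidean(a, b):
    """Plain Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def to_desirabilities(product):
    """Map (strength, roughness) onto the common (0, 1) desirability scale."""
    strength, roughness = product
    return (
        # Hypothetical anchors: higher strength is better.
        harrington_one_sided(strength, y_low=400.0, y_high=600.0),
        # Hypothetical anchors: lower roughness is better.
        harrington_one_sided(roughness, y_low=5.0, y_high=1.0),
    )

# Two hypothetical products measured on very different scales.
raw_a = (450.0, 2.0)
raw_b = (550.0, 4.0)

des_a = to_desirabilities(raw_a)
des_b = to_desirabilities(raw_b)

# On the raw scale the distance is dominated by the strength criterion.
raw_distance = euclidean(raw_a, raw_b)
# On the desirability scale both criteria contribute comparably.
desirability_distance = euclidean(des_a, des_b)
```

On the raw scale the roughness difference barely affects the distance, whereas after the transformation each coordinate lies in (0, 1), so neither criterion dominates purely because of its unit of measurement.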

Author information

Authors and Affiliations

Department of Statistics, University of Dortmund, 44227, Dortmund, Germany
Claus Weihs & Gero Szepannek

Editor information

Editors and Affiliations

Institut für Bildverarbeitung und angewandte Informatik, Körnerstr. 10, 04107, Leipzig, Deutschland
Petra Perner

About this paper

Cite this paper

Weihs, C., Szepannek, G. (2009). Distances in Classification. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2009. Lecture Notes in Computer Science, vol 5633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03067-3_1

DOI: https://doi.org/10.1007/978-3-642-03067-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03066-6
Online ISBN: 978-3-642-03067-3
eBook Packages: Computer Science, Computer Science (R0)