Abstract
In this chapter, we address the problem of analyzing a set of unlabeled data with the goal of finding “interesting patterns” or structures in it. This type of problem is sometimes called a knowledge discovery problem. Compared to other machine learning problems such as supervised learning, it is a much more open problem: in general there is no well-defined metric to optimize, nor is there a specific kind of pattern that we wish to look for. Within unsupervised machine learning, the most common type of problem is clustering, though other problems such as novelty detection, dimensionality reduction and outlier detection also belong to this area. Here we discuss different clustering methods, compare their advantages and disadvantages, and discuss measures for evaluating their quality. The chapter finishes with a case study using a real data set that analyzes the expenditure of different countries on education.
Notes
- 1.
- 2.
The intracluster distance of sample i is given by the distance from the sample to the nearest sample in the same cluster, and the nearest-cluster distance is given by the distance from the sample to the closest sample in the cluster nearest to the cluster of sample i.
- 3.
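Note 2 defines two per-sample distances. The following is a minimal Python sketch that computes them literally as the note words them. It assumes Euclidean distance and assumes that the "cluster nearest to the cluster of sample i" is the one whose centroid is closest to the centroid of sample i's cluster; the note specifies neither choice. The function names `centroid` and `note_distances` are hypothetical, chosen for illustration only.

```python
import math
from collections import defaultdict


def centroid(pts):
    """Coordinate-wise mean of a list of points."""
    n = len(pts)
    return [sum(coords) / n for coords in zip(*pts)]


def note_distances(points, labels):
    """Return one (a_i, b_i) pair per sample, following note 2's wording.

    a_i: Euclidean distance from sample i to the nearest *other* sample in
         its own cluster (math.inf if sample i is alone in its cluster).
    b_i: distance from sample i to the closest sample of the cluster nearest
         to i's cluster. ASSUMPTION: cluster-to-cluster nearness is measured
         between centroids, which the note does not specify.
    """
    clusters = defaultdict(list)  # label -> indices of the samples in it
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    cents = {lab: centroid([points[j] for j in idxs])
             for lab, idxs in clusters.items()}

    # For every cluster, find the other cluster whose centroid is closest.
    nearest = {}
    for lab in clusters:
        others = [o for o in clusters if o != lab]
        nearest[lab] = min(others, key=lambda o: math.dist(cents[lab], cents[o]))

    result = []
    for i, lab in enumerate(labels):
        same = [math.dist(points[i], points[j]) for j in clusters[lab] if j != i]
        a_i = min(same) if same else math.inf
        b_i = min(math.dist(points[i], points[j]) for j in clusters[nearest[lab]])
        result.append((a_i, b_i))
    return result


# Usage: two well-separated 1-D clusters embedded in 2-D.
pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
labs = [0, 0, 1, 1]
print(note_distances(pts, labs))
# → [(1.0, 10.0), (1.0, 9.0), (2.0, 9.0), (2.0, 11.0)]
```

Note that the widely used silhouette coefficient, as implemented for example by scikit-learn's `silhouette_score`, is defined with *mean* intra-cluster and *mean* nearest-cluster distances rather than nearest-sample distances, so values from this sketch are not interchangeable with that library's inputs.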
Acknowledgements
This chapter was co-written by Petia Radeva and Oriol Pujol.
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Igual, L., Seguí, S. (2017). Unsupervised Learning. In: Introduction to Data Science. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-50017-1_7
DOI: https://doi.org/10.1007/978-3-319-50017-1_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50016-4
Online ISBN: 978-3-319-50017-1
eBook Packages: Computer Science, Computer Science (R0)