In contrast to supervised learning, unsupervised learning fits a model to observations assuming there is no dependent random variable, output, or response. That is, a set of input observations is gathered, treated as a set of random variables, and analyzed as is. None of the observations is treated differently from the others. An informal way to say this is that there is no Y. For this reason, classification data that includes Y as the class is sometimes called labeled data, whereas clustering data is called unlabeled. It is then as if the task of clustering is to surmise what variable Y should have been measured (but wasn't). Another way to think of this is to assume that there are n independent data vectors (X_1, ..., X_p, Y) but that all the Y_i's are missing, and in fact someone has even hidden the definition of Y.
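The "hidden Y" view can be illustrated with a small sketch. It is not taken from the chapter: the data are synthetic, the helper names (`make_data`, `two_means`, `agreement`) are hypothetical, and the clustering method is a simplified two-cluster k-means with a farthest-point initialization, chosen here only because it is short. The code generates vectors (X_1, X_2, Y), withholds Y, clusters on the X's alone, and then checks how well the cluster labels match the withheld Y up to a relabeling.

```python
import random


def make_data(n_per_class=50, seed=0):
    """Draw (x1, x2, y) triples from two well-separated Gaussian blobs."""
    rng = random.Random(seed)
    centers = {0: (0.0, 0.0), 1: (10.0, 10.0)}  # assumed class means
    data = []
    for y, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            data.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0), y))
    rng.shuffle(data)
    return data


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def two_means(points, n_iter=20):
    """Simplified k-means with k = 2; returns a 0/1 label per point."""
    # Farthest-point initialization: the first point, and the point
    # farthest from it. This is a simplification, not standard k-means++.
    centers = [points[0], max(points, key=lambda p: sq_dist(p, points[0]))]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min((0, 1), key=lambda k: sq_dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return labels


def agreement(labels, truth):
    """Fraction of matching labels, allowing the two names to be swapped."""
    same = sum(l == t for l, t in zip(labels, truth)) / len(truth)
    return max(same, 1.0 - same)


data = make_data()
X = [(x1, x2) for x1, x2, _ in data]   # what the analyst observes
hidden_y = [y for _, _, y in data]     # what was "not measured"
labels = two_means(X)                  # clustering never sees hidden_y
score = agreement(labels, hidden_y)
print(f"agreement with hidden Y (up to relabeling): {score:.2f}")
```

The comparison against `hidden_y` is possible only because the data are simulated. In a real clustering problem Y is never available, and cluster names are arbitrary, which is why the agreement is computed up to a swap of the two labels.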
© 2009 Springer-Verlag New York
Cite this chapter
Clarke, B., Fokoué, E., Zhang, H.H. (2009). Unsupervised Learning: Clustering. In: Principles and Theory for Data Mining and Machine Learning. Springer Series in Statistics. Springer, New York, NY. https://doi.org/10.1007/978-0-387-98135-2_8
DOI: https://doi.org/10.1007/978-0-387-98135-2_8
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-98134-5
Online ISBN: 978-0-387-98135-2
eBook Packages: Mathematics and Statistics (R0)