Data Mining pp 285-344

Data Classification

Chapter

Abstract

The classification problem is closely related to the clustering problem discussed in Chaps. 6 and 7. While the clustering problem is that of determining similar groups of data points, the classification problem is that of learning the structure of a data set of examples, already partitioned into groups, that are referred to as categories or classes. The learning of these categories is typically achieved with a model. This model is used to estimate the group identifiers (or class labels) of one or more previously unseen data examples with unknown labels. Therefore, one of the inputs to the classification problem is an example data set that has already been partitioned into different classes. This is referred to as the training data, and the group identifiers of these classes are referred to as class labels. In most cases, the class labels have a clear semantic interpretation in the context of a specific application, such as a group of customers interested in a specific product, or a group of data objects with a desired property of interest. The model learned is referred to as the training model. The previously unseen data points that need to be classified are collectively referred to as the test data set. The algorithm that creates the training model for prediction is also sometimes referred to as the learner.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. 1.IBM T.J. Watson Research CenterYorktown HeightsUSA

Personalised recommendations