1 Introduction

As indicated in [1, 2], the assumption that a learning task has only one objective is very restrictive. Data instances in many real-world datasets may be simultaneously assigned multiple class labels related to multiple tasks, and those tasks may be strongly related, weakly related, or completely unrelated to each other. Examples include a student’s grades in several courses, multiple diagnoses of a given patient, multiple outputs of a software system, multi-trait prediction from genomic data [3], etc. More examples of concurrent learning tasks are discussed in [4].

In [5], we have presented a unified framework for single-objective and multi-objective classification, called an extended classification task, which includes the following components:

  • R = (A_1, …, A_n) - a non-empty set of n candidate input features (n ≥ 1), where A_i is the i-th attribute. The values of these attributes (features) can be used to predict the values of the class dimensions (see below).

  • O = (C_1, …, C_m) - a non-empty set of m class dimensions (m ≥ 1). This is the set of tasks (targets) to predict. The extended classification task is to build an accurate model (or models) for predicting the values of all class dimensions based on the corresponding dependency subset (or subsets) I ⊆ R of selected input features; a schematic representation follows this list. A special case of this task is multi-label classification, which allows multiple labels of the same dimension to be assigned to a given instance.
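To make the notation concrete, the following minimal Python sketch represents an extended classification task; the attribute and class-dimension names are hypothetical, and the representation is ours rather than the one used in [5]:

```python
# Minimal, hypothetical representation of an extended classification task.
# R: candidate input features A_1, ..., A_n; O: class dimensions C_1, ..., C_m.
from dataclasses import dataclass, field

@dataclass
class ExtendedClassificationTask:
    input_features: list[str]            # R = (A_1, ..., A_n), n >= 1
    class_dimensions: list[str]          # O = (C_1, ..., C_m), m >= 1
    # I: dependency subset of selected input features, filled in by induction
    selected_features: set[str] = field(default_factory=set)

# Example: a hypothetical patient record task with two class dimensions
task = ExtendedClassificationTask(
    input_features=["age_group", "blood_pressure", "smoker"],
    class_dimensions=["cardiac_diagnosis", "pulmonary_diagnosis"],
)
```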

Section 2 of this paper outlines the methodology for inducing a multi-target model called Multi-objective Info-Fuzzy Network (M-IFN) and discusses its main characteristics. Section 3 reviews several case studies of applying M-IFN to practical problems in diverse branches of industry and science. Finally, in Sect. 4, we briefly discuss open challenges in multi-target classification.

2 Multi-objective Info-Fuzzy Networks

As indicated in [5], a multi-objective info-fuzzy network (M-IFN) is a multi-target extension of a single-objective info-fuzzy network (IFN). Like an IFN, an M-IFN has a single root node and an “oblivious read-once decision graph” structure, where all nodes of a given layer are labeled by the same feature and each feature is tested at most once along any path. It also has a target layer with a target node for each class label of every target. Every internal M-IFN node is shared among all targets, which makes the M-IFN an extreme case of a Shared Binary Decision Diagram [6]. This implies that each terminal (leaf) node is connected to at least one target node associated with a value of every target.
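This layered structure can be sketched as follows; the class and field names are our own simplified illustration, not the implementation from [5]:

```python
# Simplified sketch of an M-IFN's oblivious, read-once layered structure.
class Node:
    def __init__(self):
        self.children = {}        # feature value -> node in the next layer
        self.target_links = {}    # (target, target value) -> connection weight

class MIFN:
    def __init__(self):
        self.root = Node()
        self.layer_features = []  # one input feature per hidden layer: all
                                  # nodes of layer k test layer_features[k],
                                  # so each feature is read at most once per path
```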

Unlike CART [7], C4.5 [8], and EODG [9], the M-IFN construction algorithm has only a growing (top-down) phase, which iteratively chooses the predictive features that maximize the decrease in the total conditional entropy of all targets. The top-down construction is pre-pruned by a likelihood-ratio test. The details of the M-IFN construction procedure are presented in [5].
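To illustrate the selection criterion, here is a schematic sketch of such a greedy loop: it scores each candidate feature by the total decrease in the conditional entropy of all targets and stops when a likelihood-ratio test fails. The function names, the simplified degrees-of-freedom formula, and the default significance level are our assumptions; the actual procedure in [5] also builds the network layers and handles node splitting:

```python
import numpy as np
from scipy.stats import chi2

def cond_entropy(y, x):
    """Empirical conditional entropy H(y | x) for discrete 1-D integer arrays."""
    h = 0.0
    for v in np.unique(x):
        yv = y[x == v]
        p = np.unique(yv, return_counts=True)[1] / len(yv)
        h += len(yv) / len(x) * -(p * np.log2(p)).sum()
    return h

def greedy_select(X, Y, alpha=0.001):
    """Greedily add the feature maximizing the decrease in the total
    conditional entropy of all targets; stop when a (simplified)
    likelihood-ratio test no longer finds the decrease significant.
    X, Y: dicts mapping names to non-negative integer-coded arrays."""
    n = len(next(iter(Y.values())))
    path = np.zeros(n, dtype=int)          # joint code of the selected features
    selected, remaining = [], set(X)
    while remaining:
        def total_gain(a):
            refined = path * (X[a].max() + 1) + X[a]
            return sum(cond_entropy(y, path) - cond_entropy(y, refined)
                       for y in Y.values())
        best = max(remaining, key=total_gain)
        gain = total_gain(best)
        stat = 2 * n * np.log(2) * gain    # likelihood-ratio (G) statistic
        df = max(1, (len(np.unique(X[best])) - 1) * sum(
            len(np.unique(y)) - 1 for y in Y.values()))  # simplified d.o.f.
        if chi2.sf(stat, df) > alpha:      # pre-pruning: stop if not significant
            break
        selected.append(best)
        remaining.remove(best)
        path = path * (X[best].max() + 1) + X[best]
    return selected
```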

In [10], we show the M-IFN algorithm to have the following important properties:

  • The average conditional entropy of the m targets in an n-input, m-dimensional M-IFN model is not greater than the average conditional entropy over m single-target models S_i (i = 1, …, m) based on the same n input features; a compact rendering of this inequality follows the list. The inequality is strengthened if the multi-target M-IFN model is built upon more features than the single-target models. Consequently, we may expect the average accuracy of a multi-target M-IFN model in predicting the values of the m targets to be no lower, and possibly even higher, than the average accuracy of m single-target models using the same set of predictive features.

  • If all class dimensions (targets) are either mutually independent or totally dependent on each other, the input features selected by the M-IFN algorithm will minimize the joint conditional entropy of all targets, i.e., they will provide the most accurate classification model for all target classes. The case of mutual independence extends the scope of multitask (transfer) learning [1], where all “extra” tasks (targets) are assumed to be related to the main classification target.
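Stated compactly (our notation, not a quotation from [10]), the first property reads

\[
\frac{1}{m}\sum_{i=1}^{m} H\!\left(C_i \mid Z_{\text{M-IFN}}\right) \;\le\; \frac{1}{m}\sum_{i=1}^{m} H\!\left(C_i \mid Z_{S_i}\right),
\]

where \(H\) denotes Shannon entropy and \(Z_M\) is the partition of the instance space induced by the terminal nodes of model \(M\).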

3 Case Studies

Our first case study [11] refers to the prediction of grape and wine quality in a multi-year dataset provided by Yarden - Golan Heights Winery in Israel. For each grape field in every season, the Winery keeps records of 27 quality parameters (target variables) along with 135 candidate input features. Thus, predicting grape and wine quality is clearly a multi-target classification task. We have used M-IFN to identify the most significant predictive factors of the grape and wine quality parameters. We have also shown that, on average, single-target IFN models are significantly more accurate on this data than C4.5 decision-tree models, whereas the M-IFN models are even more accurate than the single-target IFN models. This result agrees with the previously mentioned observation that the average accuracy of a single multi-target M-IFN model is not expected to be worse than the average accuracy of multiple single-target models using the same set of predictive features.

The second case study [12], partially supported by General Motors, deals with predicting the probability and the timing of vehicle failures based on an integrated database of sensor measurements and warranty claims. We have applied the IFN and M-IFN induction algorithms to a dataset of 46,418 records representing periodic battery sensor readings for 21,814 distinct vehicles of a high-end model. The prediction models have been evaluated by the area under the ROC (Receiver Operating Characteristic) curve, also known as the AUC. Though the IFN and M-IFN ROC curves for the target attribute Battery Failure are nearly identical, the multi-target approach has shown a clear advantage in terms of model comprehensibility, as it reduced the total number of prediction rules by 33%.
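Such an AUC comparison can be reproduced with standard tooling; the following minimal sketch uses hypothetical labels and scores and assumes scikit-learn is available:

```python
# Minimal sketch of an AUC comparison (hypothetical data; scikit-learn assumed).
import numpy as np
from sklearn.metrics import roc_auc_score

y_true      = np.array([0, 0, 1, 1, 0, 1])                # Battery Failure labels
scores_ifn  = np.array([0.1, 0.3, 0.8, 0.7, 0.2, 0.9])    # single-target IFN scores
scores_mifn = np.array([0.2, 0.3, 0.7, 0.8, 0.1, 0.9])    # multi-target M-IFN scores

print("IFN AUC:  ", roc_auc_score(y_true, scores_ifn))
print("M-IFN AUC:", roc_auc_score(y_true, scores_mifn))
```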

The third, more recent case study [13] aims at predicting the number and the maximum magnitude of seismic events in the next year based on the seismic events recorded in the same region during the previous years. The predictive features include six seismic indicators commonly used in the earthquake prediction literature as well as 20 new features based on moving annual averages of the number of earthquakes. We have evaluated eight classification algorithms on a catalog of 9,042 earthquake events, which took place between 01/01/1983 and 31/12/2010 in 33 seismic regions of Israel and its neighboring countries. The M-IFN algorithm has clearly shown the best results in terms of the AUC criterion, which can be explained by its unique capability to take into account the relationship between the two target variables: the total number of earthquakes and the maximum earthquake magnitude during the same year.
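Features of this kind are straightforward to derive from an event catalog. The sketch below computes a moving annual average of earthquake counts per region; the column names and window size are hypothetical, and pandas is assumed:

```python
# Sketch: moving annual average of earthquake counts per region
# (hypothetical catalog columns 'region' and 'year'; pandas assumed).
import pandas as pd

catalog = pd.DataFrame({
    "region": ["A", "A", "A", "A", "B", "B", "B"],
    "year":   [2006, 2007, 2007, 2008, 2006, 2007, 2008],
})

counts = (catalog.groupby(["region", "year"]).size()
                 .rename("n_events").reset_index())
# 3-year moving average of the annual event count within each region
counts["ma3"] = (counts.groupby("region")["n_events"]
                       .transform(lambda s: s.rolling(3, min_periods=1).mean()))
print(counts)
```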

4 Conclusions

In this paper, we have presented the M-IFN (Multi-objective Info-Fuzzy Network) algorithm for inducing multi-target classification models. The algorithm’s effectiveness and broad applicability have been demonstrated via case studies in three diverse fields: winemaking, predictive maintenance, and seismology. The multi-target classification domain faces a number of exciting challenges, such as semi-supervised learning from a subset of targets, handling delayed target values, and adapting deep learning algorithms to the multi-target classification task.