Abstract
Training of compound ensemble classifier systems can be a computationally complex and hence time consuming task. Not only must the elementary classifiers be trained, but the model of the ensemble has to be updated as well. Therefore, the efficiency of the training shall be considered as a compound quality which comprises not only classification accuracy but also running time. This gains special importance when dealing with data streams, where data arrive at a high pace and the system update shall be done promptly. In this paper we present an application of a simulated annealing based algorithm for training a data stream processing ensemble. The evaluation of our method is performed in a series of experiments, which show that our ensemble performs very effectively in terms of accuracy and processing time.
1 Introduction
Ensemble classifier systems have gained a highly appreciated position among the wide range of machine learning methods [2]. This is due to their ability to elevate classification accuracy by fusing the knowledge collected in a diversified pool of elementary predictors [1]. On the other hand, the training procedure of such complex models is usually more time consuming. This is due to the fact that ensemble training affects not just one elementary classifier but a set of them and, in many cases, the fusion model has to be updated too. This aspect gains special importance when the ensemble is used for processing data streams, especially ones which feature concept drift, i.e. changes in data characteristics [11]. The appearance of each subsequent concept (also named context) must be followed by a prompt classifier update. Any delay might lead to deterioration of classification accuracy or even system obsolescence.
There are many methods of following drift in a data stream. Some algorithms use drift detectors which trigger an update procedure when essential changes are observed in incoming data [6]. Alternatively, the classifier might be updated continuously. This approach is more commonly exploited in ensemble systems [8]. In this case, the classifier committee consists of predictors trained at subsequent moments in time, usually on sequentially received data chunks. Therefore, each predictor in the committee represents the knowledge most valid for the moment when it was trained. There are many ways in which the committee can be updated.
The most popular approaches rely on the assumption that the committee size is fixed and usually consists of only several members [10]. The most recently created classifier replaces the “oldest” one in the committee, or the one showing the worst classification accuracy measured on the recent data chunk. Both methods are relatively simple and intuitive; nonetheless, they do not guarantee forming the best ensemble. What is more, a fixed and small committee size means that some knowledge about data stream concepts is lost when the respective classifiers are removed from the committee. Nevertheless, in most cases this approach works very well, as it allows maintaining ensemble validity [12]. But when a concept appears in the stream periodically (a phenomenon called recurring context [3]), permanent forgetting might be costly, because recurrence of the concept requires creating a new classifier from scratch.
Therefore, in our previous work [7] we proposed the Evolutionary Adapted Ensemble (EAE) algorithm, which maintained a small voting committee and a large pool of all created classifiers. New ensemble members were developed when a concept drift had been detected. Classifiers removed from the committee were not deleted but were put into the pool for further use. Ensemble training was based on selecting committee members from the pool. This procedure allowed recalling all past concepts.
EAE training was implemented as an optimisation procedure utilising a genetic algorithm (GA) [5]. There are two main features which make GAs highly effective. The first is their stochastic nature, which allows them to browse the solution space while avoiding falling into local minima. The second is the use of a population of individuals representing possible solutions of the problem, which allows maintaining high population diversity. Nonetheless, the latter feature makes GAs computationally complex. Therefore, we searched for an alternative whose application would be less time consuming.
In this paper we propose the Simulated Annealing based Ensemble (SAE), which uses simulated annealing (SA) in its training procedure. SAE is designed to follow concept drift and preserve classifiers in a pool, as modelled in EAE. However, two important modifications of EAE were implemented in SAE in order to optimise its training time. Firstly, the size of the classifier pool is now also fixed, although it is essentially larger than the size of the voting committee. Because the pool is expected to preserve all concepts which may appear again in the future, its size shall be set large enough. Setting the size of the pool is an arbitrary decision of the system user and shall be based on their knowledge of the problem for which the system is designed. Secondly, the simulated annealing algorithm is used for training [9]. SA is an optimisation algorithm which searches for a solution in a stochastic manner but does not process any population, which was the main reason for its selection. The goal of the research presented in this work is to evaluate the classification effectiveness of SAE.
The rest of the paper is organised as follows. The model of the ensemble classification algorithm is presented in Sect. 2. Next, details of the SA based training procedure are provided in Sect. 3. The evaluation of SAE is presented in Sect. 4. The last section concludes the results.
2 SAE - Ensemble Classifier Model
We start with introducing some basic terms and symbols.
Let us assume that we deal with a classification problem, i.e. assigning a given object being analysed to one of the possible classes. The set of classes \(\mathcal {M}\) is predefined and has a fixed size M. The decision is made by a classification algorithm \(\varPsi \) based on observation of selected attributes x.
SAE is an ensemble system \(\varPsi ^{SAE}\) which collects elementary classifiers \(\varPsi _k\) preserved in a pool \(\varPi ^{SAE}\).
Not all classifiers from \(\varPi ^{SAE}\) contribute to decision making. Let \(\varXi ^{SAE}\) denote the set of indexes of the classifiers taken from the pool which join the committee.
A weighted fusion of committee members’ discriminating functions is implemented in SAE for decision making. The weights of the classifiers’ votes are gathered in a set of weights \(\mathcal {W}\).
Finally, the formula of the ensemble decision is defined in (5)
$$\varPsi ^{SAE}(x) = \mathop {\arg \max }\limits _{j \in \mathcal {M}} \sum _{e=1}^{|\varXi ^{SAE}|} w_{c_e}\, d_{c_e}(x,j), \qquad (5)$$
where \(d_{c_e}(x,j)\) stands for the discriminating function which represents the support given by the e-th member of the committee to class j, and \(w_{c_e}\) is that member’s weight. Note that the e-th member of the committee is the classifier with index \(c_e\), i.e. \(\varPsi _{c_e}\).
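As an illustration, the weighted fusion of discriminating functions described above can be sketched in a few lines of Python (the function name and data layout are illustrative assumptions, not taken from the paper):

```python
def ensemble_decision(supports, weights):
    """Weighted fusion of committee members' discriminating functions.

    supports: list of E rows, one per committee member, each with M class
              supports d_{c_e}(x, j).
    weights:  list of E voting weights w_{c_e}.
    Returns the index of the class with the highest weighted support.
    """
    m = len(supports[0])
    fused = [sum(w * row[j] for w, row in zip(weights, supports))
             for j in range(m)]
    return max(range(m), key=fused.__getitem__)

# Two committee members, three classes: the second (heavier) member
# favours class 2, so the fused decision follows it.
print(ensemble_decision([[0.7, 0.2, 0.1],
                         [0.1, 0.2, 0.7]], [0.4, 0.6]))  # 2
```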
The training process aims at reducing the misclassification rate and is performed on a learning set \(\mathcal {LS}\), i.e. a set of pairs of features x and respective class labels j.
The objective function, which estimates the misclassification probability on \(\mathcal {LS}\), is defined as
$$\hat{Q}(\varPsi ^{SAE}) = \frac{1}{|\mathcal {LS}|} \sum _{(x,j) \in \mathcal {LS}} L\big (\varPsi ^{SAE}(x), j\big ), \qquad (7)$$
where L is the zero-one loss function. There are two elements which are affected by the training process:
1. \(\varXi ^{SAE}\) - indexes of the classifiers which form the committee, and
2. \(\mathcal {W}\) - values of the classifiers’ weights.
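The objective above, the zero-one loss averaged over the learning set, can be sketched as follows (function names and the toy classifier are illustrative assumptions):

```python
def misclassification_rate(classify, learning_set):
    """Estimate of the misclassification probability on LS: the fraction of
    (x, label) pairs for which the classifier's decision differs from the
    label (zero-one loss averaged over the set)."""
    errors = sum(1 for x, label in learning_set if classify(x) != label)
    return errors / len(learning_set)

# Toy classifier: class 0 below the threshold 5, class 1 otherwise.
classify = lambda x: 0 if x < 5 else 1
ls = [(1, 0), (2, 0), (6, 1), (7, 0)]  # the last pair is misclassified
print(misclassification_rate(classify, ls))  # 0.25
```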
3 SAE - Training Algorithm
Simulated annealing is an optimisation algorithm inspired by the respective physical process. The algorithm has a stochastic nature, i.e. it searches for the optimal solution in an iterative and somewhat random manner. The solution is encoded in the form of a point \(\mathcal {V}^{SAE}\) consisting of all variables being updated during optimisation. In the case of SAE, \(\mathcal {V}^{SAE}\) consists of the two elements described in the last paragraph of Sect. 2.
The structure of the point is of high importance, as its two constituents have different meanings and types. While \(\varXi ^{SAE}\) is a vector of integers, \(\mathcal {W}^{SAE}\) is a vector of real numbers. This fact affects the simulated annealing procedures. Pseudo-code of the algorithm is presented below (Algorithm 1).
3.1 Sequential Stream Processing
SAE processes a data stream; therefore data samples are extracted from the stream sequentially and are collected in chunks. The size of the chunk is an arbitrarily chosen parameter. SAE training is launched each time the chunk is filled with the requested number of samples. There are five SAE input parameters:
1. \(\mathcal {LS}\) - the current data chunk extracted from the stream,
2. \(\varPi ^{SAE}\) - the classifier pool; at the very beginning of stream processing the pool is empty, and its members are added in the course of stream processing,
3. \(\mathcal {V}^{SAE}_{t-1}\) - the solution vector obtained in the previous run of the algorithm,
4. \(Q_{t-1}\) - the most recent error rate,
5. SAParams - the set of parameters which control the SA algorithm, such as the number of iterations.
3.2 Initialisation
If the pool \(\varPi ^{SAE}\) is not empty, SAE has already been run at least once, on a previous data chunk or chunks. This allows checking whether a drift appears in the stream (lines 1-3). SAE uses an error based detector. It evaluates the error \(Q_{t}\) of the last committee \(\mathcal {V}^{SAE}_{t-1}\) (i.e. the one created in the previous run) over the most recent data chunk \(\mathcal {LS}\). Next, \(Q_{t}\) is compared with \(Q_{t-1}\). A significant increase in the error indicates a drift. This is the first reason for creating a new classifier; the second one is running SAE on the very first data chunk, i.e. when the pool is empty.
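The error based detection step can be sketched as below; the threshold value is an assumed free parameter, not taken from the paper:

```python
def drift_detected(q_current, q_previous, threshold=0.1):
    """Signal concept drift when the committee's error on the newest chunk
    exceeds the previously recorded error by more than `threshold`
    (an illustrative, user-chosen sensitivity parameter)."""
    return (q_current - q_previous) > threshold

print(drift_detected(0.35, 0.20))  # error jumped by 0.15 -> True
print(drift_detected(0.22, 0.20))  # small fluctuation -> False
```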
3.3 Annealing
Annealing is the process of generating a trial point \(\mathcal {V}_{temp}\) at some distance from the current point \(\mathcal {V}\). The distance is directly proportional to the temperature (line 9). It has to be mentioned that this procedure must be performed separately for the two parts of the point. \(\varXi ^{SAE}\) has to consist of integers in the range between 1 and K, i.e. the number of classifiers in the pool. \(\mathcal {W}^{SAE}\) is a real number vector; therefore it can be manipulated using a standard random number generator.
The temperature controls the annealing and is decreased gradually in each iteration. Next, the trial point is evaluated with the target function (7) and compared with the previous one (lines 10-11). If the trial point is better than the old one, it becomes the new \(\mathcal {V}_t\). Otherwise, the trial point can still become the new \(\mathcal {V}_t\) with some small probability which is proportional to the current temperature. The annealing process is repeated until the maximum number of iterations is reached. As a result, SAE returns three objects (line 14): (1) \(\mathcal {V}_t\) - the most current solution, (2) the classifier pool, and (3) the current error rate.
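A minimal sketch of such an annealing loop over the mixed integer/real point is given below. The cooling schedule, neighbour moves, and acceptance rule are illustrative assumptions standing in for Algorithm 1, which is not reproduced here:

```python
import math
import random

def anneal(objective, xi, w, pool_size, iterations=200, t0=1.0):
    """Minimise `objective` over a mixed point: xi (committee indexes,
    integers in 1..pool_size) and w (real-valued voting weights)."""
    q = objective(xi, w)
    best_q, best = q, (list(xi), list(w))
    for i in range(iterations):
        t = t0 * (1.0 - i / iterations)  # temperature decreases each iteration
        # Generate a trial point: the two parts are perturbed separately,
        # since one is an integer vector and the other a real vector.
        xi_t = [random.randint(1, pool_size) if random.random() < t else c
                for c in xi]
        w_t = [max(0.0, v + random.gauss(0.0, t)) for v in w]
        q_t = objective(xi_t, w_t)
        # Accept improvements always; accept worse trials with a small
        # probability that shrinks with the temperature.
        if q_t < q or random.random() < math.exp(-(q_t - q) / max(t, 1e-9)):
            xi, w, q = xi_t, w_t, q_t
            if q < best_q:
                best_q, best = q, (list(xi), list(w))
    return best, best_q

# Toy objective: prefer indexes summing to 6 and weights near 1.0.
obj = lambda xi, w: abs(sum(xi) - 6) + sum(abs(v - 1.0) for v in w)
(best_xi, best_w), q = anneal(obj, [5, 5, 5], [0.5, 0.5, 0.5], pool_size=10)
print(q <= obj([5, 5, 5], [0.5, 0.5, 0.5]))  # True: never worse than the start
```

Because only the best point found is returned, the result is never worse than the starting solution, which matches the paper's aim of keeping the previous committee when no better one is found.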
4 SAE - Evaluation
The performance of SAE was checked in three series of tests which aimed at evaluating how selected parameters of the classification system and drift characteristics affect the performance. We examined the following parameters:
1. the size of the data chunks,
2. the size of the voting committee,
3. the strength of the drift.
4.1 Experimental Set up
All the experiments were carried out in the MATLAB 2014 framework using the OPTIMTOOL toolbox and the PRTools toolbox [4]. Three benchmark datasets downloaded from the UCI repository were used in the tests (Table 1).
Data streams were created using a random generator with distributions estimated on the datasets. Concept drift was also injected into the streams. Subsequent contexts were simulated by rotation of the feature space by a chosen angle, which was a parameter of the drift generator and was used to control the strength of the drift, i.e. the higher the angle, the stronger the drift. For comparative analysis, four classifiers were implemented and tested, namely:
1. SAE - the algorithm presented in this paper;
2. EN-Rep.Old - an ensemble updated by replacing the oldest classifier in the majority voting committee;
3. EN-Rep.Worst - an ensemble updated by replacing the classifier with the highest individual misclassification rate; the majority voting strategy was used for decision making;
4. Last - the elementary classifier most recently added to the pool.
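The drift injection described above, rotating the feature space by a chosen angle, can be sketched as follows (restricted to two features for clarity; the function name is an assumption):

```python
import math

def rotate2d(points, angle_deg):
    """Simulate a new concept by rotating a 2-D feature space by the given
    angle; a larger angle corresponds to a stronger drift."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [(x * cos_a - y * sin_a, x * sin_a + y * cos_a)
            for x, y in points]

# A 45-degree rotation (the strongest drift used in the experiments)
# maps the point (1, 0) onto (sqrt(2)/2, sqrt(2)/2).
x, y = rotate2d([(1.0, 0.0)], 45)[0]
print(round(x, 4), round(y, 4))  # 0.7071 0.7071
```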
Evaluation of the performance was carried out using the Test-Then-Train procedure. Samples from the stream were added to data chunks. Each chunk was first used for testing the classifier, and the misclassification rate was saved. Next, the SAE training procedure was launched with the chunk.
All tests were repeated 10 times; therefore the presented results show average misclassification rates.
4.2 Experiment 1. Misclassification Rate vs Size of Data Chunk
In the first experiment we checked how the size of the data chunk affects the misclassification rate. The stream used in the test was generated based on the Auto MPG dataset using the following parameters: number of concepts - 4, number of samples in each concept - 800, drift strength - 45 degrees, committee size - 5, pool size - 10.
Results for four chunk sizes (i.e. 100, 200, 300, and 400) are presented in Table 2 and in Fig. 1.
1. Regardless of the size of the data chunk, SAE always outperformed both competing ensemble classifiers.
2. In two cases (200 and 300 samples in the chunk) SAE obtained the same result as the Last classifier; in the other two cases SAE was slightly better.
3. Increasing the number of samples did not affect SAE quality, while the other ensemble classifiers improved theirs.
The first observation shows that the strategies of updating the ensemble by replacing the oldest or the worst classifier do not produce the best committee, as they do not take into consideration the knowledge of the entire ensemble. SAE selects the committee based on evaluation of the ensemble’s results, which proved to be the better strategy.
The last observation proves that SAE is much more effective at extracting knowledge from samples compared with the other ensemble classifiers. The small misclassification rate of SAE was achieved after processing 100 samples in a data chunk, while the other ensembles needed many more samples to reduce their error rates.
4.3 Experiment 2. Misclassification Rate vs Committee Size
In the second experiment we evaluated how the ensemble performance depends on the committee size. The stream used in the test was generated based on the Pima Indians Diabetes dataset using the following parameters: number of concepts - 2, number of samples in each concept - 700, drift strength - 45 degrees, data chunk size - 100, pool size - 7.
Results for three values of committee size (i.e. 3, 5, and 7) are presented in Table 3 and Fig. 2. The Last classifier was excluded from this test because it is a simple classifier which uses no committee for decision making.
1. In all cases SAE obtained better results than the other ensemble classifiers.
2. Increasing the committee size reduced the error of EN-Rep.Old and EN-Rep.Worst but did not affect SAE performance.
These observations show that the strategy of preserving a larger pool of available elementary classifiers and selecting even a small subset for decision making maintains higher flexibility of the ensemble. Removing a classifier permanently from the committee limits the chances of exploiting knowledge gathered in the past. Therefore, SAE’s competitors require much larger committees to reduce the error. It is worth noting that even when the committee was extended to 7 (the size equal to the SAE pool size), the competing ensembles did not achieve the same result as SAE.
4.4 Experiment 3. Misclassification Rate vs Strength of Drift
In the third experiment we evaluated how the strength of the drift affects the error rate. The stream used in the test was generated based on the Glass dataset using the following parameters: number of concepts - 2, number of samples in each concept - 300, data chunk size - 100, pool size - 10, committee size - 3.
Results for four different drift strengths (i.e. 1, 5, 10, and 45 degrees) are presented in Table 4 and Fig. 3.
1. In all tests SAE obtained the best results among the tested classifiers.
2. Increasing the drift strength raised the classification error of all classifiers. This is most visible when comparing the results for strengths 5 and 10, where the error changes are the most profound.
3. The weakest relation between drift strength and error can be observed for the Last classifier. This is because its training updates or replaces only one classifier rather than a set of them, so such a simple system adapts to changes much more easily than the ensembles.
4. SAE is affected by drift strength more than the Last classifier, but in general it obtained better results.
Concluding, SAE is affected by the strength of the drift, as are all the other classifiers. Nonetheless, this relation is much weaker than in the other ensembles, which should be assessed as a positive feature.
5 Conclusion
In this paper we presented a novel simulated annealing based training algorithm for a data stream processing ensemble classifier. The algorithm adapts the ensemble model to changes in the stream characteristics. Simulated annealing was used for training, and the presented tests showed that this optimisation method efficiently creates a classifier which can outperform some other ensemble methods.
Acknowledgements. This work was supported by the Polish National Science Centre under grant no. DEC-2013/09/B/ST6/02264.
References
Alpaydin, E.: Introduction to Machine Learning (Adaptive Computation and Machine Learning), vol. 5. The MIT Press (2004). http://www.amazon.ca/exec/obidos/redirect?tag=citeulike09-20&path=ASIN/0262012111
Bishop, C.M.: Pattern Recognition and Machine Learning, Information Science and Statistics, vol. 4. Springer (2006). http://www.library.wisc.edu/selectedtocs/bg0137.pdf
Chen, S., Wang, H., Zhou, S., Yu, P.S.: Stop chasing trends: discovering high order models in evolving data. In: 2008 IEEE 24th International Conference on Data Engineering, pp. 923–932 (2008), http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4497501
Duin, R.P.W., Juszczak, P., Paclik, P., Pekalska, E., de Ridder, D., Tax, D.: PRTools4, A Matlab Toolbox for Pattern Recognition. Delft University of Technology (2004)
Eiben, A.E., Smith, J.: Introduction to Evolutionary Computing. Springer (2003)
Gama, J., Medas, P., Castillo, G., Rodrigues, P.: Learning with drift detection. In: SBIA Brazilian Symposium on Artificial Intelligence, pp. 286–295. Springer (2004)
Jackowski, K.: Fixed-size ensemble classifier system evolutionarily adapted to a recurring context with an unlimited pool of classifiers. Pattern Analysis and Applications, February 2013. http://link.springer.com/10.1007/s10044-013-0318-x
Kuncheva, L.I.: Classifier Ensembles for Changing Environments, pp. 1–15 (2004)
van Laarhoven, P.J.M., Aarts, E.H.L.: Introduction. In: Simulated Annealing: Theory and Applications, pp. 1–6. Springer, Netherlands (1987). http://link.springer.com/10.1007/978-94-015-7744-1_1
Street, W.N., Kim, Y.: A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2001, pp. 377–382 (2001). http://portal.acm.org/citation.cfm?doid=502512.502568
Tsymbal, A.: The problem of concept drift: definitions and related work (2004)
Widmer, G., Kubat, M.: Effective learning in dynamic environments by explicit context tracking. In: European Conference on Machine Learning, pp. 227–243. Springer (1993)
Jackowski, K. (2018). A Novel Simulated Annealing Based Training Algorithm for Data Stream Processing Ensemble Classifier. In: Kurzynski, M., Wozniak, M., Burduk, R. (eds) Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017. CORES 2017. Advances in Intelligent Systems and Computing, vol 578. Springer, Cham. https://doi.org/10.1007/978-3-319-59162-9_46