Abstract
To refine a classifier efficiently over streaming data such as sensor data and web log data, we have to decide whether or not to select each unlabeled streaming datum for labeling. Existing methods refine a classifier at regular time intervals. They therefore refine the classifier even when its classification accuracy is high, and keep using it between refinements even when its accuracy is low. In this paper, we propose an ensemble method that selects, in an online process, the data that should be labeled. The selected data are used to build new classifiers for the ensemble. Our selection methodology uses the training data that were applied to generate the ensemble of classifiers over streaming data. We compared our ensemble approach with a conventional ensemble approach in which new classifiers for the ensemble are generated periodically. In experiments with ten benchmark data sets, including three real streaming data sets, our ensemble approach generated 12.9% of the new classifiers generated by the chunk-based ensemble approach using partially labeled samples, and used labels for an average of 10% of the samples across the ten data sets. In all the experiments, our ensemble approach produced comparable classification accuracy. These results show that our approach can efficiently maintain the performance of an ensemble over streaming data.
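The general idea of selecting streaming data for labeling on demand, rather than rebuilding on a fixed schedule, can be illustrated with a minimal sketch. The abstract does not state the selection criterion or the base learner, so everything below is an assumption made for illustration: a nearest-centroid learner on 1-D inputs, a confidence threshold as the selection rule, and the hypothetical names `CentroidClassifier`, `ensemble_predict`, and `run_stream`. It should not be read as the authors' algorithm.

```python
import random
from collections import defaultdict


class CentroidClassifier:
    """Nearest-centroid learner on 1-D inputs (assumed stand-in base classifier).

    Needs training samples from at least two classes.
    """

    def __init__(self, samples):
        groups = defaultdict(list)
        for x, y in samples:
            groups[y].append(x)
        self.centroids = {y: sum(xs) / len(xs) for y, xs in groups.items()}

    def predict(self, x):
        """Return (label, confidence) with confidence in [0, 1]."""
        ranked = sorted((abs(x - c), y) for y, c in self.centroids.items())
        (d1, label), (d2, _) = ranked[0], ranked[1]
        total = d1 + d2
        return label, ((d2 - d1) / total if total > 0 else 0.0)


def ensemble_predict(ensemble, x):
    """Confidence-weighted vote; returns (label, mean confidence behind the winner)."""
    scores = defaultdict(float)
    for clf in ensemble:
        label, conf = clf.predict(x)
        scores[label] += conf
    winner = max(scores, key=scores.get)
    return winner, scores[winner] / len(ensemble)


def run_stream(stream, oracle, seed_samples,
               threshold=0.3, buffer_size=30, max_members=5):
    """Process the stream once, requesting labels only for low-confidence data."""
    ensemble = [CentroidClassifier(seed_samples)]
    buffer, labels_requested, classifiers_built, predictions = [], 0, 0, []
    for x in stream:
        label, conf = ensemble_predict(ensemble, x)
        predictions.append(label)
        if conf < threshold:  # assumed selection rule: low ensemble confidence
            buffer.append((x, oracle(x)))
            labels_requested += 1
        # Build a new member only once enough selected data from 2+ classes exist.
        if len(buffer) >= buffer_size and len({y for _, y in buffer}) > 1:
            ensemble.append(CentroidClassifier(buffer))
            ensemble = ensemble[-max_members:]  # keep only the newest members
            buffer = []
            classifiers_built += 1
    return predictions, labels_requested, classifiers_built


if __name__ == "__main__":
    rng = random.Random(0)

    def truth(x):
        # Hypothetical labeling oracle standing in for a human annotator.
        return int(x > 0.5)

    seed = [(0.1, 0), (0.3, 0), (0.7, 1), (0.9, 1)]
    stream = [rng.random() for _ in range(1000)]
    preds, n_labels, n_built = run_stream(stream, truth, seed)
    print(f"labels requested: {n_labels}, new classifiers built: {n_built}")
```

In this sketch a new classifier is created only after enough low-confidence data have accumulated, so both the number of labels requested and the number of classifiers built depend on the stream itself rather than on a fixed period. A periodic, chunk-based baseline would instead label and rebuild after every fixed-size chunk regardless of how confident the ensemble is.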
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ryu, J.W., Kantardzic, M.M., Kim, MW., Ra Khil, A. (2012). An Efficient Method of Building an Ensemble of Classifiers in Streaming Data. In: Srinivasa, S., Bhatnagar, V. (eds) Big Data Analytics. BDA 2012. Lecture Notes in Computer Science, vol 7678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35542-4_11
DOI: https://doi.org/10.1007/978-3-642-35542-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35541-7
Online ISBN: 978-3-642-35542-4
eBook Packages: Computer Science, Computer Science (R0)