An Efficient Method of Building an Ensemble of Classifiers in Streaming Data

Ryu, Joung Woo; Kantardzic, Mehmed M.; Kim, Myung-Won; Ra Khil, A.

doi:10.1007/978-3-642-35542-4_11


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7678)

Included in the following conference series:

International Conference on Big Data Analytics


Abstract

To efficiently refine a classifier over streaming data such as sensor data and web log data, we have to decide, for each unlabeled datum in the stream, whether to select it for labeling. Existing methods refine a classifier at regular time intervals. As a result, they refine a classifier even when its classification accuracy is high, and they keep using a classifier even when its classification accuracy is low. In this paper, our ensemble method selects, in an online process, the data that should be labeled. The selected data are used to build new classifiers for the ensemble. Our selection methodology uses the training data that were used to generate the ensemble of classifiers over the streaming data. We compared the results of our ensemble approach with those of a conventional ensemble approach in which new classifiers are generated periodically. In experiments with ten benchmark data sets, including three real streaming data sets, our ensemble approach generated 12.9% of the new classifiers generated by the chunk-based ensemble approach using partially labeled samples, and used an average of 10% labeled samples across the ten data sets. In all the experiments, our ensemble approach produced comparable classification accuracy. We showed that our approach can efficiently maintain the performance of an ensemble over streaming data.
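The abstract does not give the selection criterion or the base learner, so the following Python sketch is only an illustration of the general idea (select data for labeling online, and build a new ensemble member from the selected data instead of on a fixed schedule). It is not the authors' algorithm. Everything named here is an assumption made for the example: the nearest-centroid base learner, the distance threshold `radius`, the `buffer_size` trigger, the closest-centroid prediction rule, and the `oracle` labeling function.

```python
import math
from collections import Counter


class CentroidClassifier:
    """Toy nearest-centroid learner; a stand-in for any base classifier."""

    def __init__(self, samples):
        # samples: list of (feature_tuple, label) pairs
        sums, counts = {}, Counter()
        for x, y in samples:
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] += 1
        self.centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def nearest(self, x):
        # Return (label, distance) for the closest class centroid.
        return min(((y, math.dist(x, c)) for y, c in self.centroids.items()),
                   key=lambda t: t[1])


class SelectiveEnsemble:
    """Hypothetical ensemble that requests labels only for unfamiliar data."""

    def __init__(self, initial_samples, radius, buffer_size):
        self.members = [CentroidClassifier(initial_samples)]
        self.radius = radius            # assumed selection threshold
        self.buffer_size = buffer_size  # labeled samples needed for a new member
        self.buffer = []
        self.labels_requested = 0

    def predict(self, x):
        # Assumed rule: trust the member whose training centroid is closest to x.
        return min((m.nearest(x) for m in self.members), key=lambda t: t[1])[0]

    def needs_label(self, x):
        # Select x only if it lies far from every member's training centroids.
        return all(m.nearest(x)[1] > self.radius for m in self.members)

    def process(self, x, oracle):
        prediction = self.predict(x)
        if self.needs_label(x):
            self.labels_requested += 1
            self.buffer.append((x, oracle(x)))
            if len(self.buffer) >= self.buffer_size:
                # Build a new member from the selected data, not on a timer.
                self.members.append(CentroidClassifier(self.buffer))
                self.buffer = []
        return prediction


# Usage with made-up data: the stream shifts to a region the ensemble has not seen.
initial = [((0.0, 0.0), "a"), ((1.0, 1.0), "b")]
ens = SelectiveEnsemble(initial, radius=1.0, buffer_size=2)
oracle = lambda x: "c" if x[0] > 5 else "a"   # hypothetical labeling function
stream = [(0.1, 0.1), (0.9, 1.0), (9.0, 9.0), (9.5, 9.0), (9.2, 9.1)]
preds = [ens.process(x, oracle) for x in stream]
print(preds, ens.labels_requested, len(ens.members))
```

In this toy run only the two points that fall far from the initial training data are sent for labeling, and a single new classifier is built from them. A chunk-based approach would instead label data and build a classifier at every interval regardless of need.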



Author information

Authors and Affiliations

Technical Research Center, Safetia Inc., Seoul, 137-895, South Korea
Joung Woo Ryu
CECS Department, Speed School of Engineering, University of Louisville, KY, 40292, USA
Mehmed M. Kantardzic
Department of Computer Science, Soongsil University, Seoul, 156-743, South Korea
Myung-Won Kim & A. Ra Khil


Editor information

Editors and Affiliations

International Institute of Information Technology Bangalore (IIIT Bangalore), 26/C, Electronics City, Hosur Road, 560100, Bangalore, India
Srinath Srinivasa
Faculty of Mathematical Sciences, Department of Computer Science, University of Delhi, Delhi, India
Vasudha Bhatnagar

About this paper

Cite this paper

Ryu, J.W., Kantardzic, M.M., Kim, MW., Ra Khil, A. (2012). An Efficient Method of Building an Ensemble of Classifiers in Streaming Data. In: Srinivasa, S., Bhatnagar, V. (eds) Big Data Analytics. BDA 2012. Lecture Notes in Computer Science, vol 7678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35542-4_11


DOI: https://doi.org/10.1007/978-3-642-35542-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35541-7
Online ISBN: 978-3-642-35542-4
eBook Packages: Computer Science (R0)
