Abstract
Detecting and adapting to concept drifts makes learning data stream classifiers a difficult task. The task becomes even more complex when the distribution of classes in the stream is imbalanced. Proper assessment of classifiers for such data is still a challenge, as existing evaluation measures either do not take class imbalance into account or are unable to indicate changes in the class ratio over time. In this paper, we advocate the use of the area under the ROC curve (AUC) in imbalanced data stream settings and propose an efficient incremental algorithm that uses a sorted tree structure with a sliding window to compute AUC using constant time and memory. Additionally, we experimentally verify that this algorithm is capable of correctly evaluating classifiers on imbalanced streams and can be used as a basis for detecting changes in class definitions and imbalance ratio.
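The idea of computing AUC over a sliding window of recent predictions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `WindowedAUC` and its methods are hypothetical, a plain sorted list stands in for the sorted tree structure mentioned above (so removal here is linear in the window size, whereas a balanced tree would make it logarithmic), and tied scores are counted as half a win, which is an assumption of this sketch.

```python
from bisect import insort
from collections import deque


class WindowedAUC:
    """Hypothetical sliding-window AUC tracker; label 1 = positive, 0 = negative."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()       # (score, label) pairs in arrival order
        self.sorted_pairs = []      # the same pairs, kept sorted by score
        self.pos = 0                # positives currently in the window
        self.neg = 0                # negatives currently in the window

    def add(self, score, label):
        # Evict the oldest pair once the window is full.
        if len(self.window) == self.window_size:
            old = self.window.popleft()
            self.sorted_pairs.remove(old)  # O(d) here; a tree would do better
            if old[1] == 1:
                self.pos -= 1
            else:
                self.neg -= 1
        pair = (score, label)
        self.window.append(pair)
        insort(self.sorted_pairs, pair)
        if label == 1:
            self.pos += 1
        else:
            self.neg += 1

    def auc(self):
        # AUC is undefined unless both classes are present in the window.
        if self.pos == 0 or self.neg == 0:
            return None
        wins = 0.0
        neg_below = 0               # negatives with a strictly lower score
        pairs = self.sorted_pairs
        i, n = 0, len(pairs)
        while i < n:
            # Process one group of equal scores at a time.
            j = i
            tie_pos = tie_neg = 0
            while j < n and pairs[j][0] == pairs[i][0]:
                if pairs[j][1] == 1:
                    tie_pos += 1
                else:
                    tie_neg += 1
                j += 1
            # Each positive beats all lower negatives and half-beats tied ones.
            wins += tie_pos * (neg_below + 0.5 * tie_neg)
            neg_below += tie_neg
            i = j
        return wins / (self.pos * self.neg)


evaluator = WindowedAUC(window_size=4)
for score, label in [(0.9, 1), (0.8, 0), (0.7, 1), (0.1, 0)]:
    evaluator.add(score, label)
print(evaluator.auc())  # 0.75: 3 of the 4 positive/negative pairs are ranked correctly
```

Because the window has a fixed size, the work and memory per incoming example are bounded independently of the stream length, which is the property the abstract refers to.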
Notes
1. Source code, test scripts, generator parameters, and links to datasets are available at: http://www.cs.put.poznan.pl/dbrzezinski/software.php.
Acknowledgments
The authors’ research was funded by the Polish National Science Center under Grant No. DEC-2013/11/B/ST6/00963.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Brzezinski, D., Stefanowski, J. (2015). Prequential AUC for Classifier Evaluation and Drift Detection in Evolving Data Streams. In: Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2014. Lecture Notes in Computer Science, vol 8983. Springer, Cham. https://doi.org/10.1007/978-3-319-17876-9_6
DOI: https://doi.org/10.1007/978-3-319-17876-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17875-2
Online ISBN: 978-3-319-17876-9
eBook Packages: Computer Science, Computer Science (R0)