Adaptive Methods for Classification in Arbitrarily Imbalanced and Drifting Data Streams

Lichtenwalter, Ryan N.; Chawla, Nitesh V.

doi:10.1007/978-3-642-14640-4_5


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5669)

Abstract

Streaming data is pervasive in a multitude of data mining applications. One fundamental problem in the task of mining streaming data is distributional drift over time. Streams may also exhibit high and varying degrees of class imbalance, which can further complicate the task. In scenarios like these, class imbalance is particularly difficult to overcome and has not been as thoroughly studied. In this paper, we comprehensively consider the issues of changing distributions in conjunction with high degrees of class imbalance in streaming data. We propose new approaches based on distributional divergence and meta-classification that improve several performance metrics often applied in the study of imbalanced classification. We also propose a new distance measure for detecting distributional drift and examine its utility in weighting ensemble base classifiers. We employ a sequential validation framework, which we believe is the most meaningful option in the context of streaming imbalanced data.
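As a rough illustration of the idea of weighting ensemble base classifiers by distributional divergence, the sketch below scores how closely each past training batch resembles the current batch and turns those scores into normalized weights. The abstract does not name the distance measure, so the Hellinger distance over histogram bins is swapped in here purely as an assumption; the function names, the single-feature setup, and the bin edges are all hypothetical and are not taken from the paper.

```python
import math


def binned_distribution(values, edges):
    """Relative frequency of values in the bins defined by consecutive edges."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = 0  # values below the first edge fall into the first bin
        for i in range(n_bins):
            if v >= edges[i]:
                idx = i  # values above the last edge stay in the last bin
        counts[idx] += 1
    total = sum(counts)
    if total == 0:
        # Empty batch: fall back to a uniform distribution (an assumption).
        return [1.0 / n_bins] * n_bins
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)


def ensemble_weights(train_batches, current_batch, edges):
    """Weight each batch's classifier by its similarity to the current batch.

    Similarity is taken as 1 - Hellinger distance (an illustrative choice),
    then normalized so the weights sum to 1.
    """
    current = binned_distribution(current_batch, edges)
    sims = [
        1.0 - hellinger(binned_distribution(batch, edges), current)
        for batch in train_batches
    ]
    total = sum(sims)
    if total == 0:
        # Every past batch is maximally distant: weight them equally.
        return [1.0 / len(sims)] * len(sims)
    return [s / total for s in sims]


if __name__ == "__main__":
    edges = [0.0, 0.5, 1.0]
    past = [[0.1, 0.2, 0.3], [0.8, 0.9, 0.7]]
    now = [0.15, 0.25, 0.1]
    print(ensemble_weights(past, now, edges))
```

In this toy setup, a classifier trained on a batch whose feature distribution matches the current batch receives most of the weight, while one trained on a very different batch is discounted. A real stream would involve many features, and the per-feature distances would need to be combined in some way that this sketch leaves open.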



Author information

Authors and Affiliations

The University of Notre Dame, Notre Dame, IN, 46556, USA
Ryan N. Lichtenwalter & Nitesh V. Chawla


Editor information

Editors and Affiliations

Thammasat University, Sirindhorn International Institute of Technology, 131 Moo 5 Tiwanont Road, Bangkadi, 12000, Muang, Pathumthani, Thailand
Thanaruk Theeramunkong
Department of Architecture for Intelligence, The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, 567-0047, Osaka, Japan
Cholwich Nattee
Center for Informatics, Federal University of Pernambuco, Brazil
Paulo J. L. Adeodato
Computer Science and Engineering Department, University of Notre Dame, 353 Fitzpatrick Hall, 46556, Notre Dame, IN, USA
Nitesh Chawla
Department of Computer Science, The Australian National University, Australia
Peter Christen
TELECOM Bretagne, Lab-STICC, Institut TELECOM, Brest, France
Philippe Lenca
School of Information Technologies, University of Sydney, P.O. Box, Australia
Josiah Poon
Australian Taxation Office, Australia
Graham Williams


Cite this paper

Lichtenwalter, R.N., Chawla, N.V. (2010). Adaptive Methods for Classification in Arbitrarily Imbalanced and Drifting Data Streams. In: Theeramunkong, T., et al. New Frontiers in Applied Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14640-4_5


DOI: https://doi.org/10.1007/978-3-642-14640-4_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14639-8
Online ISBN: 978-3-642-14640-4
eBook Packages: Computer Science; Computer Science (R0)
