Reinforcement Learning, Unsupervised Methods, and Concept Drift in Stream Learning

Benczúr, András A.; Kocsis, Levente; Pálovics, Róbert

doi:10.1007/978-3-319-63962-8_327-1

András A. Benczúr³,
Levente Kocsis³ &
Róbert Pálovics⁴

251 Accesses

Abstract

In this chapter, we give a brief overview of a few special topics in online machine learning, all of which are extensively covered in recent surveys. In Section “Reinforcement Learning,” we survey reinforcement learning. In Section “Unsupervised Data Mining,” we describe unsupervised data mining methods, including clustering, frequent itemset mining, dimensionality reduction, and topic modeling. In Section “Concept Drift and Adaptive Learning,” we describe the notion of the dataset drift or, in other terms, concept drift and list the most important drift adapting methods. We only discuss representative results in these areas. This chapter is an extension of the other chapters in this handbook, “Overview of Online Machine Learning in Big Data Streams,” “Online Machine Learning Algorithms Over Data Streams,” and “Recommender Systems Over Data Streams.”

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Institutional subscriptions

References

Ackermann MR, Märtens M, Raupach C, Swierkot K, Lammersen C, Sohler C (2012) Streamkm++: a clustering algorithm for data streams. J Exp Algorithmics (JEA) 17:2–4
Article MATH Google Scholar
Aggarwal CC (2013) A survey of stream clustering algorithms. In: Aggarwal CC, Reddy CK (eds) Data clustering: algorithms and applications. Chapman and Hall/CRC, Boca Raton, p 231
Google Scholar
Aggarwal CC, Han J (2014) Frequent pattern mining. Springer, Cham
Google Scholar
Aggarwal CC, Han J, Wang J, Yu PS (2003) A framework for clustering evolving data streams. In: Proceedings of the 29th international conference on very large data bases, vol 29. VLDB Endowment, pp 81–92
Chapter Google Scholar
Agrawal R, Imielienski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Bunemann P, Jajodia S (eds) Proceedings of the 1993 ACM SIGMOD conference on management of data. ACM Press, New York, pp 207–216
Google Scholar
Alberg D, Last M, Kandel A (2012) Knowledge discovery in data streams with regression tree methods. Wiley Interdiscip Rev Data Min Knowl Disc 2(1): 69–78
Google Scholar
Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002) The nonstochastic multiarmed bandit problem. SIAM J Comput 32(1):48–77
Article MathSciNet MATH Google Scholar
Bach S, Maloof M (2010) A Bayesian approach to concept drift. In: Advances in neural information processing systems. Curran Associates, Inc., New York, pp 127–135
Google Scholar
Bifet A (2010) Adaptive stream mining: Pattern learning and mining from evolving data streams. In: Proceedings of the 2010 conference on adaptive stream mining: pattern learning and mining from evolving data streams. IOS Press, pp 1–212
Google Scholar
Bifet A, Gavaldà R (2009) Adaptive learning from evolving data streams. In: International symposium on intelligent data analysis. Springer, pp 249–260
Chapter Google Scholar
Bifet A, Read J, Pfahringer B, Holmes G, Žliobaitė I (2013) CD-MOA: change detection framework for massive online analysis. In: International symposium on intelligent data analysis. Springer, pp 92–103
Chapter MATH Google Scholar
Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022
Google Scholar
Bradley PS, Fayyad UM, Reina C et al (1998) Scaling clustering algorithms to large databases. In: Proceedings of the 4th international conference on knowledge discovery and data mining, pp 9–15
Google Scholar
Brand M (2002) Incremental singular value decomposition of uncertain data with missing values. In: Computer vision–ECCV 2002, pp 707–720
Chapter Google Scholar
Bunch JR, Nielsen CP (1978) Updating the singular value decomposition. Numer Math 31(2):111–129
Article MathSciNet MATH Google Scholar
Calders T, Dexters N, Gillis JJ, Goethals B (2014) Mining frequent itemsets in a stream. Inf Syst 39:233–255
Article Google Scholar
Canini K, Shi L, Griffiths T (2009) Online inference of topics with latent dirichlet allocation. In: Proceedings of the twelth international conference on artificial intelligence and statistics, in PMLR, Clearwater Beach, vol 5, pp 65–72
Google Scholar
Cao F, Estert M, Qian W, Zhou A (2006) Density-based clustering over an evolving data stream with noise. In: Proceedings of the 2006 SIAM international conference on data mining. SIAM, pp 328–339
Chapter Google Scholar
Chang JH, Lee WS (2003) Estwin: adaptively monitoring the recent change of frequent itemsets over online data streams. In: Proceedings of the 12th international conference on information and knowledge management. ACM, pp 536–539
Google Scholar
Chang JH, Lee WS (2003) Finding recent frequent itemsets adaptively over online data streams. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 487–492
Google Scholar
Chang JH, Lee WS (2006) Finding frequent itemsets over online data streams. Inf Softw Technol 48(7): 606–618
Article Google Scholar
Charikar M, Chen K, Farach-Colton M (2004) Finding frequent items in data streams. Theor Comput Sci 312(1):3–15
Article MathSciNet MATH Google Scholar
Chen Y, Tu L (2007) Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 133–142
Google Scholar
Cheng J, Ke Y, Ng W (2008a) Maintaining frequent closed itemsets over a sliding window. J Inf Syst 31(3): 191–215
Article Google Scholar
Cheng J, Ke Y, Ng W (2008b) A survey on algorithms for mining frequent itemsets over data streams. Knowl Inf Syst 16(1):1–27
Article Google Scholar
Chi Y, Wang H, Philip SY, Muntz RR (2006) Catch the moment: maintaining closed frequent itemsets over a data stream sliding window. Knowl Inf Syst 10(3): 265–294
Article Google Scholar
Deerwester SC, Dumais ST, Landauer TK, Furnas GW, Harshman RA (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391–407. www.citeseer.nj.nec.com/deerwester90indexing.html
Article Google Scholar
Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the 6th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 71–80
Google Scholar
Elwell R, Polikar R (2009) Incremental learning in nonstationary environments with controlled forgetting. In: International joint conference on neural networks, IJCNN2009. IEEE, pp 771–778
Google Scholar
Ester M, Kriegel HP, Sander J, Xu X et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of ACM SIGKDD, pp 226–231
Google Scholar
Farnstrom F, Lewis J, Elkan C (2000) Scalability for clustering algorithms revisited. ACM SIGKDD Explorations Newsletter 2(1):51–57
Article Google Scholar
Gama J, Medas P, Castillo G, Rodrigues P (2004) Learning with drift detection. In: Brazilian symposium on artificial intelligence. Springer, pp 286–295
Chapter Google Scholar
Gama J, Rocha R, Medas P (2003) Accurate decision trees for mining high-speed data streams. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 523–528
Google Scholar
Gama J, Rodrigues PP (2007) Stream-based electricity load forecast. In: European conference on principles of data mining and knowledge discovery. Springer, pp 446–453
Google Scholar
Gama J, Rodrigues PP, Lopes L (2011) Clustering distributed sensor data streams using local processing and reduced communication. Intell Data Anal 15(1):3–28
Google Scholar
Gama J, Žliobaite I, Bifet A, Pechenizkiy M, Bouchachia A (2014) A survey on concept drift adaptation. ACM Comput Surv (CSUR) 46(4):44
Article MATH Google Scholar
Giannella C, Han J, Pei J, Yan X, Yu PS (2003) Mining frequent patterns in data streams at multiple time granularities. Next Gener Data Mining 212:191–212
Google Scholar
Guha S, Meyerson A, Mishra N, Motwani R, O’Callaghan L (2003) Clustering data streams: theory and practice. IEEE Trans Knowl Data Eng 15(3):515–528
Article Google Scholar
Günter S, Schraudolph NN, Vishwanathan S (2007) Fast iterative kernel principal component analysis. J Mach Learn Res 8:1893–1918
Google Scholar
Hall P, Marshall D, Martin R (2000) Merging and splitting eigenspace models. IEEE Trans Pattern Anal Mach Intell 22(9):1042–1049
Article Google Scholar
Hartigan JA, Hartigan J (1975) Clustering algorithms, vol 209. Wiley, New York
Google Scholar
Ho Q, Cipar J, Cui H, Lee S, Kim JK, Gibbons PB, Gibson GA, Ganger G, Xing EP (2013) More effective distributed ML via a stale synchronous parallel parameter server. In: Advances in neural information processing systems. Neural Information Processing Systems Foundation, Inc., Lake Tahoe, pp 1223–1231
Google Scholar
Hoffman M, Bach FR, Blei DM (2010) Online learning for latent dirichlet allocation. In: Lafferty JD, Williams CKI, Shawe-Taylor J, Zemel RS, Culotta A (eds) Advances in neural information processing systems. Curran Associates, Inc., New York, pp 856–864
Google Scholar
Honeine P (2012) Online kernel principal component analysis: a reduced-order model. IEEE Trans Pattern Anal Mach Intell 34(9):1814–1826
Article Google Scholar
Ipek E, Mutlu O, Martínez JF, Caruana R (2008) Self-optimizing memory controllers: A reinforcement learning approach. In: Proceedings of 35th international symposium on computer architecture, ISCA’08. IEEE, pp 39–50
Google Scholar
Jagerman R, Eickhoff C, de Rijke M (2017) Computing web-scale topic models using an asynchronous parameter server. In: Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval. ACM
Google Scholar
Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv (CSUR) 31(3):264–323
Article Google Scholar
Jolliffe IT (1986) Principal component analysis and factor analysis. In: Jolliffe IT (ed) Principal component analysis. Springer, New York, pp 115–128
Chapter Google Scholar
Kavitha V, Punithavalli M (2010) Clustering time series data stream-a literature survey. arXiv preprint arXiv:1005.4270
Google Scholar
Kim KI, Franz MO, Scholkopf B (2005) Iterative kernel principal component analysis for image modeling. IEEE Trans Pattern Anal Mach Intell 27(9):1351–1366
Google Scholar
Klinkenberg R (2004) Learning drifting concepts: example selection vs. example weighting. Intell Data Anal 8(3):281–300
Google Scholar
Klinkenberg R, Joachims T (2000) Detecting concept drift with support vector machines. In: ICML, pp 487–494
Google Scholar
Kolter JZ, Maloof MA (2003) Dynamic weighted majority: a new ensemble method for tracking concept drift. In: Proceedings of the 3rd IEEE international conference on data mining, ICDM2003. IEEE, pp 123–130
Google Scholar
Koychev I (2000) Gradual forgetting for adaptation to concept drift. In: Proceedings of the ECAI 2000 workshop on current issues in spatio-temporal reasoning
Google Scholar
Kranen P, Assent I, Baldauf C, Seidl T (2011) The clustree: indexing micro-clusters for anytime stream mining. Knowl Inf Syst 29(2):249–272
Article Google Scholar
Kuncheva LI, Žliobaitė I (2009) On the window size for classification in changing environments. Intell Data Anal 13(6):861–872
Google Scholar
Lee D, Lee W (2005) Finding maximal frequent itemsets over online data streams adaptively. In: Proceedings of the 5th IEEE international conference on data mining. IEEE, pp 8–pp
Google Scholar
Leite D, Costa P, Gomide F (2013) Evolving granular neural networks from fuzzy data streams. Neural Netw 38:1–16
Article MATH Google Scholar
Li HF, Ho CC, Lee SY (2009) Incremental updates of closed frequent itemsets over continuous data streams. Expert Syst Appl 36(2):2451–2458
Article Google Scholar
Li HF, Lee SY, Shan MK (2004) An efficient algorithm for mining frequent itemsets over the entire history of data streams. In: Proceedings of the 1st international workshop on knowledge discovery in data streams, vol 39
Google Scholar
Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th international conference on world wide web. ACM, pp 661–670
Google Scholar
Li M, Andersen DG, Park JW, Smola AJ, Ahmed A, Josifovski V, Long J, Shekita EJ, Su BY (2014) Scaling distributed machine learning with the parameter server. In: Proceedings of 11th USENIX symposium on operating systems design and implementation (OSDI14). USENIX Association, pp 583–598
Google Scholar
Littlestone N (1988) Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm. Mach Learn 2(4):285–318
Google Scholar
Mahdiraji AR (2009) Clustering data stream: a survey of algorithms. Int J Knowl Based Intell Eng Syst 13(2):39–44
Article Google Scholar
Maloof MA, Michalski RS (2004) Incremental learning with partial instance memory. Artif Intell 154(1–2): 95–126
Article MathSciNet MATH Google Scholar
Minku LL, White AP, Yao X (2010) The impact of diversity on online ensemble learning in the presence of concept drift. IEEE Trans Knowl Data Eng 22(5): 730–742
Article Google Scholar
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
Article Google Scholar
Moreno-Torres JG, Raeder T, Alaiz-RodríGuez R, Chawla NV, Herrera F (2012) A unifying view on dataset shift in classification. Pattern Recog 45(1):521–530
Article Google Scholar
O’callaghan L, Mishra N, Meyerson A, Guha S, Motwani R (2002) Streaming-data algorithms for high-quality clustering. In: Proceedings of 18th international conference on data engineering. IEEE, pp 685–694
Google Scholar
Oja E (1982) Simplified neuron model as a principal component analyzer. J Math Biol 15(3):267–273
Article MathSciNet MATH Google Scholar
Oja E (1992) Principal components, minor components, and linear neural networks. Neural Netw 5(6):927–935
Article Google Scholar
Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
Article Google Scholar
Pang-Ning T, Steinbach M, Kumar V et al (2006) Introduction to data mining. Pearson Addison Wesley, Boston/Toronto
Google Scholar
Quadrana M, Bifet A, Gavalda R (2015) An efficient closed frequent itemset miner for the MOA stream mining system. AI Commun 28(1):143–158
Google Scholar
Quionero-Candela J, Sugiyama M, Schwaighofer A, Lawrence ND (2009) Dataset shift in machine learning. The MIT Press: Cambridge
Google Scholar
Rodrigues PP, Gama J, Pedroso JP (2006) ODAC: hierarchical clustering of time series data streams. In: Proceedings of the 2006 SIAM international conference on data mining. SIAM, pp 499–503
Chapter Google Scholar
Sanger TD (1989) Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Netw 2(6):459–473
Article Google Scholar
Schlimmer JC, Granger RH (1986) Incremental learning from noisy data. Mach Learn 1(3):317–354
Google Scholar
Schölkopf B, Smola A, Müller KR (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10(5):1299–1319
Article Google Scholar
Silva JA, Faria ER, Barros RC, Hruschka ER, de Carvalho AC, Gama J (2013) Data stream clustering: A survey. ACM Comput Surv (CSUR) 46(1):13
Article MATH Google Scholar
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484–489
Article Google Scholar
Smola A, Narayanamurthy S (2010) An architecture for parallel topic models. Proc VLDB Endow 3(1–2): 703–710
Article Google Scholar
Song G, Yang D, Cui B, Zheng B, Liu Y, Xie K (2007) Claim: an efficient method for relaxed frequent closed itemsets mining over stream data. In: International conference on database systems for advanced applications. Springer, pp 664–675
Google Scholar
Song X, Lin CY, Tseng BL, Sun MT (2005) Modeling and predicting personal information dissemination behavior. In: Proceedings of the 11th ACM SIGKDD international conference on knowledge discovery in data mining. ACM, pp 479–488
Google Scholar
Storkey A (2009) When training and test sets are different: characterizing learning transfer. In: Sugiyama C, Lawrence S (eds) Dataset shift in machine learning. MIT Press, Cambridge, pp 3–28
Google Scholar
Sutton RS (1996) Generalization in reinforcement learning: successful examples using sparse coarse coding. In: Touretzky DS, Mozer MC, Hasselmo ME (eds) Advances in neural information processing systems, vol 8. MIT Press, Cambridge, pp 1038–1044
Google Scholar
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction, vol 16. MIT Press, Cambridge, pp 285–286
Google Scholar
Syed NA, Liu H, Sung KK (1999) Handling concept drifts in incremental learning with support vector machines. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 317–321
Google Scholar
Teflioudi C, Gemulla R, Mykytiuk O (2015) Lemp: fast retrieval of large entries in a matrix product. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data. ACM, pp 107–122
Google Scholar
Tesauro G (1995) Td-gammon: a self-teaching backgammon program. In: Applications of neural networks. Springer, Boston, pp 267–285
Chapter Google Scholar
Tsymbal A (2004) The problem of concept drift: definitions and related work. Technical Report 2, Computer Science Department, Trinity College Dublin
Google Scholar
Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 226–235
Google Scholar
Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8(3–4):279–292
MATH Google Scholar
Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101
Google Scholar
Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8(3–4):229–256
MATH Google Scholar
Xu R, Wunsch D (2008) Clustering, vol 10. Wiley, Hoboken
Book Google Scholar
Yen SJ, Wu CW, Lee YS, Tseng VS, Hsieh CH (2011) A fast algorithm for mining frequent closed itemsets over stream sliding window. In: 2011 IEEE international conference on fuzzy systems (FUZZ). IEEE, pp 996–1002
Google Scholar
Yu HF, Hsieh CJ, Yun H, Vishwanathan S, Dhillon IS (2015) A scalable asynchronous distributed algorithm for topic modeling. In: Proceedings of the 24th international conference on world wide web, pp 1340–1350. International World Wide Web Conferences Steering Committee
Google Scholar
Yu JX, Chong Z, Lu H, Zhou A (2004) False positive or false negative: mining frequent itemsets from high speed transactional data streams. In: Proceedings of the 13th international conference on very large data bases, vol 30. VLDB Endowment, pp 204–215
Google Scholar
Yuan J, Gao F, Ho Q, Dai W, Wei J, Zheng X, Xing EP, Liu TY, Ma WY (2015) Lightlda: big topic models on modest computer clusters. In: Proceedings of the 24th international conference on world wide web. International World Wide Web Conferences Steering Committee, pp 1351–1361
Google Scholar
Zhang T, Ramakrishnan R, Livny M (1996) Birch: an efficient data clustering method for very large databases. ACM SIGMOD Rec 25(2):103–114
Article Google Scholar
Zhou A, Cao F, Qian W, Jin C (2008) Tracking clusters in evolving data streams over sliding windows. Knowl Inf Syst 15(2):181–214
Article Google Scholar
Žliobaitė I (2009) Learning under concept drift: an overview. Technical report, Vilnius University
Google Scholar
Žliobaite I, Bifet A, Gaber M, Gabrys B, Gama J, Minku L, Musial K (2012) Next challenges for adaptive learning systems. ACM SIGKDD Explor Newsl 14(1): 48–55
Article Google Scholar

Download references

Acknowledgements

Support from the EU H2020 grant Streamline No 688191 and the “Big Data—Momentum” grant of the Hungarian Academy of Sciences.

Author information

Authors and Affiliations

Institute for Computer Science and Control, Hungarian Academy of Sciences (MTA SZTAKI), Budapest, Hungary
András A. Benczúr & Levente Kocsis
Department of Computer Science, Stanford University, Stanford, CA, USA
Róbert Pálovics

Authors

András A. Benczúr
View author publications
You can also search for this author in PubMed Google Scholar
Levente Kocsis
View author publications
You can also search for this author in PubMed Google Scholar
Róbert Pálovics
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to András A. Benczúr .

Editor information

Editors and Affiliations

Institute of Computer Science, University of Tartu, Tartu, Estonia
Sherif Sakr
Sch of Info Techno, Building J12, University of Sydney Sch of Info Techno, Building J12, Sydney, Australia
Albert Zomaya

Section Editor information

Politecnico di Milano http://home.deib.polimi.it/margara/
Alessandro Margara
Database Systems and Information Management Group, Technische Universität Berlin, Einsteinufer 17, 10587, Berlin, Germany
Tilmann Rabl

Rights and permissions

Reprints and permissions

Copyright information

About this entry

Cite this entry

Benczúr, A.A., Kocsis, L., Pálovics, R. (2018). Reinforcement Learning, Unsupervised Methods, and Concept Drift in Stream Learning. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_327-1

Download citation

DOI: https://doi.org/10.1007/978-3-319-63962-8_327-1
Published: 01 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference MathematicsReference Module Computer Science and Engineering

Publish with us

Policies and ethics