Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Reinforcement Learning, Unsupervised Methods, and Concept Drift in Stream Learning

  • András A. Benczúr
  • Levente Kocsis
  • Róbert Pálovics
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_327-1

Abstract

In this chapter, we give a brief overview of a few special topics in online machine learning, all of which are covered extensively in recent surveys. In Section “Reinforcement Learning,” we survey reinforcement learning. In Section “Unsupervised Data Mining,” we describe unsupervised data mining methods, including clustering, frequent itemset mining, dimensionality reduction, and topic modeling. In Section “Concept Drift and Adaptive Learning,” we describe the notion of dataset drift, also known as concept drift, and list the most important drift adaptation methods. We discuss only representative results in these areas. This chapter extends the other chapters in this handbook: “Overview of Online Machine Learning in Big Data Streams,” “Online Machine Learning Algorithms Over Data Streams,” and “Recommender Systems Over Data Streams.”
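To make the idea of drift detection concrete, here is a minimal Python sketch of one classic detector, the Drift Detection Method (DDM) of Gama et al. (2004). It is an illustration under stated assumptions, not the chapter's own code. The detector monitors the online error rate p_i of a classifier together with its standard deviation s_i = sqrt(p_i(1 - p_i)/i). It remembers the point where p_i + s_i was smallest (p_min, s_min), signals a warning when p_i + s_i >= p_min + 2·s_min, and signals a drift when p_i + s_i >= p_min + 3·s_min. The class name `DDM`, the 30-instance warm-up, and the string return values are choices made for this sketch only.

```python
import math


class DDM:
    """Minimal sketch of the Drift Detection Method (Gama et al. 2004).

    Feed it one 0/1 error indicator per labeled instance; it returns
    "stable", "warning", or "drift". Names and defaults are assumptions.
    """

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances  # warm-up before any signal
        self.warning_level = warning_level  # multiplier of s_min for a warning
        self.drift_level = drift_level      # multiplier of s_min for a drift
        self.reset()

    def reset(self):
        """Forget all statistics, e.g., after a drift has been signaled."""
        self.n = 0
        self.p = 0.0          # running error rate p_i
        self.s = 0.0          # standard deviation s_i of the estimate
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        self.n += 1
        # Incremental mean of the 0/1 error indicators.
        self.p += (error - self.p) / self.n
        # Binomial standard deviation of the error-rate estimate.
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        # Remember the point where p_i + s_i was lowest so far.
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s >= self.p_min + self.drift_level * self.s_min:
            self.reset()  # start learning statistics for the new concept
            return "drift"
        if self.p + self.s >= self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


# Synthetic error stream: a 10% error rate, then a model that fails every time.
detector = DDM()
stream = [1 if i % 10 == 0 else 0 for i in range(1, 101)] + [1] * 30
signals = [detector.update(e) for e in stream]
print(signals.index("warning"), signals.index("drift"))  # 103 108
```

In a stream learner, one would call `update(int(prediction != label))` for each labeled instance and retrain or replace the model when `"drift"` is returned. This plain version signals drift as soon as the error rate leaves exactly 0 or 1 after the warm-up (then s_min = 0), so a practical version would need an extra guard for that case.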


Notes

Acknowledgements

This work was supported by the EU H2020 grant Streamline (No. 688191) and the “Big Data—Momentum” grant of the Hungarian Academy of Sciences.


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • András A. Benczúr (1)
  • Levente Kocsis (1)
  • Róbert Pálovics (2)
  1. Institute for Computer Science and Control, Hungarian Academy of Sciences (MTA SZTAKI), Budapest, Hungary
  2. Department of Computer Science, Stanford University, Stanford, USA

Section editors and affiliations

  • Alessandro Margara (1)
  • Tilmann Rabl (2)
  1. Politecnico di Milano
  2. Database Systems and Information Management Group, Technische Universität Berlin, Berlin, Germany