
Infinite Lattice Learner: an ensemble for incremental learning

  • Justin Lovinger
  • Iren Valova
Methodologies and Application

Abstract

The state of the art in supervised learning offers effective models for learning, generalization, face and image recognition, time series prediction, and more. However, most of these powerful models cannot effectively learn incrementally. Infinite Lattice Learner (ILL) is an ensemble model that extends state-of-the-art machine learning methods into incremental learning models. With ILL, even batch models can learn incrementally with exceptional data retention. Instead of continually revisiting past instances to retain learned information, ILL allows existing methods to converge on new information without overwriting previous knowledge. Models wrapped in ILL can therefore efficiently chase a drifting function without continually revisiting a changing dataset, operate in continuous real-time environments where millions of unique samples arrive every day, and learn, in portions, big datasets too large to fit in memory or even on a single machine. ILL tiles an underlying model over an infinite Cartesian grid of points; efficient algorithms for discovering nearby points, together with lazy evaluation, make this seemingly impossible construction practical. Extensive empirical evaluation reveals impressive retention ability for all ILL models. ILL likewise demonstrates its generalization ability on a variety of datasets, from classification and regression to image recognition.
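The lattice-plus-lazy-evaluation idea in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical illustration, not the paper's algorithm: the class name `LazyLattice`, the running-mean "local models," and the inverse-distance weighting are all assumptions made for demonstration, whereas ILL itself tiles a full underlying learner over the grid.

```python
import numpy as np

class LazyLattice:
    """Sketch of a lazily evaluated infinite Cartesian lattice.

    Each lattice point holds a tiny local model (here just a running
    mean of targets), created only when a training sample lands in an
    adjacent cell. The infinite grid is never materialized, so updates
    are local and distant regions retain what they learned.
    """

    def __init__(self, spacing=1.0):
        self.spacing = spacing
        self.points = {}  # lattice coordinates -> (target_sum, count)

    def _cell_corners(self, x):
        # Yield the 2^d corners of the lattice cell containing x.
        base = np.floor(x / self.spacing)
        d = len(x)
        for mask in range(2 ** d):
            offset = np.array([(mask >> i) & 1 for i in range(d)])
            yield tuple(int(v) for v in base + offset)

    def learn(self, x, y):
        # Incremental update: touch only the nearby lattice points.
        x = np.asarray(x, dtype=float)
        for p in self._cell_corners(x):
            s, c = self.points.get(p, (0.0, 0))
            self.points[p] = (s + y, c + 1)

    def predict(self, x):
        # Inverse-distance-weighted blend of instantiated neighbors.
        x = np.asarray(x, dtype=float)
        num = den = 0.0
        for p in self._cell_corners(x):
            if p in self.points:
                s, c = self.points[p]
                w = 1.0 / (1e-9 + np.linalg.norm(x - np.array(p) * self.spacing))
                num += w * (s / c)
                den += w
        return num / den if den > 0.0 else 0.0

if __name__ == "__main__":
    lattice = LazyLattice(spacing=0.5)
    lattice.learn([0.2, 0.3], 1.0)   # touches the 4 corners of one cell
    lattice.learn([3.0, 3.0], -1.0)  # a distant cell; no interference
    print(lattice.predict([0.25, 0.25]))  # near the first sample: ~1.0
```

Locality is the point of this design: each update modifies only the 2^d corners of one cell, so learning a new region cannot overwrite knowledge stored in lattice points far away, which is the retention property the abstract claims.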

Keywords

Supervised learning · Incremental learning · Ensemble learning · Neural networks

Notes

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Computer and Information Science Department, University of Massachusetts Dartmouth, Dartmouth, USA
