Machine Learning Overview

  • Jiawei Zhang
  • Philip S. Yu


Learning denotes the process of acquiring new declarative knowledge, the organization of new knowledge into general yet effective representations, and the discovery of new facts and theories through observation and experimentation. Learning is one of the most important skills that mankind can master, which also renders us different from the other animals on this planet. To provide an example, according to our past experiences, we know the sun rises from the east and falls to the west; the moon rotates around the earth; 1 year has 365 days, which are all knowledge we derive from our past life experiences.


  1. 1.
    Y. Bengio, P. Simard, P. Frasconi, Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)CrossRefGoogle Scholar
  2. 2.
    T. Bengtsson, P. Bickel, B. Li, Curse-of-dimensionality revisited: collapse of the particle filter in very large scale systems, in Probability and Statistics: Essays in Honor of David A. Freedman, vol. 2 (2008), pp. 316–334zbMATHGoogle Scholar
  3. 3.
    S. Berchtold, C. Bohm, H. Kriegel, The pyramid-technique: towards breaking the curse of dimensionality, in Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD ’02), vol. 27, pp. 142–153 (1998)CrossRefGoogle Scholar
  4. 4.
    J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms (Kluwer Academic Publishers, Norwell, 1981)zbMATHCrossRefGoogle Scholar
  5. 5.
    L. Breiman, J. Friedman, R. Olshen, C. Stone, Classification and Regression Trees (Wadsworth and Brooks, Monterey, 1984)Google Scholar
  6. 6.
    C. Brodley, P. Utgoff, Multivariate decision trees. Mach. Learn. 19(1), 45–77 (1995)zbMATHGoogle Scholar
  7. 7.
    O. Chapelle, B. Schlkopf, A. Zien, Semi-supervised Learning, 1st edn. (MIT Press, Cambridge, 2010)Google Scholar
  8. 8.
    J. Chung, Ç. Gülçehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR, abs/1412.3555 (2014)Google Scholar
  9. 9.
    C. Cortes, V. Vapnik, Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)zbMATHGoogle Scholar
  10. 10.
    J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)MathSciNetzbMATHGoogle Scholar
  11. 11.
    B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least angle regression. Ann. Stat. 32, 407–499 (2004)MathSciNetzbMATHCrossRefGoogle Scholar
  12. 12.
    M. Ester, H. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatial databases with noise, in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (AAAI Press, Menlo Park, 1996)Google Scholar
  13. 13.
    S. Fahlman, C. Lebiere, The cascade-correlation learning architecture, in Advances in Neural Information Processing Systems 2 (Morgan-Kaufmann, Burlington, 1990)Google Scholar
  14. 14.
    I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, Cambridge, 2016). zbMATHGoogle Scholar
  15. 15.
    I. Guyon, A. Elisseeff, An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–11182 (2003)zbMATHGoogle Scholar
  16. 16.
    J. Hartigan, M. Wong, A k-means clustering algorithm. JSTOR Appl. Stat. 28(1), 100–108 (1979)zbMATHCrossRefGoogle Scholar
  17. 17.
    D. Hawkins, The problem of overfitting. J. Chem. Inf. Comput. Sci. 44(1), 1–12 (2004)MathSciNetCrossRefGoogle Scholar
  18. 18.
    S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd edn. (Prentice Hall PTR, Upper Saddle River, 1998)Google Scholar
  19. 19.
    S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRefGoogle Scholar
  20. 20.
    A. Hoerl, R. Kennard, Ridge regression: biased estimation for nonorthogonal problems. Technometrics 42(1), 80–86 (2000)zbMATHCrossRefGoogle Scholar
  21. 21.
    Z. Huang, Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min. Knowl. Discov. 2(3), 283–304 (1998)CrossRefGoogle Scholar
  22. 22.
    T. Joachims, Text categorization with support vector machines: learning with many relevant features, in European Conference on Machine Learning (Springer, Berlin, 1998)Google Scholar
  23. 23.
    L. Kaufmann, P. Rousseeuw, Clustering by Means of Medoids (North Holland/Elsevier, Amsterdam, 1987)Google Scholar
  24. 24.
    Y. Kim, Convolutional neural networks for sentence classification, in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Association for Computational Linguistics, Doha, 2014)Google Scholar
  25. 25.
    R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection, in International Joint Conference on Artificial Intelligence (IJCA) (Morgan Kaufmann Publishers Inc., San Francisco, 1995)Google Scholar
  26. 26.
    A. Krizhevsky, I. Sutskever, G. Hinton, Imagenet classification with deep convolutional neural networks, in Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS’12) (Curran Associates Inc., Red Hook, 2012)Google Scholar
  27. 27.
    Y. Lecun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, in Proceedings of the IEEE (IEEE, Piscataway, 1998)Google Scholar
  28. 28.
    J. Liu, S. Ji, J. Ye, SLEP: sparse learning with efficient projections. Technical report (2010)Google Scholar
  29. 29.
    W. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5(4), 115–133 (1943)MathSciNetzbMATHCrossRefGoogle Scholar
  30. 30.
    M. Minsky, S. Papert, Perceptrons: Expanded Edition (MIT Press, Cambridge, 1988)zbMATHGoogle Scholar
  31. 31.
    S. Pan, Q. Yang, A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)CrossRefGoogle Scholar
  32. 32.
    N. Parikh, S. Boyd, Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2014)Google Scholar
  33. 33.
    R. Pascanu, C. Gulcehre, K. Cho, Y. Bengio, How to construct deep recurrent neural networks. CoRR, abs/1312.6026 (2013)Google Scholar
  34. 34.
    R. Pascanu, T. Mikolov, Y. Bengio, On the difficulty of training recurrent neural networks, in Proceedings of the 30th International Conference on International Conference on Machine Learning (ICML’13) (2013)Google Scholar
  35. 35.
    D. Pelleg, A. Moore, X-means: extending k-means with efficient estimation of the number of clusters, in Proceedings of the 17th International Conference on Machine Learning, Stanford (2000)Google Scholar
  36. 36.
    J. Platt, Sequential minimal optimization: a fast algorithm for training support vector machines. Technical report. Adv. Kernel Methods Support Vector Learning 208 (1998)Google Scholar
  37. 37.
    J. Quinlan, Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)Google Scholar
  38. 38.
    J. Quinlan, C4.5: Programs for Machine Learning (Morgan Kaufmann Publishers Inc., San Francisco, 1993)Google Scholar
  39. 39.
    L. Raileanu, K. Stoffel. Theoretical comparison between the Gini index and information gain criteria. Ann. Math. Artif. Intell. 41(1), 77–93 (2004)MathSciNetzbMATHCrossRefGoogle Scholar
  40. 40.
    C. Rasmussen, The infinite Gaussian mixture model, in Advances in Neural Information Processing Systems 12 (MIT Press, Cambridge, 2000)Google Scholar
  41. 41.
    J. Rawlings, S. Pantula, D. Dickey, Applied Regression Analysis, 2nd edn. (Springer, Berlin, 1998)zbMATHCrossRefGoogle Scholar
  42. 42.
    F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386 (1958)CrossRefGoogle Scholar
  43. 43.
    D. Rumelhart, G. Hinton, R. Williams, Learning internal representations by error propagation, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition (MIT Press, Cambridge, 1986)Google Scholar
  44. 44.
    D. Rumelhart, G. Hinton, R. Williams, Learning representations by back-propagating errors, in Neurocomputing: Foundations of Research (MIT Press, Cambridge, 1988)Google Scholar
  45. 45.
    D. Rumelhart, R. Durbin, R. Golden, Y. Chauvin, Backpropagation: the basic theory, in Developments in Connectionist Theory. Backpropagation: Theory, Architectures, and Applications (Lawrence Erlbaum Associates, Inc., Hillsdale, 1995)Google Scholar
  46. 46.
    R. Salakhutdinov, G. Hinton, Deep Boltzmann machines, in Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (2009)Google Scholar
  47. 47.
    C. Shannon, A mathematical theory of communication. SIGMOBILE Mob. Comput. Commun. Rev. 5(1), 3–55 (2001)MathSciNetCrossRefGoogle Scholar
  48. 48.
    D. Svozil, V. Kvasnicka, J. Pospichal, Introduction to multi-layer feed-forward neural networks. Chemom. Intell. Lab. Syst. 39(1), 43–62 (1997)CrossRefGoogle Scholar
  49. 49.
    P. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining (First Edition) (Addison-Wesley Longman Publishing Co., Inc., Boston, 2005)Google Scholar
  50. 50.
    R. Tibshirani, The lasso method for variable selection in the cox model. Stat. Med. 16, 385–395 (1997)CrossRefGoogle Scholar
  51. 51.
    L. Van Der Maaten, E. Postma, J. Van den Herik, Dimensionality reduction: a comparative review. J. Mach. Learn. Res. 10, 66–71 (2009)Google Scholar
  52. 52.
    M. Verleysen, D. François, The curse of dimensionality in data mining and time series prediction, in Computational Intelligence and Bioinspired Systems. International Work-Conference on Artificial Neural Networks (Springer, Berlin, 2005)CrossRefGoogle Scholar
  53. 53.
    P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P. Manzagol, Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)MathSciNetzbMATHGoogle Scholar
  54. 54.
    P. J. Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University, Cambridge, 1974Google Scholar
  55. 55.
    X. Yan, X. Su, Linear Regression Analysis: Theory and Computing (World Scientific Publishing Co., Inc., River Edge, 2009)zbMATHCrossRefGoogle Scholar
  56. 56.
    J. Zhang, L. Cui, Y. Fu, F. Gouza, Fake news detection with deep diffusive network model. CoRR, abs/1805.08751 (2018)Google Scholar
  57. 57.
    X. Zhu, Semi-supervised learning literature survey. Comput. Sci. 2(3), 4 (2006)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jiawei Zhang
    • 1
  • Philip S. Yu
    • 2
  1. 1.Department of Computer ScienceFlorida State UniversityTallahasseeUSA
  2. 2.Department of Computer ScienceUniversity of IllinoisChicagoUSA

Personalised recommendations