An Introduction to Neural Networks



Artificial neural networks are popular machine learning techniques that simulate the learning mechanisms of biological organisms. The human nervous system contains cells referred to as neurons. Neurons are connected to one another by axons and dendrites, and the connecting regions between axons and dendrites are referred to as synapses. These connections are illustrated in Figure 1.1(a). The strengths of synaptic connections often change in response to external stimuli, and this change is how learning takes place in living organisms.
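In an artificial neural network, the role of synaptic strengths is played by numeric weights that are adjusted in response to training examples. As a minimal sketch of this idea (an illustrative example, not taken from the text), the classical perceptron rule of Rosenblatt increases or decreases each weight only when the neuron's prediction is wrong; the function names and the tiny AND dataset below are chosen purely for illustration:

```python
# A single artificial "neuron": weights act as synaptic strengths.
# Perceptron update rule: w <- w + lr * (y - y_hat) * x

def predict(w, b, x):
    # Fire (output 1) if the weighted input exceeds the threshold.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train(samples, lr=0.1, epochs=20):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = y - predict(w, b, x)  # 0 when the prediction is correct
            # "Synaptic strengths" change only in response to errors.
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Logical AND: linearly separable, so the perceptron converges.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(data)
print([predict(w, b, x) for x, _ in data])  # [0, 0, 0, 1]
```

The analogy is deliberately loose: biological learning is far richer than this rule, but the principle that connection strengths adapt to external stimuli carries over directly.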



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

IBM T. J. Watson Research Center, International Business Machines, Yorktown Heights, USA