C. Aggarwal. Data classification: Algorithms and applications. CRC Press, 2014.
C. Aggarwal. Data mining: The textbook. Springer, 2015.
C. Aggarwal. Machine learning for text. Springer, 2018.
Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1), pp. 1–127, 2009.
Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), pp. 1798–1828, 2013.
Y. Bengio and O. Delalleau. On the expressive power of deep architectures. Algorithmic Learning Theory, pp. 18–36, 2011.
J. Bergstra et al. Theano: A CPU and GPU math expression compiler. Python for Scientific Computing Conference (SciPy), 2010.
C. M. Bishop. Pattern recognition and machine learning. Springer, 2007.
C. M. Bishop. Neural networks for pattern recognition. Oxford University Press, 1995.
L. Breiman. Random forests. Machine Learning, 45(1), pp. 5–32, 2001.
A. Bryson. A gradient method for optimizing multi-stage allocation processes. Harvard University Symposium on Digital Computers and their Applications, 1961.
D. Ciresan, U. Meier, L. Gambardella, and J. Schmidhuber. Deep, big, simple neural nets for handwritten digit recognition. Neural Computation, 22(12), pp. 3207–3220, 2010.
T. Cover. Geometrical and statistical properties of systems of linear inequalities with applications to pattern recognition. IEEE Transactions on Electronic Computers, pp. 326–334, 1965.
N. de Freitas. Machine Learning, University of Oxford (Course Video), 2013. https://www.youtube.com/watch?v=w2OtwL5T1ow&list=PLE6Wd9FREdyJ5lbFl8Uu-GjecvVw66F6
N. de Freitas. Deep Learning, University of Oxford (Course Video), 2015. https://www.youtube.com/watch?v=PlhFWT7vAEw&list=PLjK8ddCbDMphIMSXn-1IjyYpHU3DaUYw
O. Delalleau and Y. Bengio. Shallow vs. deep sum-product networks. NIPS Conference, pp. 666–674, 2011.
Y. Freund and R. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3), pp. 277–296, 1999.
K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), pp. 193–202, 1980.
S. Gallant. Perceptron-based learning algorithms. IEEE Transactions on Neural Networks, 1(2), pp. 179–191, 1990.
A. Ghodsi. STAT 946: Topics in Probability and Statistics: Deep Learning, University of Waterloo, Fall 2015. https://www.youtube.com/watch?v=fyAZszlPphs&list=PLehuLRPyt1Hyi78UOkMP-WCGRxGcA9NVOE
X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. AISTATS, pp. 249–256, 2010.
I. Goodfellow, Y. Bengio, and A. Courville. Deep learning. MIT Press, 2016.
A. Graves, A. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6645–6649, 2013.
A. Graves, G. Wayne, and I. Danihelka. Neural Turing machines. arXiv:1410.5401, 2014. https://arxiv.org/abs/1410.5401
K. Greff, R. K. Srivastava, and J. Schmidhuber. Highway and residual networks learn unrolled iterative estimation. arXiv:1612.07771, 2016. https://arxiv.org/abs/1612.07771
D. Hassabis, D. Kumaran, C. Summerfield, and M. Botvinick. Neuroscience-inspired artificial intelligence. Neuron, 95(2), pp. 245–258, 2017.
T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning. Springer, 2009.
S. Haykin. Neural networks and learning machines. Pearson, 2008.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
G. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5766), pp. 504–507, 2006.
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8), pp. 1735–1780, 1997.
J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the USA, 79(8), pp. 2554–2558, 1982.
K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), pp. 359–366, 1989.
D. Hubel and T. Wiesel. Receptive fields of single neurones in the cat’s striate cortex. The Journal of Physiology, 148(3), pp. 574–591, 1959.
E. Kandel, J. Schwartz, T. Jessell, S. Siegelbaum, and A. Hudspeth. Principles of neural science. McGraw-Hill, 2012.
A. Karpathy, J. Johnson, and L. Fei-Fei. Stanford University class CS231n: Convolutional neural networks for visual recognition, 2016. http://cs231n.github.io/
H. J. Kelley. Gradient theory of optimal flight paths. ARS Journal, 30(10), pp. 947–954, 1960.
T. Kietzmann, P. McClure, and N. Kriegeskorte. Deep neural networks in computational neuroscience. bioRxiv, 133504, 2017. https://www.biorxiv.org/content/early/2017/05/04/133504
J. Kivinen and M. Warmuth. The perceptron algorithm vs. winnow: linear vs. logarithmic mistake bounds when few input variables are relevant. Computational Learning Theory, pp. 289–296, 1995.
D. Koller and N. Friedman. Probabilistic graphical models: principles and techniques. MIT Press, 2009.
A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. NIPS Conference, pp. 1097–1105, 2012.
H. Larochelle. Neural Networks (Course). Université de Sherbrooke, 2013. https://www.youtube.com/watch?v=SGZ6BttHMPw&list=PL6Xpj9I5qXYEcOhn7-TqghAJ6NAPrNmUBH
H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. ICML Conference, pp. 473–480, 2007.
Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553), pp. 436–444, 2015.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), pp. 2278–2324, 1998.
Y. LeCun, C. Cortes, and C. Burges. The MNIST database of handwritten digits, 1998. http://yann.lecun.com/exdb/mnist/
C. Manning and R. Socher. CS224N: Natural language processing with deep learning. Stanford University School of Engineering, 2017. https://www.youtube.com/watch?v=OQQ-W_63UgQ
W. S. McCulloch and W. H. Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), pp. 115–133, 1943.
G. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller. Introduction to WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), pp. 235–312, 1990. https://wordnet.princeton.edu/
M. Minsky and S. Papert. Perceptrons: An introduction to computational geometry. MIT Press, 1969.
R. Miotto, F. Wang, S. Wang, X. Jiang, and J. T. Dudley. Deep learning for healthcare: review, opportunities and challenges. Briefings in Bioinformatics, pp. 1–11, 2017.
G. Montufar. Universal approximation depth and errors of narrow belief networks with discrete units. Neural Computation, 26(7), pp. 1386–1407, 2014.
H. Poon and P. Domingos. Sum-product networks: A new deep architecture. Computer Vision Workshops (ICCV Workshops), pp. 689–690, 2011.
V. Romanuke. Parallel Computing Center (Khmelnitskiy, Ukraine) represents an ensemble of 5 convolutional neural networks which performs on MNIST at a 0.21 percent error rate. Retrieved 24 November 2016.
F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), pp. 386–408, 1958.
D. Rumelhart, G. Hinton, and R. Williams. Learning representations by back-propagating errors. Nature, 323(6088), pp. 533–536, 1986.
D. Rumelhart, G. Hinton, and R. Williams. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, pp. 318–362, 1986.
J. Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61, pp. 85–117, 2015.
S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter. Pegasos: Primal estimated sub-gradient solver for SVM. Mathematical Programming, 127(1), pp. 3–30, 2011.
H. Siegelmann and E. Sontag. On the computational power of neural nets. Journal of Computer and System Sciences, 50(1), pp. 132–150, 1995.
B. W. Silverman. Density estimation for statistics and data analysis. Chapman and Hall, 1986.
S. Wang, C. Aggarwal, and H. Liu. Using a random forest to inspire a neural network and improving on it. SIAM Conference on Data Mining, 2017.
A. Wendemuth. Learning the unlearnable. Journal of Physics A: Mathematical and General, 28, pp. 5423–5436, 1995.
P. Werbos. Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University, 1974.
P. Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10), pp. 1550–1560, 1990.
P. Werbos. The roots of backpropagation: from ordered derivatives to neural networks and political forecasting (Vol. 1). John Wiley and Sons, 1994.
J. Weston, S. Chopra, and A. Bordes. Memory networks. ICLR, 2015.
B. Widrow and M. Hoff. Adaptive switching circuits. IRE WESCON Convention Record, 4(1), pp. 96–104, 1960.