
Deep Learning Analytics

  • Nikolaos Passalis
  • Anastasios Tefas
Chapter
Part of the Intelligent Systems Reference Library book series (ISRL, volume 149)

Abstract

The recent breakthroughs in Deep Learning have provided powerful data analytics tools for a wide range of domains, ranging from advertising and analyzing users’ behavior to load and financial forecasting. Depending on the nature of the available data and the task at hand, Deep Learning Analytics techniques can be divided into two broad categories: (a) unsupervised learning techniques and (b) supervised learning techniques. In this chapter we provide an extensive overview of both categories. Unsupervised learning methods, such as Autoencoders, are able to discover and extract information from the data without using any ground truth information and/or supervision from domain experts. Thus, unsupervised techniques can be especially useful for data exploration tasks, particularly when combined with advanced visualization techniques. On the other hand, supervised learning techniques are used when ground truth information is available and we want to build classification and/or forecasting models. Several deep learning models are examined, ranging from simple Multilayer Perceptrons (MLPs) to Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). However, training deep learning models is not always a straightforward task, requiring both a solid theoretical background and intuition and experience. To this end, we also present recent techniques that allow for efficiently training deep learning models, such as batch normalization, residual connections, advanced optimization techniques and activation functions, as well as a number of useful practical suggestions. Finally, we present an overview of the available open source deep learning frameworks that can be used to implement deep learning analytics techniques and accelerate the training process using Graphics Processing Units (GPUs).
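
To make the two categories concrete, the sketch below illustrates both in a few lines: a minimal autoencoder trained without labels (unsupervised) and a small MLP classifier that combines batch normalization, a residual connection, and the Adam optimizer (supervised). This is an illustrative example only, not code from the chapter itself; it assumes TensorFlow's Keras API (one of the frameworks the chapter surveys), and the layer sizes and synthetic data are arbitrary choices made for demonstration.

```python
# Minimal sketch: the two categories of Deep Learning Analytics side by side.
# Assumes TensorFlow/Keras; sizes and data are arbitrary, for illustration only.
import numpy as np
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Dense, BatchNormalization, Activation, Add

# --- Unsupervised: a simple autoencoder (no ground truth labels needed) ---
x_in = Input(shape=(64,))
code = Dense(8, activation='relu')(x_in)        # low-dimensional representation
x_out = Dense(64, activation='linear')(code)    # reconstruction of the input
autoencoder = Model(x_in, x_out)
autoencoder.compile(optimizer='adam', loss='mse')

# --- Supervised: an MLP with batch normalization and a residual connection ---
h_in = Input(shape=(64,))
h = Dense(32)(h_in)
h = BatchNormalization()(h)
h = Activation('relu')(h)
shortcut = h
h = Dense(32)(h)
h = BatchNormalization()(h)
h = Activation('relu')(h)
h = Add()([h, shortcut])                        # residual (skip) connection
y = Dense(3, activation='softmax')(h)           # e.g., a 3-class classifier
classifier = Model(h_in, y)
classifier.compile(optimizer='adam',
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])

# Synthetic data, only to show the two training calls.
X = np.random.rand(256, 64).astype('float32')
labels = np.random.randint(0, 3, size=(256,))
autoencoder.fit(X, X, epochs=2, batch_size=32, verbose=0)  # targets are the inputs themselves
classifier.fit(X, labels, epochs=2, batch_size=32, verbose=0)
```

Note how the unsupervised model is fit against its own inputs, while the supervised model requires ground truth labels; the residual connection and batch normalization are the kind of training aids discussed in the chapter for making deeper models easier to optimize.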

Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  1. Aristotle University of Thessaloniki, Thessaloniki, Greece
