Beyond ImageNet: Deep Learning in Industrial Practice

  • Thilo Stadelmann
  • Vasily Tolkachev
  • Beate Sick
  • Jan Stampfli
  • Oliver Dürr


Deep learning (DL) methods have gained considerable attention since 2014. In this chapter we briefly review the state of the art in DL and then give several examples of applications from diverse areas. We focus on convolutional neural networks (CNNs), which since the seminal work of Krizhevsky et al. (ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, pp. 1097–1105, 2012) have revolutionized image classification and even begun to surpass human performance on some benchmark data sets (Ciresan et al., Multi-column deep neural network for traffic sign classification, 2012a; He et al., Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. CoRR, Vol. 1502.01852, 2015a). While deep neural networks have become popular primarily for image classification tasks, they can also be applied successfully to other areas and problems with some local structure in the data. We first present a classical application of CNNs to image-like data, namely phenotype classification of cells based on their morphology, and then extend the task to clustering voices based on their spectrograms. Next, we describe DL applications to the semantic segmentation of newspaper pages into their constituent articles based on clues in the pixels, and to outlier detection in a predictive maintenance setting. We conclude with advice on how to work with DL when resources (e.g., training data) are limited.
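The core operations that a CNN layer applies to image-like data, convolution with a learned filter, a nonlinearity, and pooling, can be sketched in a few lines of numpy. The toy image and the hand-crafted edge-detector kernel below are illustrative assumptions, not code or data from the chapter; in a trained CNN the kernel weights would be learned from labeled examples.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: the core operation of a CNN layer."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Elementwise nonlinearity used in most modern CNNs."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling for translation-tolerant downsampling."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# Toy 8x8 image with a vertical edge: the right half is bright.
image = np.zeros((8, 8))
image[:, 4:] = 1.0
# Hand-crafted vertical-edge filter (in a real CNN this would be learned).
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = max_pool(relu(conv2d(image, kernel)))
print(feature_map.shape)  # → (3, 3)
```

The resulting 3x3 feature map responds only in its middle column, where the edge lies; stacking many such filtered, rectified, and pooled maps is what gives CNNs their hierarchy of increasingly abstract features.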





The authors are grateful for the support by CTI grants 17719.1 PFES-ES, 17729.1 PFES-ES, and 19139.1 PFES-ES.


  1. Aucouturier, J.-J., Defreville, B., & Pachet, F. (2007). The bag-of-frames approach to audio pattern recognition: A sufficient model for urban soundscapes but not for polyphonic music. The Journal of the Acoustical Society of America, 122(2), 881–891.
  2. Beigi, H. (2011). Fundamentals of speaker recognition. Springer Science & Business Media.
  3. Bersimis, S., Psarakis, S., & Panaretos, J. (2007). Multivariate statistical process control charts: An overview. Quality and Reliability Engineering International, 23, 517–543.
  4. Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
  5. Bouma, G. (2009). Normalized (pointwise) mutual information in collocation extraction. In From form to meaning: Processing texts automatically, Proceedings of the Biennial GSCL Conference 2009 (pp. 31–40).
  6. Chung, J. S., Senior, A. W., Vinyals, O., & Zisserman, A. (2016). Lip reading sentences in the wild. CoRR, Vol. 1611.05358.
  7. Ciresan, D., Meier, U., Masci, J., & Schmidhuber, J. (2012a). Multi-column deep neural network for traffic sign classification.
  8. Ciresan, D., Giusti, A., Gambardella, L. M., & Schmidhuber, J. (2012b). Deep neural networks segment neuronal membranes in electron microscopy images. Advances in Neural Information Processing Systems, 25, 2843–2851.
  9. Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., & Ouellet, P. (2011). Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), 788–798.
  10. Dieleman, S., & Schrauwen, B. (2014). End-to-end learning for music audio. In Proceedings of ICASSP (pp. 6964–6968).
  11. Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78–87.
  12. Dürr, O., Duval, F., Nichols, A., Lang, P., Brodte, A., Heyse, S., & Besson, D. (2007). Robust hit identification by quality assurance and multivariate data analysis of a high-content, cell-based assay. Journal of Biomolecular Screening, 12(8), 1042–1049.
  13. Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96) (pp. 226–231). AAAI Press.
  14. Fernández-Francos, D., Martínez-Rego, D., Fontenla-Romero, O., & Alonso-Betanzos, A. (2013). Automatic bearing fault diagnosis based on one-class ν-SVM. Computers & Industrial Engineering, 64(1), 357–365.
  15. Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193–202.
  16. Ganchev, T., Fakotakis, N., & Kokkinakis, G. (2005). Comparative evaluation of various MFCC implementations on the speaker verification task. In Proceedings of SPECOM 2005 (Vol. 1, pp. 191–194).
  17. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
  18. Gustafsdottir, S. M., Ljosa, V., Sokolnicki, K. L., Wilson, J. A., Walpita, D., Kemp, M. M., Petri Seiler, K., Carrel, H. A., Golub, T. R., Schreiber, S. L., Clemons, P. A., Carpenter, A. E., & Shamji, A. F. (2013). Multiplex cytological profiling assay to measure diverse cellular states. PLoS One, 12, e80999.
  19. Han, S., Mao, H., & Dally, W. J. (2015). Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. CoRR, Vol. 1510.00149.
  20. He, K., Zhang, X., Ren, S., & Sun, J. (2015a). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. CoRR, Vol. 1502.01852.
  21. He, K., Zhang, X., Ren, S., & Sun, J. (2015b). Deep residual learning for image recognition. CoRR, Vol. 1512.03385.
  22. Hinton, G. E., Srivastava, N., & Swersky, K. (2012). Lecture 6a: Overview of mini-batch gradient descent. In Neural Networks for Machine Learning, University of Toronto.
  23. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
  24. Hsu, Y.-C., & Kira, Z. (2015). Neural network-based clustering using pairwise constraints. CoRR, Vol. 1511.06321.
  25. Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurons in the cat’s striate cortex. Journal of Physiology, 148, 574–591.
  26. Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (Vol. 37, pp. 448–456).
  27. Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. CoRR, Vol. 1412.6980.
  28. Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. CoRR, Vol. 1312.6114.
  29. Kingma, D. P., Mohamed, S., Rezende, D. J., & Welling, M. (2014). Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems (pp. 3581–3589).
  30. Kotti, M., Moschou, V., & Kotropoulos, C. (2008). Speaker segmentation and clustering. Signal Processing, 88(5), 1091–1124.
  31. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
  32. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998a). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
  33. LeCun, Y., Bottou, L., Orr, G. B., & Mueller, K.-R. (1998b). Efficient BackProp. In G. B. Orr, & K.-R. Mueller (Eds.), Neural networks: Tricks of the trade, Lecture Notes in Computer Science (Vol. 1524, pp. 9–50).
  34. LeCun, Y., Bengio, Y., & Hinton, G. E. (2015). Deep learning. Nature, 521, 436–444.
  35. Lee, J., Qiu, H., Yu, G., & Lin, J. (2007). Bearing data set. IMS, University of Cincinnati, NASA Ames Prognostics Data Repository, Rexnord Technical Services.
  36. Ljosa, V., Sokolnicki, K. L., & Carpenter, A. E. (2009). Annotated high-throughput microscopy image sets for validation. Nature Methods, 9, 637.
  37. Long, J., Shelhamer, E., & Darrell, T. (2014). Fully convolutional networks for semantic segmentation.
  38. Lukic, Y. X., Vogt, C., Dürr, O., & Stadelmann, T. (2016). Speaker identification and clustering using convolutional neural networks. In Proceedings of IEEE MLSP 2016.
  39. Lukic, Y. X., Vogt, C., Dürr, O., & Stadelmann, T. (2017). Learning embeddings for speaker clustering based on voice quality. In Proceedings of IEEE MLSP 2017.
  40. MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 281–297). Berkeley: University of California Press.
  41. Meier, B., Stadelmann, T., Stampfli, J., Arnold, M., & Cieliebak, M. (2017). Fully convolutional neural networks for newspaper article segmentation. In Proceedings of ICDAR 2017.
  42. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (pp. 3111–3119).
  43. Mitchell, T. M. (1980). The need for biases in learning generalizations. Technical Report, Rutgers University, New Brunswick, NJ.
  44. Moravcik, M., Schmid, M., Burch, N., Lisy, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., & Bowling, M. H. (2017). DeepStack: Expert-level artificial intelligence in no-limit poker. CoRR, Vol. 1701.01724.
  45. Mori, S., Nishida, H., & Yamada, H. (1999). Optical character recognition. New York, NY: Wiley. ISBN 0471308196.
  46. Ng, A. (2016). Nuts and bolts of building AI applications using deep learning. NIPS Tutorial.
  47. Ng, A. (2019, in press). Machine learning yearning.
  48. Nielsen, M. A. (2015). Neural networks and deep learning. Determination Press.
  49. Nielsen, F. A. (2017). Status on human vs. machines, post on “Finn Årup Nielsen’s blog”.
  50. Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.
  51. Pimentel, M. A. F., Clifton, D. A., Clifton, L., & Tarassenko, L. (2014). A review of novelty detection. Signal Processing, 215–249.
  52. Randall, R. B., & Antoni, J. (2011). Rolling element bearing diagnostics—A tutorial. Mechanical Systems and Signal Processing, 25(2), 485–520.
  53. Razavian, A. S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition. CVPR 2014 (pp. 806–813).
  54. Reynolds, D. A., & Rose, R. C. (1995). Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing, 3(1), 72–83.
  55. Reynolds, D. A., Quatieri, T. F., & Dunn, R. B. (2000). Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 10(1), 19–41.
  56. Romanov, A., & Rumshisky, A. (2017). Forced to learn: Discovering disentangled representations without exhaustive labels. ICLR 2017.
  57. Rosenblatt, F. (1957). The perceptron – A perceiving and recognizing automaton. Technical report 85-460-1, Cornell Aeronautical Laboratory.
  58. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1988). Learning representations by back-propagating errors. In Neurocomputing: Foundations of Research (pp. 696–699). MIT Press.
  59. Schmidhuber, J. (2014). Deep learning in neural networks: An overview.
  60. Schölkopf, B., & Smola, A. J. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT Press.
  61. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484–503.
  62. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR, Vol. 1409.1556.
  63. Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.
  64. Stadelmann, T., & Freisleben, B. (2009). Unfolding speaker clustering potential: A biomimetic approach. In Proceedings of the 17th ACM International Conference on Multimedia (pp. 185–194). ACM.
  65. Stadelmann, T., Musy, T., Duerr, O., & Eyyi, G. (2016). Machine learning-style experimental evaluation of classic condition monitoring approaches on CWRU data. Technical report, ZHAW Datalab (unpublished).
  66. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2014). Going deeper with convolutions. CoRR, Vol. 1409.4842.
  67. Szeliski, R. (2010). Computer vision: Algorithms and applications. Texts in Computer Science. New York: Springer.
  68. van der Maaten, L., & Hinton, G. E. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2579–2605.
  69. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P.-A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11, 3371–3408.
  70. Weyand, T., Kostrikov, I., & Philbin, J. (2016). PlaNet – Photo geolocation with convolutional neural networks. CoRR, Vol. 1602.05314.
  71. Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M., Stolcke, A., Yu, D., & Zweig, G. (2016). Achieving human parity in conversational speech recognition. CoRR, Vol. 1610.05256.
  72. Zagoruyko, S., & Komodakis, N. (2016). Wide residual networks. In Wilson, R. C., Hancock, E. R., & Smith, W. A. P. (Eds.), Proceedings of the British Machine Vision Conference (BMVC) (pp. 87.1–87.12). BMVA Press.
  73. Zeiler, M. D. (2012). ADADELTA: An adaptive learning rate method. CoRR, Vol. 1212.5701.
  74. Zheng, F., Zhang, G., & Song, Z. (2001). Comparison of different implementations of MFCC. Journal of Computer Science and Technology, 16(6), 582–589.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Thilo Stadelmann (1)
  • Vasily Tolkachev (2)
  • Beate Sick (1)
  • Jan Stampfli (1)
  • Oliver Dürr (3)

  1. ZHAW Zurich University of Applied Sciences, Winterthur, Switzerland
  2. University of Bern, Bern, Switzerland
  3. HTWG Konstanz – University of Applied Sciences, Konstanz, Germany
