
Convolutional Neural Networks

  • Charu C. Aggarwal
Chapter

Abstract

Convolutional neural networks are designed to work with grid-structured inputs, which have strong spatial dependencies in local regions of the grid. The most obvious example of grid-structured data is a 2-dimensional image. Such data exhibits spatial dependencies because adjacent pixels in an image often have similar color values. An additional dimension captures the different color channels, which creates a 3-dimensional input volume. Therefore, the features in a convolutional neural network have dependencies among one another based on spatial distances.
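The abstract describes an image as a 3-dimensional input volume (height × width × color channels) whose nearby positions are correlated. The following is a minimal pure-Python sketch, under stated assumptions, of how a single convolutional filter exploits this locality: every output value is computed from one small spatial neighborhood, across all channels. It assumes stride 1, no padding, and the cross-correlation form used by most deep-learning libraries; the function name `conv2d_single_filter` and the toy image are hypothetical illustrations, not taken from the chapter.

```python
from typing import List

# Hypothetical type alias: a volume is indexed as volume[row][col][channel].
Volume = List[List[List[float]]]


def conv2d_single_filter(volume: Volume, kernel: Volume, bias: float = 0.0) -> List[List[float]]:
    """Slide one F x F x C filter over an H x W x C volume (stride 1, no padding).

    Each output value depends only on an F x F spatial neighborhood of the
    input, summed across all C channels, which is how the network encodes
    dependencies based on spatial distance.
    """
    h, w, c = len(volume), len(volume[0]), len(volume[0][0])
    f = len(kernel)
    # The filter must be square and span the full depth of the input volume.
    if len(kernel[0]) != f or len(kernel[0][0]) != c:
        raise ValueError("kernel must be F x F x C with the same depth as the input")
    out_h, out_w = h - f + 1, w - f + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            # Dot product between the filter and the local F x F x C region.
            for di in range(f):
                for dj in range(f):
                    for ch in range(c):
                        total += volume[i + di][j + dj][ch] * kernel[di][dj][ch]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Usage: a 4 x 4 RGB "ramp" image whose intensity grows by 1 per column and
# 4 per row, identical in all three channels.
image = [[[float(4 * r + c)] * 3 for c in range(4)] for r in range(4)]
# A 2 x 2 x 3 horizontal-difference filter: -1 on the left column, +1 on the right.
edge = [[[-1.0 if dj == 0 else 1.0] * 3 for dj in range(2)] for _ in range(2)]
print(conv2d_single_filter(image, edge))  # [[6.0, 6.0, 6.0], [6.0, 6.0, 6.0], [6.0, 6.0, 6.0]]
```

The 4 × 4 × 3 input yields a 3 × 3 feature map with a single channel, because one filter collapses the color dimension. A real layer stacks many such filters to produce a new 3-dimensional volume.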

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Charu C. Aggarwal, IBM T. J. Watson Research Center, International Business Machines, Yorktown Heights, USA