
Sparse coding network model based on fast independent component analysis

  • Original Article

Neural Computing and Applications

Abstract

Neurobiological studies have shown that neurons in the primary visual cortex (V1) may employ sparse representations to encode stimuli. We describe a network model for sparse coding consisting of an input layer, a basis-function layer and an output layer. We simulated standard sparse coding and sparse coding based on fast independent component analysis (ICA), and compared the basis training time, the convergence speed of the objective function and the sparsity of the coefficient matrix. The results show that sparse coding based on fast ICA is more effective than standard sparse coding.


References

  1. Treichler DG (1967) Are you missing the boat in training aids? Film Audio Vis Commun 1:14–16

  2. Olshausen BA, Field DJ (1997) Sparse coding with an overcomplete basis set: a strategy employed by V1? Vis Res 37:3313–3325

  3. Field DJ (1994) What is the goal of sensory coding? Neural Comput 6:559–601

  4. Olshausen BA, Field DJ (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381:607–609

  5. Simoncelli EP (2003) Vision and the statistics of the visual environment. Curr Opin Neurobiol 13:144–149

  6. Delgutte B, Hammond B, Cariani P (1998) Psychophysical and physiological advances in hearing. Whurr Publishers, London

  7. Ruderman DL, Bialek W (1994) Statistics of natural images: scaling in the woods. Phys Rev Lett 73(6):814–817

  8. Kandel ER, Schwartz JH, Jessell TM (2000) Principles of neural science, 4th edn. McGraw-Hill Medical, New York

  9. Hyvärinen A (1999) Survey on independent component analysis. Neural Comput Surv 2:94–128

  10. Olshausen BA, Field DJ (2004) Sparse coding of sensory inputs. Curr Opin Neurobiol 14:481–487

  11. Lewicki MS (2002) Efficient coding of natural sounds. Nat Neurosci 5:356–363

  12. Vinje W, Gallant J (2002) Natural stimulation of the non-classical receptive field increases information transmission efficiency in V1. J Neurosci 22:2904–2915

  13. Hubel DH, Wiesel TN (1977) Functional architecture of macaque monkey visual cortex. Proc R Soc Lond B 198:1–59

  14. Hyvärinen A, Hoyer PO (2001) A two-layer sparse coding model learns simple and complex cell receptive fields and topography from natural images. Vis Res 41(18):2413–2423

  15. Simoncelli EP, Olshausen BA (2001) Natural image statistics and neural representation. Annu Rev Neurosci 24:1193–1216

  16. Hoyer PO, Hyvärinen A (2000) Independent component analysis applied to feature extraction from colour and stereo images. Netw Comput Neural Syst 11(3):191–210

  17. Haken H (2007) Towards a unifying model of neural net activity in the visual cortex. Cogn Neurodyn 1(1):15–25

  18. Hyvärinen A, Hoyer PO, Inki M (2001) Topographic independent component analysis. Neural Comput 13(7):1527–1558

  19. Li S, Wu S (2007) Robustness of neural codes and its implication on natural image processing. Cogn Neurodyn 1(3):261–272

  20. van Hateren JH, Ruderman DL (1998) Independent component analysis of natural image sequences yields spatiotemporal filters similar to simple cells in primary visual cortex. Proc R Soc Lond B 265:2315–2320

  21. Gong HY, Zhang YY, Liang PJ, Zhang PM (2010) Neural coding properties based on spike timing and pattern correlation of retinal ganglion cells. Cogn Neurodyn 4(4):337–346

  22. Saglam M, Hayashida Y, Murayama N (2009) A retinal circuit model accounting for wide-field amacrine cells. Cogn Neurodyn 3(1):25–32

  23. Pillow JW, Shlens J, Paninski L, Sher A, Litke AM (2008) Spatio-temporal correlations and visual signaling in a complete neuronal population. Nature 454:995–999

  24. Li CG, Li YK (2011) Fast and robust image segmentation by small-world neural oscillator networks. Cogn Neurodyn 5(2):209–220

  25. Vialatte FB, Dauwels J, Maurice M, Yamaguchi Y, Cichocki A (2009) On the synchrony of steady state visual evoked potentials and oscillatory burst events. Cogn Neurodyn 3(3):251–261

  26. Huberman AD, Feller MB, Chapman B (2008) Mechanisms underlying development of visual maps and receptive fields. Annu Rev Neurosci 31:479–509

  27. Han JW, Zhao SJ, Hu XT, Guo L, Liu TM (2014) Encoding brain network response to free viewing of videos. Cogn Neurodyn 8(5):389–397

  28. Wang RB, Tsuda I, Zhang ZK (2015) A new work mechanism on neuronal activity. Int J Neural Syst 25(3):1450037

  29. Wang RB (2015) Can the activities of the large scale cortical network be expressed by neural energy? Cogn Neurodyn 1:1–5

  30. Wang ZY, Wang RB, Fang RY (2015) Energy coding in neural network with inhibitory neurons. Cogn Neurodyn 9(2):129–144

  31. Wang ZY, Wang RB (2014) Energy distribution property and energy coding of a structural neural network. Front Comput Neurosci 8:14

  32. Wang RB, Zhang ZK (2011) Phase synchronization motion and neural coding in dynamic transmission of neural information. IEEE Trans Neural Netw 22(7):1097–1106

  33. Wang RB, Zhang ZK (2007) Energy coding in biological neural network. Cogn Neurodyn 1(3):203–212


Author information

Corresponding author

Correspondence to Rubin Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Appendix

1.1 Sparse coding

Sparse coding refers to a class of unsupervised methods for learning over-complete sets of bases for efficient data representation. The aim of sparse coding is to find a set of basis vectors such that an input vector can be represented as a linear combination of these basis vectors:

$$x = As = \sum\limits_{i = 1}^{m} {a_{i} s_{i} }$$
(1)

where \(x = (x_{1}, x_{2}, \ldots, x_{n})^{T}\) is the input data, \(A = (a_{1}, a_{2}, \ldots, a_{m})\) is the basis matrix whose \(i\)th column \(a_{i}\) is a basis function, and \(S = (s_{1}, s_{2}, \ldots, s_{m})^{T}\) is the vector of coefficients (stacked column-wise into the coefficient matrix \(S\) when many inputs are processed together). With an over-complete basis, \(S\) is no longer uniquely determined by the input vector \(x\). We therefore introduce the additional criterion of sparsity: a representation is sparse when it has few nonzero components, or few components that are not close to zero. The choice of sparsity as a desired characteristic of the representation is motivated by the observation that most sensory data, such as natural images, can be described as the superposition of a small number of atomic elements such as surfaces or edges. Other justifications, such as comparisons with the response properties of the primary visual cortex, have also been advanced.
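As a toy illustration (our own example, not part of the original model; all numbers are arbitrary), the following sketch builds an over-complete basis with \(n = 2\) and \(m = 3\) and reconstructs an input vector from a sparse coefficient vector as in Eq. (1):

```python
# Toy illustration of Eq. (1): x = A s with an over-complete basis (m > n)
# and a sparse coefficient vector. All values are arbitrary.
import numpy as np

A = np.array([[1.0, 0.0, 1.0],      # basis matrix: 2-dimensional inputs,
              [0.0, 1.0, 1.0]])     # 3 basis functions (over-complete)
s = np.array([0.0, 0.0, 2.0])       # sparse coefficients: only one is nonzero

x = A @ s                           # reconstruction x = sum_i a_i s_i
print(x)                            # -> [2. 2.]
```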

We define the sparse coding cost function using a set of n input vectors as follows:

$$F(A, S) = \min_{a_{j}, s_{j}} \sum\limits_{i = 1}^{n} {\left\| {x_{i} - \sum\limits_{j = 1}^{m} {a_{j} s_{j} } } \right\|^{2} } + \lambda \sum\limits_{j = 1}^{m} {H(s_{j} )}$$
(2)

where \(a_{j}\) is a basis function, \(s_{j}\) is its coefficient, \(x_{i}\) is an input vector, \(\lambda\) is a constant and \(H(s_{j})\) is a sparsity cost function that penalizes \(s_{j}\) for being far from zero. A common choice for the sparsity cost is the L1 penalty \(H(s_{j}) = \left| s_{j} \right|\), but it is non-differentiable at \(s_{j} = 0\); we therefore use the smoothed penalty \(H(s_{j}) = \sqrt{s_{j}^{2} + \varepsilon}\), where \(\varepsilon\) is a small constant.
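The sketch below (our own illustration; the value of \(\varepsilon\) is arbitrary) shows the smoothed penalty and its derivative, which remains well defined at \(s_{j} = 0\), unlike the derivative of the L1 penalty:

```python
# Smoothed sparsity penalty H(s) = sqrt(s^2 + eps) and its derivative
# s / sqrt(s^2 + eps); eps is an illustrative value only.
import numpy as np

def sparsity_cost(s, eps=1e-2):
    """Smoothed L1 penalty, applied elementwise to the coefficients."""
    return np.sqrt(s ** 2 + eps)

def sparsity_grad(s, eps=1e-2):
    """Derivative of the smoothed penalty; finite even where s = 0."""
    return s / np.sqrt(s ** 2 + eps)

s = np.array([-1.0, 0.0, 0.5])
print(sparsity_cost(s).sum())   # total penalty over the coefficients
print(sparsity_grad(s))         # gradient, well defined at s = 0
```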

We interpret the first term of the sparse coding objective as a reconstruction term, which forces the algorithm to provide a good representation of \(x\), and the second term as a sparsity penalty, which forces the representation of \(x\) to be sparse. The constant \(\lambda\) determines the relative importance of these two contributions.

In addition, the sparsity penalty could be made arbitrarily small by scaling \(s_{j}\) down and scaling \(a_{j}\) up by a large constant. To prevent this, we constrain the norm of each basis function, \(\left\| a_{j} \right\|^{2} \le C,\; \forall j = 1, 2, \ldots, m\), where \(C\) is a constant.

The full sparse coding cost function including our constraint is as follows:

$$F(A, S) = \min_{a_{j}, s_{j}} \sum\limits_{i = 1}^{n} {\left\| {x_{i} - \sum\limits_{j = 1}^{m} {a_{j} s_{j} } } \right\|^{2} } + \lambda \sum\limits_{j = 1}^{m} {H(s_{j} )}$$
(3)

subject to \(\left\| a_{j} \right\|^{2} \le C,\quad \forall j = 1, 2, \ldots, m\).

However, the constraint \(\left\| a_{j} \right\|^{2} \le C,\; \forall j = 1, 2, \ldots, m\) cannot be enforced with simple gradient-based methods. It is therefore relaxed to a “weight decay” term designed to keep the entries of \(A\) small. Adding this term to the objective yields a new objective function:

$$F(A, S) = \left\| {X - AS} \right\|^{2} + \lambda \sum {\sqrt {S^{2} + \varepsilon } } + \gamma \left\| A \right\|^{2}$$
(4)

where \(\lambda\) and \(\gamma\) are constants, \(A = (a_{1}, a_{2}, \ldots, a_{m})\) is the basis matrix and \(S\) is the coefficient matrix whose columns are the coefficient vectors of the input patches collected in \(X\).
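A minimal NumPy sketch of Eq. (4) follows; the values of \(\lambda\), \(\gamma\), \(\varepsilon\) and the toy dimensions are illustrative assumptions, not the settings used in our simulations:

```python
# Sketch of the full objective in Eq. (4): reconstruction error +
# smoothed sparsity penalty on S + weight decay on A. Constants are illustrative.
import numpy as np

def objective(A, S, X, lam=0.1, gamma=0.01, eps=1e-2):
    """F(A, S) = ||X - A S||^2 + lam * sum(sqrt(S^2 + eps)) + gamma * ||A||^2."""
    residual = X - A @ S
    return (np.sum(residual ** 2)
            + lam * np.sum(np.sqrt(S ** 2 + eps))
            + gamma * np.sum(A ** 2))

# toy shapes: 64-dimensional patches, 128 basis functions, 1000 patches
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128))
S = rng.standard_normal((128, 1000))
X = rng.standard_normal((64, 1000))
print(objective(A, S, X))
```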

The objective function is non-convex in \((A, S)\) jointly and is therefore hard to optimize well with gradient-based methods. However, for a fixed \(A\), finding the \(S\) that minimizes \(F(A, S)\) is a convex problem; similarly, for a fixed \(S\), finding the \(A\) that minimizes \(F(A, S)\) is convex. This suggests an alternating scheme: optimize \(A\) for a fixed \(S\), then optimize \(S\) for a fixed \(A\), and repeat.

The update direction for \(A\) is obtained from the derivative of \(F(A, S)\) with respect to \(A\):

$$\frac{{\partial F(A,S)}}{\partial A} = XS^{T} - ASS^{T} - \gamma A$$
(5)

The update direction for \(S\) is obtained from the derivative of \(F(A, S)\) with respect to \(S\), where the division and square root are applied elementwise:

$$\frac{{\partial F(A,S)}}{\partial S} = A^{T} X - A^{T} AS - \lambda \frac{S}{{\sqrt {S^{2} + \varepsilon } }}.$$
(6)

Therefore, the learning rule for the basis function \(a_{i}\) is:

$$\Delta a_{i} = a_{i} (t + 1) - a_{i} (t) = x_{i} s_{i}^{T} - a_{i} s_{i} s_{i}^{T} - \gamma a_{i} .$$
(7)

The learning rule for the coefficient \(s_{i}\) is:

$$\Delta s_{i} = s_{i} (t + 1) - s_{i} (t) = a_{i}^{T} x_{i} - a_{i}^{T} a_{i} s_{i} - \lambda \frac{{s_{i} }}{{\sqrt {s_{i}^{T} s_{i} + \varepsilon } }}.$$
(8)

Running this simple iterative algorithm on the full dataset (10,000 image patches) makes each iteration slow and convergence lengthy. To accelerate convergence, the algorithm can instead be run on mini-batches, selecting a random subset of 1000 patches from the 10,000 patches at each iteration.

Faster and better convergence can also be obtained by initializing the coefficient matrix \(S\) well before using gradient descent (or another method) to optimize the objective for \(S\) given \(A\). In practice, initializing \(S\) randomly at each iteration results in poor convergence unless a good optimum for \(S\) is found before \(A\) is optimized. A better way to initialize \(S\) involves the following steps (a code sketch of the full procedure is given after the list):

  1. Randomly initialize \(A\).

  2. Repeat until convergence:

     1. Select a random mini-batch of patches.

     2. Initialize \(S\) as \(S = A^{T} X\), dividing each feature by the norm of the corresponding basis vector in \(A\).

     3. Find the \(S\) that minimizes \(F(A, S)\) for the \(A\) obtained in the previous step.

     4. Find the \(A\) that minimizes \(F(A, S)\) for the \(S\) found in the previous step.

Using this method, good local optima can be reached relatively quickly.
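The sketch below assembles the whole alternating procedure, using mini-batch gradient steps of the form of Eqs. (7) and (8) for the two inner minimizations. The learning rate, batch size, iteration counts and other constants are our own illustrative assumptions rather than the settings used in the simulations, and any convex solver could replace the inner gradient loops:

```python
# Alternating mini-batch optimization of F(A, S). The inner loops take
# gradient steps of the form of Eqs. (8) and (7); all constants are illustrative.
import numpy as np

def train_sparse_coding(X, m, lam=0.1, gamma=0.01, eps=1e-2, eta=1e-3,
                        n_outer=100, n_inner=50, batch=1000, seed=0):
    rng = np.random.default_rng(seed)
    n, n_patches = X.shape
    A = rng.standard_normal((n, m))                   # 1. random initialization of A
    A /= np.linalg.norm(A, axis=0)                    #    (unit-norm columns)
    for _ in range(n_outer):                          # 2. repeat until convergence
        idx = rng.choice(n_patches, size=min(batch, n_patches), replace=False)
        Xb = X[:, idx]                                # 2.1 random mini-batch of patches
        S = (A.T @ Xb) / np.linalg.norm(A, axis=0)[:, None]   # 2.2 initialize S
        for _ in range(n_inner):                      # 2.3 steps of the form of Eq. (8): S given A
            S += eta * (A.T @ Xb - A.T @ A @ S - lam * S / np.sqrt(S ** 2 + eps))
        for _ in range(n_inner):                      # 2.4 steps of the form of Eq. (7): A given S
            A += eta * (Xb @ S.T - A @ S @ S.T - gamma * A)
    return A, S
```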

1.2 ICA and fast ICA

In neurobiology, sparse coding can be interpreted as encoding an input stimulus as completely as possible in the activity of a few neurons. In the mathematical sense, a set of neurons is most efficient when the responses of the neurons are statistically independent. ICA attempts to decompose a multivariate signal into independent non-Gaussian components. Applied to natural images, ICA yields a set of independent linear basis functions and therefore reveals the essential characteristics of the data [9]. We can use ICA for feature extraction and image processing, followed by sparse coding of the images. The ICA model essentially captures the properties of simple-cell receptive fields in the primary visual cortex: its basis functions resemble Gabor functions, and the principal characteristics of spatial receptive fields in primates, namely selective tuning for location, orientation and frequency, are shared by the ICA basis functions [9, 13, 16].

Training on natural images with ICA therefore yields the basis matrix. However, when the independent components are extracted by gradient descent, the objective function converges very slowly, and choosing the step size is difficult.

Fast ICA, developed by Hyvärinen at Helsinki University of Technology, is a fixed-point iterative algorithm with very fast convergence. It seeks an orthogonal rotation of pre-whitened data that maximizes a measure of non-Gaussianity of the rotated components [14].

The steps of fast ICA are as follows:

  1. Center and whiten the input data \(X\) to obtain \(Z\).

  2. Initialize \(W_{p}\) randomly.

  3. Update \(W_{p} = E[Zg(W_{p}^{T} Z)] - E[g^{\prime}(W_{p}^{T} Z)]W_{p}\), where usually \(g(\cdot) = \tanh(\cdot)\) and \(E[\cdot]\) denotes the averaging operation.

  4. Decorrelate against the previously estimated components:
     $$W_{p} = W_{p} - \sum\limits_{j = 1}^{p - 1} { (W_{p}^{T} W_{j} )} W_{j} .$$

  5. Normalize: \(W_{p} = W_{p} /\left\| {W_{p} } \right\|\).

  6. Repeat from step 3 until \(W_{p}\) converges.

Then, we can obtain the basis matrix \(A\) from the following equation:

$$A = (W(W^{T} W)^{ - 1/2} )^{ - 1} .$$
(9)
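A minimal sketch of the deflation-based fast ICA steps listed above is given below, using \(g(\cdot) = \tanh(\cdot)\) and working in the whitened space. It is an illustration under our own assumptions (a square \(W\), no numerical safeguards), not the implementation used for the simulations; mapping the basis back to the original image space would additionally involve the whitening matrix \(V\):

```python
# Sketch of deflation-based fast ICA with g = tanh, following steps 1-6 and
# Eq. (9). Assumes a square unmixing matrix W; numerical safeguards omitted.
import numpy as np

def whiten(X):
    """Step 1: center the data and whiten it via an eigendecomposition of its covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(Xc))
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T            # whitening matrix
    return V @ Xc, V

def inv_sqrt(M):
    """Inverse square root of a symmetric positive-definite matrix, used in Eq. (9)."""
    d, E = np.linalg.eigh(M)
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T

def fast_ica(X, max_iter=200, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    Z, V = whiten(X)
    n = Z.shape[0]
    W = np.zeros((n, n))                               # one row per extracted component
    for p in range(n):
        w = rng.standard_normal(n)                     # step 2: random initialization
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            wz = w @ Z
            w_new = (Z * np.tanh(wz)).mean(axis=1) - (1.0 - np.tanh(wz) ** 2).mean() * w  # step 3
            w_new -= W[:p].T @ (W[:p] @ w_new)         # step 4: decorrelate against earlier components
            w_new /= np.linalg.norm(w_new)             # step 5: normalization
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:                              # step 6: stop once W_p has converged
                break
        W[p] = w
    A = np.linalg.inv(W @ inv_sqrt(W.T @ W))           # Eq. (9): basis matrix in the whitened space
    return A, W, V
```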


Cite this article

Wang, G., Wang, R. Sparse coding network model based on fast independent component analysis. Neural Comput & Applic 31, 887–893 (2019). https://doi.org/10.1007/s00521-017-3116-3
