
Sparse coding network model based on fast independent component analysis

  • Original Article

Neural Computing and Applications

Abstract

Neurobiological studies have shown that neurons in the primary visual cortex (V1) may employ sparse representations to encode stimuli. We describe a network model for sparse coding consisting of an input layer, a basis-function layer and an output layer. We simulated standard sparse coding and sparse coding based on fast independent component analysis (ICA), and compared the basis training time, the convergence speed of the objective function and the sparsity of the coefficient matrix. The results show that sparse coding based on fast ICA is more effective than standard sparse coding.


References

  1. Treichler DG (1967) Are you missing the boat in training aids? Film Audio Vis Commun 1:14–16

  2. Olshausen BA, Field DJ (1997) Sparse coding with an overcomplete basis set: a strategy employed by V1? Vis Res 37:3313–3325

  3. Field DJ (1994) What is the goal of sensory coding? Neural Comput 6:559–601

  4. Olshausen BA, Field DJ (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381:607–609

  5. Simoncelli EP (2003) Vision and the statistics of the visual environment. Curr Opin Neurobiol 13:144–149

  6. Delgutte B, Hammond B, Cariani P (1998) Psychophysical and physiological advances in hearing. Whurr Publishers, London

  7. Ruderman DL, Bialek W (1994) Statistics of natural images: scaling in the woods. Phys Rev Lett 73(6):814–817

  8. Kandel ER, Schwartz JH, Jessell TM (2000) Principles of neural science, 4th edn. McGraw-Hill Medical, New York

  9. Hyvärinen A (1999) Survey on independent component analysis. Neural Comput Surv 2:94–128

  10. Olshausen BA, Field DJ (2004) Sparse coding of sensory inputs. Curr Opin Neurobiol 14:481–487

  11. Lewicki MS (2002) Efficient coding of natural sounds. Nat Neurosci 5:356–363

  12. Vinje W, Gallant J (2002) Natural stimulation of the non-classical receptive field increases information transmission efficiency in V1. J Neurosci 22:2904–2915

  13. Hubel DH, Wiesel TN (1977) Functional architecture of macaque monkey visual cortex. Proc R Soc Lond B 198:1–59

  14. Hyvärinen A, Hoyer PO (2001) A two-layer sparse coding model learns simple and complex cell receptive fields and topography from natural images. Vis Res 41(18):2413–2423

  15. Simoncelli EP, Olshausen BA (2001) Natural image statistics and neural representation. Annu Rev Neurosci 24:1193–1216

  16. Hoyer PO, Hyvärinen A (2000) Independent component analysis applied to feature extraction from colour and stereo images. Netw Comput Neural Syst 11(3):191–210

  17. Haken H (2007) Towards a unifying model of neural net activity in the visual cortex. Cogn Neurodyn 1(1):15–25

  18. Hyvärinen A, Hoyer PO, Inki M (2001) Topographic independent component analysis. Neural Comput 13(7):1527–1558

  19. Li S, Wu S (2007) Robustness of neural codes and its implication on natural image processing. Cogn Neurodyn 1(3):261–272

  20. van Hateren JH, Ruderman DL (1998) Independent component analysis of natural image sequences yields spatiotemporal filters similar to simple cells in primary visual cortex. Proc R Soc Lond B 265:2315–2320

  21. Gong HY, Zhang YY, Liang PJ, Zhang PM (2010) Neural coding properties based on spike timing and pattern correlation of retinal ganglion cells. Cogn Neurodyn 4(4):337–346

  22. Saglam M, Hayashida Y, Murayama N (2009) A retinal circuit model accounting for wide-field amacrine cells. Cogn Neurodyn 3(1):25–32

  23. Pillow JW, Shlens J, Paninski L, Sher A, Litke AM (2008) Spatio-temporal correlations and visual signaling in a complete neuronal population. Nature 454:995–999

  24. Li CG, Li YK (2011) Fast and robust image segmentation by small-world neural oscillator networks. Cogn Neurodyn 5(2):209–220

  25. Vialatte FB, Dauwels J, Maurice M, Yamaguchi Y, Cichocki A (2009) On the synchrony of steady state visual evoked potentials and oscillatory burst events. Cogn Neurodyn 3(3):251–261

  26. Huberman AD, Feller MB, Chapman B (2008) Mechanisms underlying development of visual maps and receptive fields. Annu Rev Neurosci 31:479–509

  27. Han JW, Zhao SJ, Hu XT, Guo L, Liu TM (2014) Encoding brain network response to free viewing of videos. Cogn Neurodyn 8(5):389–397

  28. Wang RB, Tsuda I, Zhang ZK (2015) A new work mechanism on neuronal activity. Int J Neural Syst 25(3):1450037

  29. Wang RB (2015) Can the activities of the large scale cortical network be expressed by neural energy? Cogn Neurodyn 1:1–5

  30. Wang ZY, Wang RB, Fang RY (2015) Energy coding in neural network with inhibitory neurons. Cogn Neurodyn 9(2):129–144

  31. Wang ZY, Wang RB (2014) Energy distribution property and energy coding of a structural neural network. Front Comput Neurosci 8:14

  32. Wang RB, Zhang ZK (2011) Phase synchronization motion and neural coding in dynamic transmission of neural information. IEEE Trans Neural Netw 22(7):1097–1106

  33. Wang RB, Zhang ZK (2007) Energy coding in biological neural network. Cogn Neurodyn 1(3):203–212


Author information

Corresponding author

Correspondence to Rubin Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Appendix

1.1 Sparse coding

Sparse coding refers to a class of unsupervised methods for learning over-complete sets of bases for efficient data representation. The aim of sparse coding is to find a set of basis vectors such that an input vector can be represented as a linear combination of these basis vectors:

$$x = As = \sum\limits_{i = 1}^{m} {a_{i} s_{i} }$$
(1)

where \(x = (x_{1}, x_{2}, \ldots, x_{n})^{T}\) is the input data, \(A = (a_{1}, a_{2}, \ldots, a_{m})\) is the basis matrix whose \(i\)th column \(a_{i}\) is a basis function, and \(S = (s_{1}, s_{2}, \ldots, s_{m})^{T}\) is the vector of coefficients (stacked column-wise into the coefficient matrix \(S\) when many inputs are processed together). With an over-complete basis, \(S\) is no longer uniquely determined by the input vector \(x\). We therefore introduce the additional criterion of sparsity: a representation is sparse when it has few nonzero components, or few components that are not close to zero. The choice of sparsity as a desired characteristic of the representation is motivated by the observation that most sensory data, such as natural images, can be described as the superposition of a small number of atomic elements such as surfaces or edges. Other justifications, such as comparisons with the response properties of the primary visual cortex, have also been advanced.
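As a toy illustration (our own example, not part of the original model; all numbers are arbitrary), the following sketch builds an over-complete basis with \(n = 2\) and \(m = 3\) and reconstructs an input vector from a sparse coefficient vector as in Eq. (1):

```python
# Toy illustration of Eq. (1): x = A s with an over-complete basis (m > n)
# and a sparse coefficient vector. All values are arbitrary.
import numpy as np

A = np.array([[1.0, 0.0, 1.0],      # basis matrix: 2-dimensional inputs,
              [0.0, 1.0, 1.0]])     # 3 basis functions (over-complete)
s = np.array([0.0, 0.0, 2.0])       # sparse coefficients: only one is nonzero

x = A @ s                           # reconstruction x = sum_i a_i s_i
print(x)                            # -> [2. 2.]
```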

We define the sparse coding cost function using a set of n input vectors as follows:

$$F(A, S) = \min_{a_{j}, s_{j}} \sum\limits_{i = 1}^{n} {\left\| {x_{i} - \sum\limits_{j = 1}^{m} {a_{j} s_{j} } } \right\|^{2} } + \lambda \sum\limits_{j = 1}^{m} {H(s_{j} )}$$
(2)

where \(a_{j}\) is a basis function, \(s_{j}\) is its coefficient, \(x_{i}\) is an input vector, \(\lambda\) is a constant and \(H(s_{j})\) is a sparsity cost function that penalizes \(s_{j}\) for being far from zero. A common choice for the sparsity cost is the L1 penalty \(H(s_{j}) = \left| s_{j} \right|\), but it is non-differentiable at \(s_{j} = 0\); we therefore use the smoothed penalty \(H(s_{j}) = \sqrt{s_{j}^{2} + \varepsilon}\), where \(\varepsilon\) is a small constant.
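The sketch below (our own illustration; the value of \(\varepsilon\) is arbitrary) shows the smoothed penalty and its derivative, which remains well defined at \(s_{j} = 0\), unlike the derivative of the L1 penalty:

```python
# Smoothed sparsity penalty H(s) = sqrt(s^2 + eps) and its derivative
# s / sqrt(s^2 + eps); eps is an illustrative value only.
import numpy as np

def sparsity_cost(s, eps=1e-2):
    """Smoothed L1 penalty, applied elementwise to the coefficients."""
    return np.sqrt(s ** 2 + eps)

def sparsity_grad(s, eps=1e-2):
    """Derivative of the smoothed penalty; finite even where s = 0."""
    return s / np.sqrt(s ** 2 + eps)

s = np.array([-1.0, 0.0, 0.5])
print(sparsity_cost(s).sum())   # total penalty over the coefficients
print(sparsity_grad(s))         # gradient, well defined at s = 0
```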

We interpret the first term of the sparse coding objective as a reconstruction term, which forces the algorithm to provide a good representation of \(x\), and the second term as a sparsity penalty, which forces the representation of \(x\) to be sparse. The constant \(\lambda\) determines the relative importance of these two contributions.

In addition, the sparsity penalty could be made arbitrarily small by scaling \(s_{j}\) down and scaling \(a_{j}\) up by a large constant. To prevent this, we constrain the norm of each basis function, \(\left\| a_{j} \right\|^{2} \le C,\; \forall j = 1, 2, \ldots, m\), where \(C\) is a constant.

The full sparse coding cost function including our constraint is as follows:

$$F(A, S) = \min_{a_{j}, s_{j}} \sum\limits_{i = 1}^{n} {\left\| {x_{i} - \sum\limits_{j = 1}^{m} {a_{j} s_{j} } } \right\|^{2} } + \lambda \sum\limits_{j = 1}^{m} {H(s_{j} )}$$
(3)

subject to \(\left\| a_{j} \right\|^{2} \le C,\quad \forall j = 1, 2, \ldots, m\).

However, the constraint \(\left\| a_{j} \right\|^{2} \le C,\; \forall j = 1, 2, \ldots, m\) cannot be enforced with simple gradient-based methods. It is therefore relaxed to a “weight decay” term designed to keep the entries of \(A\) small. Adding this term to the objective yields a new objective function:

$$F(A, S) = \left\| {X - AS} \right\|^{2} + \lambda \sum {\sqrt {S^{2} + \varepsilon } } + \gamma \left\| A \right\|^{2}$$
(4)

where \(\lambda\) and \(\gamma\) are constants, \(A = (a_{1}, a_{2}, \ldots, a_{m})\) is the basis matrix and \(S\) is the coefficient matrix whose columns are the coefficient vectors of the input patches collected in \(X\).
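A minimal NumPy sketch of Eq. (4) follows; the values of \(\lambda\), \(\gamma\), \(\varepsilon\) and the toy dimensions are illustrative assumptions, not the settings used in our simulations:

```python
# Sketch of the full objective in Eq. (4): reconstruction error +
# smoothed sparsity penalty on S + weight decay on A. Constants are illustrative.
import numpy as np

def objective(A, S, X, lam=0.1, gamma=0.01, eps=1e-2):
    """F(A, S) = ||X - A S||^2 + lam * sum(sqrt(S^2 + eps)) + gamma * ||A||^2."""
    residual = X - A @ S
    return (np.sum(residual ** 2)
            + lam * np.sum(np.sqrt(S ** 2 + eps))
            + gamma * np.sum(A ** 2))

# toy shapes: 64-dimensional patches, 128 basis functions, 1000 patches
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128))
S = rng.standard_normal((128, 1000))
X = rng.standard_normal((64, 1000))
print(objective(A, S, X))
```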

The objective function is non-convex in \((A, S)\) jointly and is therefore hard to optimize well with gradient-based methods. However, for a fixed \(A\), finding the \(S\) that minimizes \(F(A, S)\) is a convex problem; similarly, for a fixed \(S\), finding the \(A\) that minimizes \(F(A, S)\) is convex. This suggests an alternating scheme: optimize \(A\) for a fixed \(S\), then optimize \(S\) for a fixed \(A\), and repeat.

The update direction for \(A\) is obtained from the derivative of \(F(A, S)\) with respect to \(A\):

$$\frac{{\partial F(A,S)}}{\partial A} = XS^{T} - ASS^{T} - \gamma A$$
(5)

The update direction for \(S\) is obtained from the derivative of \(F(A, S)\) with respect to \(S\), where the division and square root are applied elementwise:

$$\frac{{\partial F(A,S)}}{\partial S} = A^{T} X - A^{T} AS - \lambda \frac{S}{{\sqrt {S^{2} + \varepsilon } }}.$$
(6)

Therefore, the learning rule for the basis function \(a_{i}\) is:

$$\Delta a_{i} = a_{i} (t + 1) - a_{i} (t) = x_{i} s_{i}^{T} - a_{i} s_{i} s_{i}^{T} - \gamma a_{i} .$$
(7)

The learning rule for the coefficient \(s_{i}\) is:

$$\Delta s_{i} = s_{i} (t + 1) - s_{i} (t) = a_{i}^{T} x_{i} - a_{i}^{T} a_{i} s_{i} - \lambda \frac{{s_{i} }}{{\sqrt {s_{i}^{T} s_{i} + \varepsilon } }}.$$
(8)

Running this simple iterative algorithm on the full dataset (10,000 image patches) makes each iteration slow and convergence lengthy. To accelerate convergence, the algorithm can instead be run on mini-batches, selecting a random subset of 1000 patches from the 10,000 patches at each iteration.

Faster and better convergence can also be obtained by initializing the coefficient matrix \(S\) well before using gradient descent (or another method) to optimize the objective for \(S\) given \(A\). In practice, initializing \(S\) randomly at each iteration results in poor convergence unless a good optimum for \(S\) is found before \(A\) is optimized. A better way to initialize \(S\) involves the following steps (a code sketch of the full procedure is given after the list):

  1. Randomly initialize \(A\).

  2. Repeat until convergence:

     1. Select a random mini-batch of patches.

     2. Initialize \(S\) as \(S = A^{T} X\), dividing each feature by the norm of the corresponding basis vector in \(A\).

     3. Find the \(S\) that minimizes \(F(A, S)\) for the \(A\) obtained in the previous step.

     4. Find the \(A\) that minimizes \(F(A, S)\) for the \(S\) found in the previous step.

Using this method, good local optima can be reached relatively quickly.
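The sketch below assembles the whole alternating procedure, using mini-batch gradient steps of the form of Eqs. (7) and (8) for the two inner minimizations. The learning rate, batch size, iteration counts and other constants are our own illustrative assumptions rather than the settings used in the simulations, and any convex solver could replace the inner gradient loops:

```python
# Alternating mini-batch optimization of F(A, S). The inner loops take
# gradient steps of the form of Eqs. (8) and (7); all constants are illustrative.
import numpy as np

def train_sparse_coding(X, m, lam=0.1, gamma=0.01, eps=1e-2, eta=1e-3,
                        n_outer=100, n_inner=50, batch=1000, seed=0):
    rng = np.random.default_rng(seed)
    n, n_patches = X.shape
    A = rng.standard_normal((n, m))                   # 1. random initialization of A
    A /= np.linalg.norm(A, axis=0)                    #    (unit-norm columns)
    for _ in range(n_outer):                          # 2. repeat until convergence
        idx = rng.choice(n_patches, size=min(batch, n_patches), replace=False)
        Xb = X[:, idx]                                # 2.1 random mini-batch of patches
        S = (A.T @ Xb) / np.linalg.norm(A, axis=0)[:, None]   # 2.2 initialize S
        for _ in range(n_inner):                      # 2.3 steps of the form of Eq. (8): S given A
            S += eta * (A.T @ Xb - A.T @ A @ S - lam * S / np.sqrt(S ** 2 + eps))
        for _ in range(n_inner):                      # 2.4 steps of the form of Eq. (7): A given S
            A += eta * (Xb @ S.T - A @ S @ S.T - gamma * A)
    return A, S
```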

1.2 ICA and fast ICA

In neurobiology, sparse coding can be interpreted as encoding an input stimulus as completely as possible in the activity of a few neurons. In the mathematical sense, a set of neurons is most efficient when the responses of the neurons are statistically independent. ICA attempts to decompose a multivariate signal into independent non-Gaussian components. Applied to natural images, ICA yields a set of independent linear basis functions and therefore reveals the essential characteristics of the data [9]. We can use ICA for feature extraction and image processing, followed by sparse coding of the images. The ICA model essentially captures the properties of simple-cell receptive fields in the primary visual cortex: its basis functions resemble Gabor functions, and the principal characteristics of spatial receptive fields in primates, namely selective tuning for location, orientation and frequency, are shared by the ICA basis functions [9, 13, 16].

Training on natural images with ICA therefore yields the basis matrix. However, when the independent components are extracted by gradient descent, the objective function converges very slowly, and choosing the step size is difficult.

Fast ICA, developed by Hyvärinen at Helsinki University of Technology, is a fixed-point iterative algorithm with very fast convergence. It seeks an orthogonal rotation of pre-whitened data that maximizes a measure of non-Gaussianity of the rotated components [14].

The steps of fast ICA are as follows:

  1. Center and whiten the input data \(X\) to obtain \(Z\).

  2. Initialize \(W_{p}\) randomly.

  3. Update \(W_{p} = E[Zg(W_{p}^{T} Z)] - E[g^{\prime}(W_{p}^{T} Z)]W_{p}\), where usually \(g(\cdot) = \tanh(\cdot)\) and \(E[\cdot]\) denotes the averaging operation.

  4. Decorrelate against the previously estimated components:
     $$W_{p} = W_{p} - \sum\limits_{j = 1}^{p - 1} { (W_{p}^{T} W_{j} )} W_{j} .$$

  5. Normalize: \(W_{p} = W_{p} /\left\| {W_{p} } \right\|\).

  6. Repeat from step 3 until \(W_{p}\) converges.

Then, we can obtain the basis matrix \(A\) from the following equation:

$$A = (W(W^{T} W)^{ - 1/2} )^{ - 1} .$$
(9)
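A minimal sketch of the deflation-based fast ICA steps listed above is given below, using \(g(\cdot) = \tanh(\cdot)\) and working in the whitened space. It is an illustration under our own assumptions (a square \(W\), no numerical safeguards), not the implementation used for the simulations; mapping the basis back to the original image space would additionally involve the whitening matrix \(V\):

```python
# Sketch of deflation-based fast ICA with g = tanh, following steps 1-6 and
# Eq. (9). Assumes a square unmixing matrix W; numerical safeguards omitted.
import numpy as np

def whiten(X):
    """Step 1: center the data and whiten it via an eigendecomposition of its covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(Xc))
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T            # whitening matrix
    return V @ Xc, V

def inv_sqrt(M):
    """Inverse square root of a symmetric positive-definite matrix, used in Eq. (9)."""
    d, E = np.linalg.eigh(M)
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T

def fast_ica(X, max_iter=200, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    Z, V = whiten(X)
    n = Z.shape[0]
    W = np.zeros((n, n))                               # one row per extracted component
    for p in range(n):
        w = rng.standard_normal(n)                     # step 2: random initialization
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            wz = w @ Z
            w_new = (Z * np.tanh(wz)).mean(axis=1) - (1.0 - np.tanh(wz) ** 2).mean() * w  # step 3
            w_new -= W[:p].T @ (W[:p] @ w_new)         # step 4: decorrelate against earlier components
            w_new /= np.linalg.norm(w_new)             # step 5: normalization
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:                              # step 6: stop once W_p has converged
                break
        W[p] = w
    A = np.linalg.inv(W @ inv_sqrt(W.T @ W))           # Eq. (9): basis matrix in the whitened space
    return A, W, V
```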


Cite this article

Wang, G., Wang, R. Sparse coding network model based on fast independent component analysis. Neural Comput & Applic 31, 887–893 (2019). https://doi.org/10.1007/s00521-017-3116-3
