## Abstract

Synchronized spontaneous firing among retinal ganglion cells (RGCs), on timescales faster than visual responses, has been reported in many studies. Two candidate mechanisms of synchronized firing include direct coupling and shared noisy inputs. In neighboring parasol cells of primate retina, which exhibit rapid synchronized firing that has been studied extensively, recent experimental work indicates that direct electrical or synaptic coupling is weak, but shared synaptic input in the absence of modulated stimuli is strong. However, previous modeling efforts have not accounted for this aspect of firing in the parasol cell population. Here we develop a new model that incorporates the effects of common noise, and apply it to analyze the light responses and synchronized firing of a large, densely-sampled network of over 250 simultaneously recorded parasol cells. We use a generalized linear model in which the spike rate in each cell is determined by the linear combination of the spatio-temporally filtered visual input, the temporally filtered prior spikes of that cell, and unobserved sources representing common noise. The model accurately captures the statistical structure of the spike trains and the encoding of the visual stimulus, without the direct coupling assumption present in previous modeling work. Finally, we examined the problem of decoding the visual stimulus from the spike train given the estimated parameters. The common-noise model produces Bayesian decoding performance as accurate as that of a model with direct coupling, but with significantly more robustness to spike timing perturbations.

## Notes

It is also useful to recall the relationship between the spatial covariance **C**_{s} and the mixing matrix **M** here: as noted above, since the common noise terms *q*_{t} have unit variance and are independent of the stimulus and of one another, the spatial covariance **C**_{s} is given by **M**^{T}**M**. However, since \(\mathbf{C}_s = (\mathbf{UM})^T(\mathbf{UM})\) for any unitary matrix **U**, we cannot estimate **M** directly; we can only obtain an estimate of **C**_{s}. Therefore, we proceed with the estimation of the spatial covariance **C**_{s} and use any convenient decomposition of this matrix for our calculations below. (We emphasize the non-uniqueness of **M** because it is tempting to interpret **M** as a kind of effective connectivity matrix, and this over-interpretation should be avoided.)

Estimating the spatio-temporal filters **k**_{i} simultaneously while marginalizing over the common noise **q** is possible but computationally challenging. Therefore we held the shape of **k**_{i} fixed in this second stage but optimized its Euclidean length ||**k**_{i}||_{2}. The resulting model explained the data well, as discussed in Section 3 below.

We make one further simplifying approximation in the case of the population model with no direct coupling terms. Here the dimensionality of the common noise terms is extremely large: *dim*(**q**) = *n*_{cells} × *T*, where *T* is the number of timebins in the experimental data. As discussed in Appendix A, direct optimization over **q** can be performed in \(O(n_{\rm cells}^3 T)\) time, i.e., computational time scaling linearly with *T* but, unfortunately, cubically in *n*_{cells} (see Paninski et al. 2010 for further discussion). This cubic scaling becomes prohibitive for large populations. However, if, when computing the marginal likelihood *p*(**y**|**Θ**, **C**_{s}), we make the simplifying approximation that the common noise injected into each RGC is nearly conditionally independent given the observed spikes **y**, then the optimizations over the *n*_{cells} independent noise terms **q**_{i} can be performed independently, with the total computation time scaling linearly in both *n*_{cells} and *T*. We found this approximation to perform reasonably in practice (see Section 3 below), largely because the pairwise correlation in the common-noise terms was always significantly less than one (see Fig. 4 below). No such approximation was necessary in the pairwise case, where the computation always scales as *O*(*T*).

In many of the three-point correlation functions one can notice persistent diagonal structures. If we consider the three-point correlation of neuron 1 with the time-shifted firing of neurons 2 and 3, the diagonal structure is the signature of a correlation between the time-shifted neurons (2 and 3).

Since **x** is binary, strictly speaking, **u**^{i} is not a Gaussian vector solely described by its covariance. However, because the filters **k**^{i} have a relatively large spatiotemporal dimension, the components of **u**^{i} are weighted sums of many independent, identically distributed binary random variables, and their prior marginal distributions can be well approximated by Gaussian distributions (see Pillow et al. 2011 and Ahmadian et al. 2011 for further discussion of this point). For this reason, we replaced the true (non-Gaussian) joint prior distribution of **u**^{i} with a Gaussian distribution with zero mean and covariance Eq. (24).
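This Gaussian approximation is easy to verify numerically. A minimal sketch (the filter dimension, weights, and sample count below are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 300                         # spatiotemporal filter dimension (hypothetical)
k = rng.standard_normal(d)      # filter weights (hypothetical)

# binary white-noise stimulus: each row is one frame of +/-1 entries
x = rng.choice([-1.0, 1.0], size=(100_000, d))
u = x @ k                       # components of the filtered stimulus

# by the central limit theorem, u is approximately N(0, ||k||^2):
# each entry is a weighted sum of d i.i.d. binary variables
mean_u, var_u = u.mean(), u.var()
```

Here `mean_u` comes out close to 0 and `var_u` close to `(k**2).sum()`, as the Gaussian approximation predicts.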

## References

Agresti, A. (2002). Categorical data analysis. In *Wiley series in probability and mathematical statistics*.

Ahmadian, Y., Pillow, J., Shlens, J., Chichilnisky, E., Simoncelli, E., & Paninski, L. (2009). A decoder-based spike train metric for analyzing the neural code in the retina. In *COSYNE09*.

Ahmadian, Y., Pillow, J. W., & Paninski, L. (2011). Efficient Markov chain Monte Carlo methods for decoding neural spike trains. *Neural Computation, 23*, 46–96.

Arnett, D. (1978). Statistical dependence between neighboring retinal ganglion cells in goldfish. *Experimental Brain Research, 32*(1), 49–53.

Bickel, P., & Doksum, K. (2001). *Mathematical statistics: Basic ideas and selected topics*. Prentice Hall.

Brivanlou, I., Warland, D., & Meister, M. (1998). Mechanisms of concerted firing among retinal ganglion cells. *Neuron, 20*(3), 527–539.

Brody, C. (1999). Correlations without synchrony. *Neural Computation, 11*, 1537–1551.

Cafaro, J., & Rieke, F. (2010). Noise correlations improve response fidelity and stimulus encoding. *Nature, 468*(7326), 964–967.

Chornoboy, E., Schramm, L., & Karr, A. (1988). Maximum likelihood identification of neural point process systems. *Biological Cybernetics, 59*, 265–275.

Cocco, S., Leibler, S., & Monasson, R. (2009). Neuronal couplings between retinal ganglion cells inferred by efficient inverse statistical physics methods. *Proceedings of the National Academy of Sciences, 106*(33), 14058.

Cossart, R., Aronov, D., & Yuste, R. (2003). Attractor dynamics of network up states in the neocortex. *Nature, 423*, 283–288.

Dacey, D., & Brace, S. (1992). A coupled network for parasol but not midget ganglion cells in the primate retina. *Visual Neuroscience, 9*(3–4), 279–290.

DeVries, S. H. (1999). Correlated firing in rabbit retinal ganglion cells. *Journal of Neurophysiology, 81*(2), 908–920.

Dombeck, D., Khabbaz, A., Collman, F., Adelman, T., & Tank, D. (2007). Imaging large-scale neural activity with cellular resolution in awake, mobile mice. *Neuron, 56*(1), 43–57.

Dorn, J. D., & Ringach, D. L. (2003). Estimating membrane voltage correlations from extracellular spike trains. *Journal of Neurophysiology, 89*(4), 2271–2278.

Fahrmeir, L., & Kaufmann, H. (1991). On Kalman filtering, posterior mode estimation and Fisher scoring in dynamic exponential family regression. *Metrika, 38*, 37–60.

Fahrmeir, L., & Tutz, G. (1994). *Multivariate statistical modelling based on generalized linear models*. Springer.

Field, G., Gauthier, J., Sher, A., Greschner, M., Machado, T., Jepson, L., et al. (2010). Mapping a neural circuit: A complete input-output diagram in the primate retina. *Nature, 467*, 673–677.

Frechette, E., Sher, A., Grivich, M., Petrusca, D., Litke, A., & Chichilnisky, E. (2005). Fidelity of the ensemble code for visual motion in primate retina. *Journal of Neurophysiology, 94*(1), 119.

Gauthier, J., Field, G., Sher, A., Greschner, M., Shlens, J., Litke, A., et al. (2009). Receptive fields in primate retina are coordinated to sample visual space more uniformly. *PLoS Biology, 7*(4), e1000063.

Greschner, M., Shlens, J., Bakolitsa, C., Field, G., Gauthier, J., Jepson, L., et al. (2011). Correlated firing among major ganglion cell types in primate retina. *The Journal of Physiology, 589*(1), 75.

Gutnisky, D., & Josic, K. (2010). Generation of spatiotemporally correlated spike trains and local field potentials using a multivariate autoregressive process. *Journal of Neurophysiology, 103*(5), 2912.

Harris, K., Csicsvari, J., Hirase, H., Dragoi, G., & Buzsaki, G. (2003). Organization of cell assemblies in the hippocampus. *Nature, 424*, 552–556.

Hayes, M. H. (1996). *Statistical digital signal processing and modeling*. Wiley.

Haykin, S. (2001). *Adaptive filter theory*. Pearson Education India.

Higham, N. (1988). Computing a nearest symmetric positive semidefinite matrix. *Linear Algebra and its Applications, 103*, 103–118.

Iyengar, S. (2001). The analysis of multiple neural spike trains. In *Advances in methodological and applied aspects of probability and statistics* (pp. 507–524). Gordon and Breach.

Kass, R., & Raftery, A. (1995). Bayes factors. *Journal of the American Statistical Association, 90*, 773–795.

Keat, J., Reinagel, P., Reid, R., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. *Neuron, 30*, 803–817.

Kelly, R., Smith, M., Samonds, J., Kohn, A., Bonds, A. B., Movshon, J. A., et al. (2007). Comparison of recordings from microelectrode arrays and single electrodes in the visual cortex. *Journal of Neuroscience, 27*, 261–264.

Kerr, J. N. D., Greenberg, D., & Helmchen, F. (2005). Imaging input and output of neocortical networks *in vivo*. *PNAS, 102*(39), 14063–14068.

Koyama, S., & Paninski, L. (2010). Efficient computation of the MAP path and parameter estimation in integrate-and-fire and more general state-space models. *Journal of Computational Neuroscience, 29*, 89–105.

Krumin, M., & Shoham, S. (2009). Generation of spike trains with controlled auto- and cross-correlation functions. *Neural Computation, 21*(6), 1642–1664.

Kulkarni, J., & Paninski, L. (2007). Common-input models for multiple neural spike-train data. *Network: Computation in Neural Systems, 18*, 375–407.

Latham, P., & Nirenberg, S. (2005). Synergy, redundancy, and independence in population codes, revisited. *The Journal of Neuroscience, 25*(21), 5195.

Lawhern, V., Wu, W., Hatsopoulos, N., & Paninski, L. (2010). Population decoding of motor cortical activity using a generalized linear model with hidden states. *Journal of Neuroscience Methods, 189*(2), 267–280.

Litke, A., Bezayiff, N., Chichilnisky, E., Cunningham, W., Dabrowski, W., Grillo, A., et al. (2004). What does the eye tell the brain? Development of a system for the large scale recording of retinal output activity. *IEEE Transactions on Nuclear Science, 51*, 1434–1440.

Macke, J., Berens, P., Ecker, A., Tolias, A., & Bethge, M. (2009). Generating spike trains with specified correlation coefficients. *Neural Computation, 21*, 397–423.

MacLean, J., Watson, B., Aaron, G., & Yuste, R. (2005). Internal dynamics determine the cortical response to thalamic stimulation. *Neuron, 48*, 811–823.

Martignon, L., Deco, G., Laskey, K., Diamond, M., Freiwald, W., & Vaadia, E. (2000). Neural coding: Higher-order temporal patterns in the neuro-statistics of cell assemblies. *Neural Computation, 12*, 2621–2653.

Masmoudi, K., Antonini, M., & Kornprobst, P. (2010). Encoding and decoding stimuli using a biologically realistic model: The non-determinism in spike timings seen as a dither signal. In *Proc. of research in encoding and decoding of neural ensembles*.

Mastronarde, D. (1983). Correlated firing of cat retinal ganglion cells. I. Spontaneously active inputs to X- and Y-cells. *Journal of Neurophysiology, 49*(2), 303.

McCulloch, C., Searle, S., & Neuhaus, J. (2008). Generalized, linear, and mixed models. In *Wiley series in probability and statistics*.

Meister, M., Lagnado, L., & Baylor, D. (1995). Concerted signaling by retinal ganglion cells. *Science, 270*(5239), 1207.

Mishchenko, Y., Vogelstein, J., & Paninski, L. (2011). A Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data. *The Annals of Applied Statistics, 5*(2B), 1229–1261.

Nicolelis, M., Dimitrov, D., Carmena, J., Crist, R., Lehew, G., Kralik, J., et al. (2003). Chronic, multisite, multielectrode recordings in macaque monkeys. *PNAS, 100*, 11041–11046.

Niebur, E. (2007). Generation of synthetic spike trains with defined pairwise correlations. *Neural Computation, 19*(7), 1720–1738.

Nirenberg, S., Carcieri, S., Jacobs, A., & Latham, P. (2002). Retinal ganglion cells act largely as independent encoders. *Nature, 411*, 698–701.

Nykamp, D. (2005). Revealing pairwise coupling in linear-nonlinear networks. *SIAM Journal on Applied Mathematics, 65*, 2005–2032.

Nykamp, D. (2008). Exploiting history-dependent effects to infer network connectivity. *SIAM Journal on Applied Mathematics, 68*(2), 354–391.

Nykamp, D. (2009). A stimulus-dependent connectivity analysis of neuronal networks. *Journal of Mathematical Biology, 59*(2), 147–173.

Ohki, K., Chung, S., Kara, P., Hübener, M., Bonhoeffer, T., & Reid, R. (2006). Highly ordered arrangement of single neurons in orientation pinwheels. *Nature, 442*(7105), 925–928.

Okatan, M., Wilson, M., & Brown, E. (2005). Analyzing functional connectivity using a network likelihood model of ensemble neural spiking activity. *Neural Computation, 17*, 1927–1961.

Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. *Network: Computation in Neural Systems, 15*, 243–262.

Paninski, L. (2005). Log-concavity results on Gaussian process methods for supervised and unsupervised learning. In *Advances in Neural Information Processing Systems, 17*.

Paninski, L., Pillow, J., & Simoncelli, E. (2004). Maximum likelihood estimation of a stochastic integrate-and-fire neural model. *Neural Computation, 16*, 2533–2561.

Paninski, L., Ahmadian, Y., Ferreira, D., Koyama, S., Rahnama Rad, K., Vidne, M., et al. (2010). A new look at state-space models for neural data. *Journal of Computational Neuroscience, 29*, 107–126.

Pillow, J., & Latham, P. (2007). Neural characterization in partially observed populations of spiking neurons. In *NIPS*.

Pillow, J., Paninski, L., Uzzell, V., Simoncelli, E., & Chichilnisky, E. (2005). Prediction and decoding of retinal ganglion cell responses with a probabilistic spiking model. *Journal of Neuroscience, 25*, 11003–11013.

Pillow, J., Shlens, J., Paninski, L., Sher, A., Litke, A., Chichilnisky, E., et al. (2008). Spatiotemporal correlations and visual signaling in a complete neuronal population. *Nature, 454*, 995–999.

Pillow, J., Ahmadian, Y., & Paninski, L. (2011). Model-based decoding, information estimation, and change-point detection in multi-neuron spike trains. *Neural Computation, 23*, 1–45.

Rigat, F., de Gunst, M., & van Pelt, J. (2006). Bayesian modelling and analysis of spatio-temporal neuronal networks. *Bayesian Analysis, 1*, 733–764.

de la Rocha, J., Doiron, B., Shea-Brown, E., Josic, K., & Reyes, A. (2007). Correlation between neural spike trains increases with firing rate. *Nature, 448*, 802–806.

Rybicki, G., & Hummer, D. (1991). An accelerated lambda iteration method for multilevel radiative transfer. Appendix B: Fast solution for the diagonal elements of the inverse of a tridiagonal matrix. *Astronomy and Astrophysics, 245*, 171.

Schneidman, E., Bialek, W., & Berry, M. (2003). Synergy, redundancy, and independence in population codes. *The Journal of Neuroscience, 23*(37), 11539.

Schneidman, E., Berry, M., Segev, R., & Bialek, W. (2006). Weak pairwise correlations imply strongly correlated network states in a neural population. *Nature, 440*, 1007–1012.

Schnitzer, M., & Meister, M. (2003). Multineuronal firing patterns in the signal from eye to brain. *Neuron, 37*, 499–511.

Segev, R., Goodhouse, J., Puchalla, J., & Berry, M. (2004). Recording spikes from a large fraction of the ganglion cells in a retinal patch. *Nature Neuroscience, 7*, 1154–1161.

Shlens, J., Field, G., Gauthier, J., Grivich, M., Petrusca, D., Sher, A., et al. (2006). The structure of multi-neuron firing patterns in primate retina. *The Journal of Neuroscience, 26*(32), 8254.

Shlens, J., Field, G. D., Gauthier, J. L., Greschner, M., Sher, A., Litke, A. M., et al. (2009). The structure of large-scale synchronized firing in primate retina. *Journal of Neuroscience, 29*(15), 5022–5031. doi:10.1523/JNEUROSCI.5187-08.2009

Smith, A., & Brown, E. (2003). Estimating a state-space model from point process observations. *Neural Computation, 15*, 965–991.

Stein, R., Weber, D., Aoyagi, Y., Prochazka, A., Wagenaar, J., Shoham, S., et al. (2004). Coding of position by simultaneously recorded sensory neurones in the cat dorsal root ganglion. *The Journal of Physiology, 560*(3), 883–896.

Stevenson, I., Rebesco, J., Miller, L., & Körding, K. (2008). Inferring functional connections between neurons. *Current Opinion in Neurobiology, 18*(6), 582–588.

Stevenson, I., Rebesco, J., Hatsopoulos, N., Haga, Z., Miller, L., & Körding, K. (2009). Bayesian inference of functional connectivity and network structure from spikes. *IEEE Transactions on Neural Systems and Rehabilitation Engineering, 17*(3), 203.

Trong, P., & Rieke, F. (2008). Origin of correlated activity between parasol retinal ganglion cells. *Nature Neuroscience, 11*(11), 1343–1351.

Truccolo, W., Eden, U., Fellows, M., Donoghue, J., & Brown, E. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble and extrinsic covariate effects. *Journal of Neurophysiology, 93*, 1074–1089.

Usrey, W., & Reid, R. (1999). Synchronous activity in the visual system. *Annual Review of Physiology, 61*(1), 435–456.

Utikal, K. (1997). A new method for detecting neural interconnectivity. *Biological Cybernetics, 76*, 459–470.

Van Pelt, J., Vajda, I., Wolters, P., Corner, M., & Ramakers, G. (2005). Dynamics and plasticity in developing neuronal networks *in vitro*. *Progress in Brain Research, 147*, 173–188.

Vidne, M., Kulkarni, J., Ahmadian, Y., Pillow, J., Shlens, J., Chichilnisky, E., et al. (2009). Inferring functional connectivity in an ensemble of retinal ganglion cells sharing a common input. In *Frontiers in systems neuroscience conference abstract: Computational and systems neuroscience 2009*.

Warland, D., Reinagel, P., & Meister, M. (1997). Decoding visual information from a population of retinal ganglion cells. *Journal of Neurophysiology, 78*, 2336–2350.

Wilson, J. M., Dombeck, D. A., Diaz-Rios, M., Harris-Warrick, R. M., & Brownstone, R. M. (2007). Two-photon calcium imaging of network activity in XFP-expressing neurons in the mouse. *Journal of Neurophysiology, 97*(4), 3118–3125.

Wu, W., Kulkarni, J., Hatsopoulos, N., & Paninski, L. (2008). Neural decoding of goal-directed movements using a linear state-space model with hidden states. In *Computational and systems neuroscience meeting*.

Yu, B., Afshar, A., Santhanam, G., Ryu, S., Shenoy, K., & Sahani, M. (2006). Extracting dynamical structure embedded in neural activity. In *NIPS*.

Yu, B., Cunningham, J., Santhanam, G., Ryu, S., Shenoy, K., & Sahani, M. (2009). Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. In *NIPS*.

Zhang, K., Ginzburg, I., McNaughton, B., & Sejnowski, T. (1998). Interpreting neuronal population activity by reconstruction: Unified framework with application to hippocampal place cells. *Journal of Neurophysiology, 79*, 1017–1044.

## Acknowledgements

We would like to thank F. Rieke for helpful suggestions and insight; K. Masmoudi for bringing the analogy to the dithering process to our attention; O. Barak, X. Pitkow and M. Greschner for comments on the manuscript; G. D. Field, J. L. Gauthier, and A. Sher for experimental assistance. A preliminary version of this work was presented in Vidne et al. (2009). In addition, an early version of Fig. 2 appeared previously in the review paper (Paninski et al. 2010), and Fig. 11(D) is a reproduction of a schematic figure from (Shlens et al. 2006). This work was supported by: the Gatsby Foundation (M.V.); a Robert Leet and Clara Guthrie Patterson Trust Postdoctoral Fellowship (Y.A.); an NSF Integrative Graduate Education and Research Traineeship Training Grant DGE-0333451 and a Miller Institute Fellowship for Basic Research in Science, UC Berkeley (J.S.); US National Science Foundation grant PHY-0417175 (A.M.L.); NIH Grant EY017736 (E.J.C.); HHMI (E.P.S.); NEI grant EY018003 (E.J.C., L.P. and E.P.S.); and a McKnight Scholar award (L.P.). We also gratefully acknowledge the use of the Hotfoot shared cluster computer at Columbia University.

## Author information

### Authors and Affiliations

### Corresponding author

## Additional information

**Action Editor: Brent Doiron**

## Appendices

### Appendix A: *O*(*T*) optimization for parameter estimation and stimulus decoding for models with common noise effects

The common-noise model was formulated as conditionally independent cells, with no cross-coupling filters, interacting through the covariance structure of the common noise inputs. Therefore, the model lends itself naturally to parallelization of the computationally expensive stage, the maximum likelihood estimation of the model parameters. For each cell, we need to estimate the model parameters and the common noise the cell receives. The common noise to the cell on a given trial is a time series of the same length as the experiment itself (*T* discrete bins), and finding the maximum a posteriori (MAP) estimate of the common noise is a computationally intensive task because of its high dimensionality, but it can be performed independently for each cell. We used the Newton-Raphson method for optimization over the joint vector, *ν* = [*Θ*, *q*], of the model cell’s parameters and the common-noise estimate. Each iteration of the Newton-Raphson method requires us to find the new stepping direction, *δ*, by solving the linear system \(H\delta = \nabla\), where \(\nabla\) denotes the gradient and *H* the Hessian matrix (the matrix of second derivatives) of the objective function *F*. We need to solve this matrix equation for *δ* at every step of the optimization. In general, this requires *O*(*T*^{3}) operations, which renders naive approaches inapplicable for long experiments such as the one discussed here. Instead, we used an *O*(*T*) method for computing the MAP estimate developed in Koyama and Paninski (2010).

Because of the autoregressive structure of the common noise, *q*, the log posterior density of *q*_{t} can be written as:

where we denote the spike train as {*y*_{t}}, and for simplicity we have suppressed the dependence on all other model parameters and taken *q* to be a first-order autoregressive process. Since all terms are concave in **q**, the entire expression is concave in **q**.
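The display itself is missing from this version of the text; under the exponential-nonlinearity GLM used elsewhere in the paper, with conditional intensity \(\lambda_t\) and AR(1) noise \(q_t = \rho q_{t-1} + \sigma\epsilon_t\), it plausibly takes the form (a hedged reconstruction, not the original equation):

```latex
\log p(\{q_t\}\mid\{y_t\}) \;=\; \sum_t \left[ y_t \log \lambda_t - \lambda_t\, dt \right]
\;-\; \frac{1}{2\sigma^2}\sum_{t>1}\left(q_t - \rho\, q_{t-1}\right)^2 \;+\; \mathrm{const},
\qquad
\lambda_t = \exp\!\left( \mathbf{k}\cdot\mathbf{x}_t + \mathbf{h}\cdot\mathbf{y}_{t-} + q_t \right),
```

in which each term (linear, minus-exponential, quadratic) is concave in **q**, matching the concavity noted in the text.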

The Hessian, *H*, has the form:

\(H = \begin{pmatrix} H_{\Theta\Theta} & H_{\Theta q} \\ H_{q\Theta} & J \end{pmatrix};\)

note that the dimension of \(J = \frac{\partial^2 F}{\partial q^2}\) is *T* ×*T* (in our case we have a 9.6 min recording and 0.8 ms bins, i.e., *T* = 720,000), while the dimension of \(H_{\Theta\Theta} = \frac{\partial^2 F}{\partial \Theta^2}\) is just *N* ×*N*, where *N* is the number of parameters in the model (*N* < 20 for all the models considered here). Using the Schur complement of *H*, we get:

\(S\,\delta_q = \nabla_q - H_{q\Theta} H_{\Theta\Theta}^{-1} \nabla_\Theta, \qquad \delta_\Theta = H_{\Theta\Theta}^{-1}\left(\nabla_\Theta - H_{\Theta q}\,\delta_q\right),\)

where we denote the Schur complement of *H* by \(S = J - H_{q\Theta} H_{\Theta\Theta}^{-1} H_{\Theta q}\). The dimensions of \(H_{\Theta\Theta}\) are small (the number of model parameters, *N*), and because of the autoregressive structure of the common noise, \(J = \frac{\partial^2 F}{\partial q^2}\) is a tridiagonal matrix:

where

and

In the case that we use higher-order autoregressive processes, *D*_{t} and *B*_{t} are replaced by *d* ×*d* blocks, where *d* is the order of the AR process. The tridiagonal structure of *J* allows us to obtain each Newton step direction, *δ*, in *O*(*d*^{3}*T*) time using standard methods (Rybicki and Hummer 1991; Paninski et al. 2010).
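Each Newton step can therefore be computed with a standard banded solver. A minimal sketch in NumPy/SciPy (not the authors' code; the AR(1) prior Hessian below, with hypothetical ρ and σ², stands in for *J*):

```python
import numpy as np
from scipy.linalg import solve_banded

def newton_step_tridiag(grad, diag, offdiag):
    """Solve H @ delta = grad for a symmetric tridiagonal H in O(T) time."""
    T = len(diag)
    ab = np.zeros((3, T))       # banded storage expected by solve_banded
    ab[0, 1:] = offdiag         # superdiagonal
    ab[1, :] = diag             # main diagonal
    ab[2, :-1] = offdiag        # subdiagonal
    return solve_banded((1, 1), ab, grad)

# toy tridiagonal Hessian: precision matrix of an AR(1) process
# q_t = rho * q_{t-1} + sigma * eps_t  (hypothetical parameters)
T, rho, sigma2 = 1000, 0.9, 0.1
diag = (1.0 + rho**2) / sigma2 * np.ones(T)
diag[-1] = 1.0 / sigma2         # boundary term: q_T has no successor
offdiag = -rho / sigma2 * np.ones(T - 1)

delta = newton_step_tridiag(np.ones(T), diag, offdiag)
```

The same pattern extends to the block-tridiagonal *J* of an AR(*d*) process by widening the band, i.e., `solve_banded((d, d), ...)`.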

We can also exploit similar bandedness properties for stimulus decoding. Namely, we can carry out the Newton-Raphson method for optimization over the joint vector **ν** = (**u**,**q**), Eq. (11), in *O*(*T*) computational time. According to Eq. (11) (repeated here for convenience),

the Hessian of the log-posterior, **J** (i.e., the matrix of second partial derivatives of the log-posterior with respect to the components of **ν** = (**u**,**q**)), is:

\(\mathbf{J} = \mathbf{A} + \mathbf{B} + \mathbf{D}.\)

Thus, the Hessian of the log-posterior is the sum of the Hessian of the log-prior for the filtered stimulus, \({\mathbf{A}} = \frac{\partial^2}{\partial \nu^2} \log p({{\mathbf{u}}})\); the Hessian of the log-prior for the common noise, \({\mathbf{B}} = \frac{\partial^2}{\partial \nu^2} \log p({\mathbf{q}}|{\mathbf{\Theta}})\); and the Hessian of the log-likelihood, \({\mathbf{D}} = \frac{\partial^2}{\partial \nu^2} \log p({\mathbf{y}}|{{\mathbf{u}}},{\mathbf{q}},{\mathbf{\Theta}})\). Since we took the stimulus and common noise priors to be Gaussian with zero mean, **A** and **B** are constant matrices, independent of the decoded spike train.

We order the components of **ν** = (**u**,**q**) according to \((u^1_1,u^2_1,q_1,u^1_2,u^2_2,q_2,\cdots )\), where subscripts denote the time step, i.e., such that components corresponding to the same time step are adjacent. With this ordering, **D** is block-diagonal with 3×3 blocks

where \(\mathcal{L}\equiv\log p({\mathbf{y}}|{{\mathbf{u}}},{\mathbf{q}},{\mathbf{\Theta}})\) is the GLM log-likelihood. The contribution of the log-prior for **q** is given by

where, as above, the matrix \(\left [ B_{t_1,t_2}\right ]\) is the Hessian corresponding to the autoregressive process describing **q** (the zero entries in \({\mathbf{b}}_{t_1,t_2}\) involve partial differentiation with respect to the components of **u** which vanish because the log-prior for **q** is independent of **u**). Since *B* is banded, **B** is also banded. Finally, the contribution of the log-prior term for **u** is given by

where the matrix *A* (formed by excluding all the zero rows and columns, corresponding to partial differentiation with respect to the common noise, from **A**) is the inverse covariance matrix of the **u**^{i}. The covariance matrix of the **u**^{i} is given by \(C_{{\mathbf{u}}}^{ij} = {\mathbf{k}}^i C_{{\mathbf{x}}} {\mathbf{k}}^{j\,\mathrm{T}}\), where *C*_{x} is the covariance of the spatio-temporally fluctuating visual stimulus, **x**, and \({\mathbf{k}}^{j\,\mathrm{T}}\) is cell *j*’s receptive field transposed. Since a white-noise stimulus was used in the experiment, we use \(C_{{\mathbf{x}}} = c^2 \mathbf{I}\), where **I** is the identity matrix and *c* is the stimulus contrast. Hence we have \(C_{{\mathbf{u}}}^{ij} = c^2 {\mathbf{k}}^i {\mathbf{k}}^{j\,\mathrm{T}}\), or more explicitly:^{Footnote 5}

Notice that since the experimentally fit *k*^{i} have a finite temporal duration *T*_{k}, the covariance matrix *C*_{u} is banded: it vanishes when |*t*_{1} − *t*_{2}| ≥ 2*T*_{k} − 1. However, the inverse of *C*_{u}, and therefore **A**, are not banded in general. This complicates the direct use of the banded matrix methods to solve the set of linear equations, \({\mathbf{J}} \delta = \nabla\), in each Newton-Raphson step. Still, as we will now show, we can exploit the bandedness of *C*_{u} (as well as **B** and **D**) to obtain the desired *O*(*T*) scaling.
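The band structure of *C*_{u} can be made concrete for purely temporal filters: the block \(C_{{\mathbf{u}}}^{ij}(t_1,t_2) = c^2\sum_s k^i_{t_1-s}\,k^j_{t_2-s}\) depends only on *t*_{1} − *t*_{2} and vanishes once the filter supports no longer overlap. A small sketch (filters and sizes hypothetical):

```python
import numpy as np
from scipy.linalg import toeplitz

c, Tk, T = 0.5, 10, 50          # contrast, filter length, number of time bins
rng = np.random.default_rng(1)
ki = rng.standard_normal(Tk)    # temporal filter of cell i (hypothetical)
kj = rng.standard_normal(Tk)    # temporal filter of cell j (hypothetical)

# cross-correlation of the two filters, covering lags -(Tk-1) .. Tk-1
xcorr = c**2 * np.correlate(ki, kj, mode="full")

# embed into a T x T Toeplitz block; entries are zero outside the band
col = np.zeros(T); col[:Tk] = xcorr[Tk - 1:]       # lags 0 .. Tk-1
row = np.zeros(T); row[:Tk] = xcorr[Tk - 1::-1]    # lags 0 .. -(Tk-1)
Cu_ij = toeplitz(col, row)
```

The Cholesky factor of such a banded matrix is itself banded, which is what the rotation described next exploits.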

In order to solve each Newton-Raphson step we need to recast our problem into an auxiliary space in which all our matrices are banded. Below we show how such a rotation can be accomplished. First, we calculate the (lower triangular) Cholesky decomposition, *L*, of *C*_{u}, satisfying \(L L^{\mathrm{T}} = C_{{\mathbf{u}}}\). Since *C*_{u} is banded, *L* is itself banded, and its calculation can be performed in *O*(*T*) operations (and is performed only once, because *C*_{u} is fixed and does not depend on the vector (**u**,**q**)). Next, we form the 3*T* ×3*T* matrix **L**

Clearly **L** is banded because *L* is banded. Also notice that when **L** acts on a state vector it does not affect its **q** part: it corresponds to the *T* ×*T* identity matrix in the **q**-subspace. Let us define

Using the definitions of **L** and **A**, and \(L^{\mathrm{T}} A L = I\) (which follows from \(A = C_{{\mathbf{u}}}^{-1}\) and the definition of *L*), we then obtain

where we defined

**I**_{u} is diagonal and **B** is banded, and since **D** is block-diagonal and **L** is banded, so is the third term in Eq. (27). Thus **G** is banded.

Now it is easy to see from Eq. (26) that the solution, *δ*, of the Newton-Raphson equation, \({\mathbf{J}} \delta= \nabla\), can be written as \(\delta = {\mathbf{L}} \tilde{\delta}\) where \(\tilde{\delta}\) is the solution of the auxiliary equation \({\mathbf{G}}\tilde{\delta} = {\mathbf{L}}{^{^{\mathrm{T}}}} \nabla\). Since **G** is banded, this equation can be solved in *O*(*T*) time, and since **L** is banded, the required matrix multiplications by **L** and \({\mathbf{L}}{^{^{\mathrm{T}}}}\) can also be performed in linear computational time. Thus the Newton-Raphson algorithm for solving Eq. 11 can be performed in computational time scaling only linearly with *T*.
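The rotation can be summarized compactly (restating quantities defined in this appendix; the numbered displays are missing from this version, so the symbols follow the surrounding text):

```latex
\mathbf{J} = \mathbf{A} + \mathbf{B} + \mathbf{D}, \qquad
\mathbf{G} \;\equiv\; \mathbf{L}^{\mathrm{T}} \mathbf{J}\, \mathbf{L}
\;=\; \mathbf{I}_u + \mathbf{B} + \mathbf{L}^{\mathrm{T}} \mathbf{D}\, \mathbf{L}, \qquad
\mathbf{G}\,\tilde{\delta} = \mathbf{L}^{\mathrm{T}} \nabla, \qquad
\delta = \mathbf{L}\,\tilde{\delta}.
```

Every matrix on the right-hand side is banded, so each step costs *O*(*T*).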

### Appendix B: Method of moments

The identification of the correlation structure of a latent variable from the spike trains of many neurons, and the closely related converse problem of generating spike trains with a desired correlation structure (Niebur 2007; Krumin and Shoham 2009; Macke et al. 2009; Gutnisky and Josic 2010), have received much attention lately. Krumin and Shoham (2009) generate spike trains with the desired statistical structure by nonlinearly transforming the underlying nonnegative rate process into a Gaussian process using a few commonly used link functions. Macke et al. (2009) and Gutnisky and Josic (2010) both use thresholding of a Gaussian process with the desired statistical structure, though these works differ in the way the authors sample the resulting processes. Finally, Dorn and Ringach (2003) proposed a method to find the correlation structure of an underlying Gaussian process in a model where spikes are generated by simple threshold crossing. Here we take a similar route to find the correlation structure between our latent variables, the common noise inputs. However, the correlations in the spike trains in our model stem both from the receptive field overlap and from the correlations among the latent variables (the common noise inputs), which we have to estimate from the observed spiking data. Moreover, the GLM framework used here affords some additional flexibility, since the spike train, including history effects, is not restricted to be a Poisson process given the latent variable.

Our model for the full cell population can be written as:

\(\mathbf{y}^i \sim \mathrm{Poisson}\left(\exp(\mathbf{k}^i\cdot\mathbf{x}^i + \mathbf{h}^i\cdot\mathbf{y}^i + \mathbf{q}^i)\,dt\right),\)

where \(Q \sim \mathcal{N}(0,\mathbf{C}_{\tau}\otimes\mathbf{C}_s)\), since we assume the spatio-temporal covariance structure of the common noise is separable (as discussed in Section 2.2). Here we separate the covariance of the common noise inputs into two terms: \(\mathbf{C}_{\tau}\), the temporal correlation of each of the common noise sources, which imposes the autoregressive structure, and \(\mathbf{C}_s\), the spatial correlations between the common inputs to the different cells. Correlations between cells have two sources in this model: first, the spatio-temporal filters overlap; second, the correlation matrix \(\mathbf{C}_s\) accounts for the common noise correlation. The filters' overlap alone is insufficient to account for the observed correlation structure, as discussed in Pillow et al. (2008); the RF overlap does not account for the large sharp peak at zero lag in the cross-correlation function computed from the spike trains of neighboring cells. Therefore, we must capture this fast remaining correlation through \(\mathbf{C}_s\).
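To make the separable-covariance assumption concrete, here is a minimal sketch (with illustrative sizes and toy covariances, not parameters from the paper) of drawing common-noise samples with covariance \(\mathbf{C}_{\tau}\otimes\mathbf{C}_s\). It uses the identity that if \(Z\) has i.i.d. standard-normal entries, then \(Q = L_{\tau} Z L_s^T\) has the desired Kronecker-structured covariance, where \(L_{\tau}, L_s\) are Cholesky factors:

```python
import numpy as np

# Assumed toy parameters: AR(1)-style temporal covariance and a simple
# uniform spatial covariance (these stand in for the fitted C_tau, C_s).
rng = np.random.default_rng(0)
T, N = 200, 4  # time bins, cells (illustrative sizes)
C_tau = 0.9 ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
C_s = 0.2 * np.ones((N, N)) + 0.8 * np.eye(N)

L_tau = np.linalg.cholesky(C_tau)
L_s = np.linalg.cholesky(C_s)

Z = rng.standard_normal((T, N))
Q = L_tau @ Z @ L_s.T  # one sample; Q[t, i] is cell i's common noise at time t

# Sanity check over many samples: the spatial covariance of Q approaches C_s
# (averaging over time weights by diag(C_tau), whose sum is trace(C_tau)).
samples = np.stack([L_tau @ rng.standard_normal((T, N)) @ L_s.T
                    for _ in range(2000)])
emp_C_s = np.einsum('rti,rtj->ij', samples, samples) / (2000 * np.trace(C_tau))
```

The Cholesky-factor form avoids ever building the full \(TN \times TN\) Kronecker covariance, which is what makes the separability assumption computationally convenient.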

We approximated our model as \(\mathbf{y}^i \sim \mathrm{Poisson}(\exp(a^i\mathbf{z}^i+\mathbf{q}^i)dt)\), where \(\mathbf{z}^i = \mathbf{k}^i\cdot\mathbf{x}^i + \mathbf{h}^i\cdot\mathbf{y}^i\). Since the exponential nonlinearity is a convex, log-concave, increasing function of *z*, so is \(E_{q}\left[\exp(a^i\mathbf{z}^i+\mathbf{q}^i)\right]\) (Paninski 2005). This guarantees that if the distribution of the covariate is elliptically symmetric, then we may consistently estimate the model parameters via the standard GLM maximum likelihood estimator, even if the incorrect nonlinearity is used to compute the likelihood (Paninski 2004), i.e., even when the correct values of the correlation matrix \(\mathbf{C}_s\) are unknown. This consistency holds only up to a scalar constant, so we leave a scalar degree of freedom, \(a^i\), in front of \(\mathbf{z}^i\). We now write the moments of the distribution analytically and then equate them to the observed moments of the spike trains.

The average firing rate can be written as:

\(E[\mathbf{y}^i] = E\left[\exp(a^i\mathbf{z}^i+\mathbf{q}^i)\right]dt\)

and, by the independence of \(\mathbf{z}^i\) and \(\mathbf{q}^i\),

\(E[\mathbf{y}^i] = E\left[\exp(a^i\mathbf{z}^i)\right]E\left[\exp(\mathbf{q}^i)\right]dt,\)

where \(G^i(a^i) = E[\exp(a^i \mathbf{z}^i)]\) is the moment generating function (Bickel and Doksum 2001) of \(a^i\mathbf{z}^i\), and \(r^i = E[\exp(q^i)] = \exp(\mu^i+\frac{1}{2}(C_s)_{ii})\), with \(\mu^i\) the mean of \(q^i\), may be computed analytically here (by solving the Gaussian integral over \(\exp(q)\)). Therefore we have:

\(E[\mathbf{y}^i] = G^i(a^i)\,r^i\,dt,\)

and, equating this expression to the observed mean firing rate, we can solve for \(a_{opt}^i\).
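The scalar moment-matching step above can be sketched numerically. In this hedged illustration, the samples of \(z\), the bin size, the statistics of \(q\), and the observed rate are all synthetic stand-ins (not values from the paper); the moment generating function is estimated by Monte Carlo and the matching equation is solved by bisection:

```python
import numpy as np

# Assumed, synthetic stand-ins for the quantities in the text.
rng = np.random.default_rng(1)
dt = 0.001                                 # bin size (assumed)
z = rng.normal(0.0, 1.0, size=100000)      # stand-in samples of z^i
mu_q, var_q = 0.0, 0.25                    # assumed mean/variance of q^i
r = np.exp(mu_q + 0.5 * var_q)             # r^i = E[exp(q^i)], Gaussian integral
observed_rate = 0.030                      # observed mean spike count per bin (assumed)

G = lambda a: np.mean(np.exp(a * z))       # Monte Carlo moment generating function

def model_rate(a):
    # Model mean spike count per bin: G^i(a) * r^i * dt
    return G(a) * r * dt

# For symmetric z, G(a) is increasing on a >= 0, so bisection on [0, 5] works.
lo, hi = 0.0, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if model_rate(mid) < observed_rate:
        lo = mid
    else:
        hi = mid
a_opt = 0.5 * (lo + hi)
```

In practice \(G^i\) would be evaluated on the actual filtered stimulus-plus-history samples \(\mathbf{z}^i\) rather than on synthetic Gaussians, but the root-finding structure is the same.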

Now we can proceed and solve for \(\mathbf{C}_s\). We first rewrite \(\mathbf{C}_s\) using the "law of total variance," which states that the total variance of a random variable is the sum of the expected conditional variance and the variance of the conditional expectations:

\(\mathrm{Cov}(\mathbf{y}^i,\mathbf{y}^j) = E\left[\mathrm{Cov}(\mathbf{y}^i,\mathbf{y}^j \mid \mathbf{z},\mathbf{q})\right] + \mathrm{Cov}\left(E[\mathbf{y}^i \mid \mathbf{z},\mathbf{q}],\, E[\mathbf{y}^j \mid \mathbf{z},\mathbf{q}]\right). \quad (33)\)

The first term on the right-hand side can be written as

\(E\left[\mathrm{Cov}(\mathbf{y}^i,\mathbf{y}^j \mid \mathbf{z},\mathbf{q})\right] = \delta_{ij}\,E[\mathbf{y}^i],\)

since, given the common noise, the cells spike independently and the variance of a Poisson process equals its mean. The second term on the right-hand side of Eq. (33) is

\(\mathrm{Cov}\left(E[\mathbf{y}^i\mid\mathbf{z},\mathbf{q}],E[\mathbf{y}^j\mid\mathbf{z},\mathbf{q}]\right) = \left(G^{i,j}(a^i,a^j)\,r^i r^j e^{(\mathbf{C}_s)_{ij}} - G^i(a^i)\,G^j(a^j)\,r^i r^j\right)dt^2.\)

Putting them back together, we have

\((\hat{\mathbf{C}}_s)_{ij} = \delta_{ij}\,E[\mathbf{y}^i] + \left(G^{i,j}(a^i,a^j)\,r^i r^j e^{(\mathbf{C}_s)_{ij}} - G^i(a^i)\,G^j(a^j)\,r^i r^j\right)dt^2.\)

Now we have all the terms for the estimation of \(\mathbf{C}_s\), and we can uniquely invert this relationship to obtain:

\((\mathbf{C}_s)_{ij} = \log\left(\frac{(\hat{\mathbf{C}}_s)_{ij} - \delta_{ij}\,E[\mathbf{y}^i] + G^i(a^i)\,G^j(a^j)\,r^i r^j\,dt^2}{G^{i,j}(a^i,a^j)\,r^i r^j\,dt^2}\right),\)

where we set \(G^{i,j}(a^i,a^j) = E[\exp(a^i\mathbf{z}^i+a^j\mathbf{z}^j)]\) and denote the observed covariance of the time series by \(\hat{\mathbf{C}}_s\).

As a consequence of the law of total variance (recall Eq. (33) above), the Poisson mixture model (a Poisson model whose underlying intensity is itself stochastic) constrains the variance of the spike count to be larger than its mean, while in fact our data are under-dispersed (i.e., the variance is smaller than the mean). This often results in negative eigenvalues in the estimate of \(\mathbf{C}_s\). We therefore find the closest approximant to the covariance matrix: the positive semidefinite matrix that minimizes the 2-norm distance to \(\mathbf{C}_s\). This is a common approach in the literature, though other matrix norms are also possible. To raise the smallest eigenvalue up to zero, we subtract the minimum (negative) eigenvalue from the diagonal, \(\mathbf{C}_s^{psd} = \mathbf{C}_s - \min(\mathrm{eig}(\mathbf{C}_s))\cdot \mathbf{I}\). For more details see Higham (1988); a similar problem was discussed in Macke et al. (2009). Finally, note that our estimate of the mixing matrix, **M**, depends only on the estimates of the mean firing rates and correlations in the network. Estimating these quantities accurately does not require exceptionally long data samples; we found that the parameter estimates were stable, approaching their final values with as little as half the data.
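The positive-semidefinite repair step can be sketched as follows. `shift_to_psd` mirrors the diagonal-loading fix described above; `clip_to_psd` is the eigenvalue-clipping alternative in the spirit of Higham (1988). The matrix `C` is a toy indefinite stand-in, not an estimate from data:

```python
import numpy as np

def shift_to_psd(C):
    """Subtract the (negative) minimum eigenvalue from the diagonal,
    shifting the whole spectrum so the smallest eigenvalue becomes zero."""
    lam_min = np.linalg.eigvalsh(C).min()
    if lam_min < 0:
        C = C - lam_min * np.eye(C.shape[0])
    return C

def clip_to_psd(C):
    """Project onto the PSD cone by zeroing out negative eigenvalues."""
    lam, V = np.linalg.eigh(C)
    return (V * np.clip(lam, 0.0, None)) @ V.T

# Toy symmetric matrix with off-diagonal terms strong enough to be indefinite.
C = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.9],
              [0.2, 0.9, 1.0]])

C_shift = shift_to_psd(C)
C_clip = clip_to_psd(C)
```

The two repairs differ in how they distribute the correction: the diagonal shift perturbs every eigenvalue equally (inflating the diagonal), while eigenvalue clipping changes only the offending directions.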

### Appendix C: PSTH-based method

A different method to obtain the covariance structure of the common noise involves analyzing the covariations of the residual activity of the cells once the PSTHs are subtracted (Brody 1999). Let \(y_r^i(t)\) be the spike train of neuron *i* on repeat *r*, where the ensemble of cells is presented *R* times with a stimulus of duration *T*, and let \(s^i(t) = E_r[y_r^i(t)]\) be the PSTH of neuron *i*. Let

\(\delta y_r^i(t) = y_r^i(t) - s^i(t);\)

\(\delta y_r^i(t)\) is neuron *i*'s deviation from its PSTH on trial *r*. This deviation is unrelated to the stimulus (since we removed the 'signal', the PSTH). We next form the matrix of trial-averaged cross-correlations between the deviations of every pair of neurons,

\(C^{i,j}(\tau) = E_r\left[\sum_t \delta y_r^i(t)\,\delta y_r^j(t+\tau)\right].\)

We can cast this array into a two-dimensional matrix, \(C_k(\tau)\), by denoting *k* = *i* ·*N* + *j*. The matrix \(C_k(\tau)\) contains the trial-averaged cross-correlation functions between all cell pairs; it carries both the spatial and the temporal information about the covariations.
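A minimal sketch of this computation on synthetic spike data (array shapes and rates are illustrative assumptions, not the recorded data): subtract the repeat-averaged PSTH, then cross-correlate the residuals at each lag and flatten the pair index into *k* = *i*·*N* + *j*:

```python
import numpy as np

# Toy data: R repeats, N cells, T time bins of Poisson "spike counts".
rng = np.random.default_rng(2)
R, N, T = 50, 3, 400
y = rng.poisson(0.1, size=(R, N, T)).astype(float)

psth = y.mean(axis=0)              # s^i(t), shape (N, T)
delta = y - psth[None, :, :]       # residuals delta y_r^i(t)

max_lag = 20
lags = np.arange(-max_lag, max_lag + 1)
# C[i, j, l]: trial-averaged cross-correlation of residuals at lag tau = lags[l],
# normalized by the number of overlapping bins (T - |tau|).
C = np.zeros((N, N, len(lags)))
for li, tau in enumerate(lags):
    a = delta[:, :, max(0, -tau):T - max(0, tau)]   # delta at times t
    b = delta[:, :, max(0, tau):T - max(0, -tau)]   # delta at times t + tau
    C[:, :, li] = np.einsum('rit,rjt->ij', a, b) / (R * a.shape[-1])

# Flatten the pair index (k = i*N + j) into the 2-D matrix C_k(tau).
C_k = C.reshape(N * N, len(lags))
```

With real repeated-stimulus data, `y` would be the binned spike trains, and the residual correlations at nonzero lags would carry the common-noise structure discussed in the text.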

Using the singular value decomposition, one can always decompose the matrix as \(C_k(\tau) = U\Sigma V^T\). Therefore, we can rewrite our matrix of cross-covariations as:

\(C_k(\tau) = \sum_i \sigma_i U_i \cdot V_i^T,\)

where \(U_i\) is the \(i^{th}\) singular vector of *U*, \(V_i\) is the \(i^{th}\) singular vector of *V*, and \(\sigma_i\) are the singular values. Each matrix \(\sigma_i U_i \cdot V_i^T\) in the sum is spatio-temporally separable. Examining the singular values \(\sigma_i\) in Fig. 11(C), we see a clear separation between the first and second singular values; this provides some additional support for the separable nature of the common noise in our model. The first two singular values capture most of the structure of \(C_k(\tau)\), which means we can approximate the matrix as \(C_k(\tau) \approx \tilde{C}_{k} = \sum_{i = 1}^2 \sigma_i U_i \cdot V_i^T\) (Haykin 2001). In Fig. 11(A), we show the spatial part of the matrix of cross-covariations composed of the first two singular vectors, reshaped into matrix form, and in Fig. 11(B) we show their two temporal counterparts. Note the distinct separation of the different subpopulations (ON–ON, OFF–OFF, ON–OFF, and OFF–ON) composing the matrix of the entire population in Fig. 11(A).
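The low-rank truncation can be sketched in a few lines. Here `C_k` is a synthetic separable-plus-noise stand-in for the matrix of residual cross-correlations (a sharp temporal peak at zero lag times a spatial profile), so the first singular values should dominate by construction:

```python
import numpy as np

# Synthetic stand-in for C_k(tau): rank-1 separable signal plus small noise.
rng = np.random.default_rng(3)
n_pairs, n_lags = 36, 41
spatial = rng.normal(size=(n_pairs, 1))                        # spatial profile
temporal = np.exp(-np.linspace(-2, 2, n_lags) ** 2)[None, :]   # peak at lag 0
C_k = spatial @ temporal + 0.01 * rng.normal(size=(n_pairs, n_lags))

# SVD and rank-2 reconstruction: sum of the first two sigma_i * U_i V_i^T terms.
U, s, Vt = np.linalg.svd(C_k, full_matrices=False)
rank2 = (U[:, :2] * s[:2]) @ Vt[:2, :]

# Fraction of squared Frobenius norm captured by the first two modes.
explained = (s[:2] ** 2).sum() / (s ** 2).sum()
```

Inspecting `explained` (or the spectrum `s` directly, as in Fig. 11(C)) is the quantitative version of the separability argument made in the text.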

## About this article

### Cite this article

Vidne, M., Ahmadian, Y., Shlens, J. *et al.* Modeling the impact of common noise inputs on the network activity of retinal ganglion cells.
*J Comput Neurosci* **33**, 97–121 (2012). https://doi.org/10.1007/s10827-011-0376-2
