Modeling the impact of common noise inputs on the network activity of retinal ganglion cells


Synchronized spontaneous firing among retinal ganglion cells (RGCs), on timescales faster than visual responses, has been reported in many studies. Two candidate mechanisms of synchronized firing include direct coupling and shared noisy inputs. In neighboring parasol cells of primate retina, which exhibit rapid synchronized firing that has been studied extensively, recent experimental work indicates that direct electrical or synaptic coupling is weak, but shared synaptic input in the absence of modulated stimuli is strong. However, previous modeling efforts have not accounted for this aspect of firing in the parasol cell population. Here we develop a new model that incorporates the effects of common noise, and apply it to analyze the light responses and synchronized firing of a large, densely-sampled network of over 250 simultaneously recorded parasol cells. We use a generalized linear model in which the spike rate in each cell is determined by the linear combination of the spatio-temporally filtered visual input, the temporally filtered prior spikes of that cell, and unobserved sources representing common noise. The model accurately captures the statistical structure of the spike trains and the encoding of the visual stimulus, without the direct coupling assumption present in previous modeling work. Finally, we examined the problem of decoding the visual stimulus from the spike train given the estimated parameters. The common-noise model produces Bayesian decoding performance as accurate as that of a model with direct coupling, but with significantly more robustness to spike timing perturbations.



Notes

  1.

    It is also useful to recall the relationship between the spatial covariance \(\mathbf{C}_s\) and the mixing matrix M here: as noted above, since the common noise terms \(q_t\) have unit variance and are independent from the stimulus and from one another, the spatial covariance is given by \(\mathbf{C}_s = \mathbf{M}^T\mathbf{M}\). However, since \(\mathbf{C}_s = (\mathbf{UM})^T(\mathbf{UM})\) for any unitary matrix U, we cannot estimate M directly; we can only obtain an estimate of \(\mathbf{C}_s\). Therefore, we proceed by estimating the spatial covariance \(\mathbf{C}_s\) and using any convenient decomposition of this matrix in our calculations below. (We emphasize the non-uniqueness of M because it is tempting to interpret M as a kind of effective connectivity matrix, and this over-interpretation should be avoided.)
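This non-identifiability is easy to verify numerically. The sketch below (plain numpy, with arbitrary illustrative matrices rather than quantities estimated from data) confirms that any orthogonal rotation of M leaves \(\mathbf{C}_s\) unchanged, and that a convenient decomposition such as Cholesky can then stand in for M:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# An arbitrary "mixing matrix" M (illustrative values only).
M = rng.standard_normal((n, n))
Cs = M.T @ M                      # spatial covariance: C_s = M^T M

# Any orthogonal (unitary) U yields the same covariance, so M itself
# cannot be recovered from C_s.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
Cs_rotated = (U @ M).T @ (U @ M)
assert np.allclose(Cs, Cs_rotated)

# Any convenient decomposition of C_s, e.g. Cholesky, can stand in for M.
L = np.linalg.cholesky(Cs)        # C_s = L L^T
assert np.allclose(L @ L.T, Cs)
```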

  2.

    Estimating the spatio-temporal filters \(k_i\) simultaneously while marginalizing over the common noise q is possible but computationally challenging. Therefore we held the shape of \(k_i\) fixed in this second stage but optimized its Euclidean length \(||k_i||_2\). The resulting model explained the data well, as discussed in Section 3 below.
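The second-stage fit described here reduces, for each cell, to a one-dimensional concave optimization. The sketch below illustrates the idea on simulated data: a Poisson GLM in which only a scalar gain on a fixed filter output is fit by damped Newton's method (the data, gain, and offset are invented for the example, not taken from the recordings):

```python
import numpy as np

rng = np.random.default_rng(1)
T, dt = 5000, 1.0

# u_t: output of the fixed-shape filter; spikes from a Poisson GLM with a
# "true" gain on u. All numbers are illustrative, not fit to data.
u = rng.standard_normal(T)
true_gain, b = 1.5, -2.0
y = rng.poisson(np.exp(true_gain * u + b) * dt)

# 1-D Newton's method on the Poisson log-likelihood in the gain a only
# (the filter shape, and hence u, is held fixed; steps are damped for safety).
a = 0.0
for _ in range(100):
    lam = np.exp(a * u + b) * dt
    grad = np.sum((y - lam) * u)       # dL/da
    hess = -np.sum(lam * u ** 2)       # d2L/da2 < 0: concave problem
    step = np.clip(grad / hess, -0.5, 0.5)
    a -= step
    if abs(step) < 1e-10:
        break

assert abs(a - true_gain) < 0.2
```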

  3.

    We make one further simplifying approximation in the case of the population model with no direct coupling terms. Here the dimensionality of the common noise terms is extremely large: \(\dim(\mathbf{q}) = n_{\rm cells} \times T\), where T is the number of time bins in the experimental data. As discussed in Appendix A, direct optimization over q can be performed in \(O(n_{\rm cells}^3 T)\) time, i.e., computational time scaling linearly in T but, unfortunately, cubically in \(n_{\rm cells}\) (see Paninski et al. 2010 for further discussion). This cubic scaling becomes prohibitive for large populations. However, if, when computing the marginal likelihood \(p(\mathbf{y} \mid \Theta, \mathbf{C}_s)\), we make the simplifying approximation that the common noise injected into each RGC is nearly conditionally independent given the observed spikes y, then the optimizations over the \(n_{\rm cells}\) noise terms \(q_i\) can be performed independently, with the total computation time scaling linearly in both \(n_{\rm cells}\) and T. We found this approximation to perform reasonably in practice (see Section 3 below), largely because the pairwise correlation in the common-noise terms was always significantly less than one (see Fig. 4 below). No such approximation was necessary in the pairwise case, where the computation always scales as O(T).

  4.

    In many of the three-point correlation functions one can notice persistent diagonal structures. If we consider the three-point correlation of neuron 1 with the time-shifted firing of neurons 2 and 3, the diagonal structure indicates a correlation between the two time-shifted neurons (2 and 3).
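This diagonal signature can be reproduced with toy spike trains: if cells 2 and 3 share a common input while cell 1 fires independently, the three-point correlation of cell 1 with the time-shifted cells 2 and 3 is elevated along the diagonal (equal shifts). All firing probabilities below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100_000

# Toy binary spike trains: cells 2 and 3 share a common input; cell 1 is
# independent of both.
common = rng.random(T) < 0.05
y1 = (rng.random(T) < 0.10).astype(float)
y2 = (common | (rng.random(T) < 0.05)).astype(float)
y3 = (common | (rng.random(T) < 0.05)).astype(float)

# Three-point correlation of cell 1 with time-shifted cells 2 and 3:
# c3[i, j] = < y1(t) y2(t + t1) y3(t + t2) >.
lags = list(range(-5, 6))
c3 = np.zeros((len(lags), len(lags)))
for i, t1 in enumerate(lags):
    for j, t2 in enumerate(lags):
        c3[i, j] = np.mean(y1 * np.roll(y2, -t1) * np.roll(y3, -t2))

# The diagonal (t1 == t2) carries the 2-3 synchrony; off-diagonal entries
# reflect only chance coincidences.
mask = np.eye(len(lags), dtype=bool)
assert np.mean(c3[mask]) > 1.5 * np.mean(c3[~mask])
```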

  5.

    Since x is binary, strictly speaking, \(u_i\) is not a Gaussian vector solely described by its covariance. However, because the filters \(k_i\) have a relatively large spatiotemporal dimension, the components of \(u_i\) are weighted sums of many independent, identically distributed binary random variables, and their prior marginal distributions can be well approximated by Gaussian distributions (see Pillow et al. 2011 and Ahmadian et al. 2011 for further discussion of this point). For this reason, we replaced the true (non-Gaussian) joint prior distribution of \(u_i\) with a Gaussian distribution with zero mean and covariance given by Eq. (24).
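The quality of this Gaussian approximation is easy to check by simulation: project binary white-noise frames onto a high-dimensional filter and compare moments against the matched Gaussian. The filter and dimensions below are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(3)

# Binary white-noise frames projected on a high-dimensional filter k:
# each projection is a weighted sum of many i.i.d. binary variables.
d = 300                                     # spatiotemporal filter dimension
k = rng.standard_normal(d) / np.sqrt(d)     # arbitrary filter
x = rng.choice([-1.0, 1.0], size=(10_000, d))
u = x @ k

# Compare moments against the matched Gaussian: mean 0, variance sum(k^2),
# and (for Gaussianity) excess kurtosis near 0.
assert abs(np.mean(u)) < 0.05
assert abs(np.var(u) - np.sum(k ** 2)) < 0.05
excess_kurtosis = np.mean((u - u.mean()) ** 4) / np.var(u) ** 2 - 3.0
assert abs(excess_kurtosis) < 0.2
```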


References

  1. Agresti, A. (2002). Categorical data analysis. In Wiley series in probability and mathematical statistics.

  2. Ahmadian, Y., Pillow, J., Shlens, J., Chichilnisky, E., Simoncelli, E., & Paninski, L. (2009). A decoder-based spike train metric for analyzing the neural code in the retina. In COSYNE09.

  3. Ahmadian, Y., Pillow, J. W., & Paninski, L. (2011). Efficient Markov chain Monte Carlo methods for decoding neural spike trains. Neural Computation, 23, 46–96.

  4. Arnett, D. (1978). Statistical dependence between neighboring retinal ganglion cells in goldfish. Experimental Brain Research, 32(1), 49–53.

  5. Bickel, P., & Doksum, K. (2001). Mathematical statistics: Basic ideas and selected topics. Prentice Hall.

  6. Brivanlou, I., Warland, D., & Meister, M. (1998). Mechanisms of concerted firing among retinal ganglion cells. Neuron, 20(3), 527–539.

  7. Brody, C. (1999). Correlations without synchrony. Neural Computation, 11, 1537–1551.

  8. Cafaro, J., & Rieke, F. (2010). Noise correlations improve response fidelity and stimulus encoding. Nature, 468(7326), 964–967.

  9. Chornoboy, E., Schramm, L., & Karr, A. (1988). Maximum likelihood identification of neural point process systems. Biological Cybernetics, 59, 265–275.

  10. Cocco, S., Leibler, S., & Monasson, R. (2009). Neuronal couplings between retinal ganglion cells inferred by efficient inverse statistical physics methods. Proceedings of the National Academy of Sciences, 106(33), 14058.

  11. Cossart, R., Aronov, D., & Yuste, R. (2003). Attractor dynamics of network up states in the neocortex. Nature, 423, 283–288.

  12. Dacey, D., & Brace, S. (1992). A coupled network for parasol but not midget ganglion cells in the primate retina. Visual Neuroscience, 9(3–4), 279–290.

  13. DeVries, S. H. (1999). Correlated firing in rabbit retinal ganglion cells. Journal of Neurophysiology, 81(2), 908–920.

  14. Dombeck, D., Khabbaz, A., Collman, F., Adelman, T., & Tank, D. (2007). Imaging large-scale neural activity with cellular resolution in awake, mobile mice. Neuron, 56(1), 43–57.

  15. Dorn, J. D., & Ringach, D. L. (2003). Estimating membrane voltage correlations from extracellular spike trains. Journal of Neurophysiology, 89(4), 2271–2278.

  16. Fahrmeir, L., & Kaufmann, H. (1991). On Kalman filtering, posterior mode estimation and fisher scoring in dynamic exponential family regression. Metrika, 38, 37–60.

  17. Fahrmeir, L., & Tutz, G. (1994). Multivariate statistical modelling based on generalized linear models. Springer.

  18. Field, G., Gauthier, J., Sher, A., Greschner, M., Machado, T., Jepson, L., et al. (2010). Mapping a neural circuit: A complete input-output diagram in the primate retina. Nature, 467, 673–677.

  19. Frechette, E., Sher, A., Grivich, M., Petrusca, D., Litke, A., & Chichilnisky, E. (2005). Fidelity of the ensemble code for visual motion in primate retina. Journal of Neurophysiology, 94(1), 119.

  20. Gauthier, J., Field, G., Sher, A., Greschner, M., Shlens, J., Litke, A., et al. (2009). Receptive fields in primate retina are coordinated to sample visual space more uniformly. PLoS Biology, 7(4), e1000063.

  21. Greschner, M., Shlens, J., Bakolitsa, C., Field, G., Gauthier, J., Jepson, L., et al. (2011). Correlated firing among major ganglion cell types in primate retina. The Journal of Physiology, 589(1), 75.

  22. Gutnisky, D., & Josic, K. (2010). Generation of spatiotemporally correlated spike trains and local field potentials using a multivariate autoregressive process. Journal of Neurophysiology, 103(5), 2912.

  23. Harris, K., Csicsvari, J., Hirase, H., Dragoi, G., & Buzsaki, G. (2003). Organization of cell assemblies in the hippocampus. Nature, 424, 552–556.

  24. Hayes, M. H. (1996). Statistical digital signal processing and modeling. Wiley.

  25. Haykin, S. (2001). Adaptive filter theory. Pearson Education India.

  26. Higham, N. (1988). Computing a nearest symmetric positive semidefinite matrix. Linear Algebra and its Applications, 103, 103–118.

  27. Iyengar, S. (2001). The analysis of multiple neural spike trains. In Advances in methodological and applied aspects of probability and statistics, Gordon and Breach (pp. 507–524).

  28. Kass, R., & Raftery, A. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.

  29. Keat, J., Reinagel, P., Reid, R., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30, 803–817.

  30. Kelly, R., Smith, M., Samonds, J., Kohn, A., Bonds, A. B., Movshon, J. A., et al. (2007). Comparison of recordings from microelectrode arrays and single electrodes in the visual cortex. Journal of Neuroscience, 27, 261–264.

  31. Kerr, J. N. D., Greenberg, D., & Helmchen, F. (2005). Imaging input and output of neocortical networks in vivo. PNAS, 102(39), 14063–14068.

  32. Koyama, S., & Paninski, L. (2010). Efficient computation of the map path and parameter estimation in integrate-and-fire and more general state-space models. Journal of Computational Neuroscience, 29, 89–105.

  33. Krumin, M., & Shoham, S. (2009). Generation of spike trains with controlled auto- and cross-correlation functions. Neural Computation, 21(6), 1642–1664.

  34. Kulkarni, J., & Paninski, L. (2007). Common-input models for multiple neural spike-train data. Network: Computation in Neural Systems, 18, 375–407.

  35. Latham, P., & Nirenberg, S. (2005). Synergy, redundancy, and independence in population codes, revisited. The Journal of Neuroscience, 25(21), 5195.

  36. Lawhern, V., Wu, W., Hatsopoulos, N., & Paninski, L. (2010). Population decoding of motor cortical activity using a generalized linear model with hidden states. Journal of Neuroscience Methods, 189(2), 267–280.

  37. Litke, A., Bezayiff N., Chichilnisky E., Cunningham W., Dabrowski W., Grillo A., et al. (2004). What does the eye tell the brain? Development of a system for the large scale recording of retinal output activity. IEEE Transactions on Nuclear Science, 51, 1434–1440.

  38. Macke, J., Berens, P., Ecker, A., Tolias, A., & Bethge, M. (2009). Generating spike trains with specified correlation coefficients. Neural Computation, 21, 397–423.

  39. MacLean, J., Watson, B., Aaron, G., & Yuste, R. (2005). Internal dynamics determine the cortical response to thalamic stimulation. Neuron, 48, 811–823.

  40. Martignon, L., Deco, G., Laskey, K., Diamond, M., Freiwald, W., & Vaadia, E. (2000). Neural coding: Higher-order temporal patterns in the neuro-statistics of cell assemblies. Neural Computation, 12, 2621–2653.

  41. Masmoudi, K., Antonini, M., & Kornprobst, P. (2010). Encoding and decoding stimuli using a biological realistic model: The non-determinism in spike timings seen as a dither signal. In Proc of research in encoding and decoding of neural ensembles.

  42. Mastronarde, D. (1983). Correlated firing of cat retinal ganglion cells. I. Spontaneously active inputs to X- and Y-cells. Journal of Neurophysiology, 49(2), 303.

  43. McCulloch, C., Searle, S., & Neuhaus, J. (2008). Generalized, linear, and mixed models. In Wiley series in probability and statistics.

  44. Meister, M., Lagnado, L., & Baylor, D. (1995). Concerted signaling by retinal ganglion cells. Science, 270(5239), 1207.

  45. Mishchenko, Y., Vogelstein, J., & Paninski, L. (2011). A Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data. The Annals of Applied Statistics, 5(2B), 1229–1261.

  46. Nicolelis, M., Dimitrov, D., Carmena, J., Crist, R., Lehew, G., Kralik, J., et al. (2003). Chronic, multisite, multielectrode recordings in macaque monkeys. PNAS, 100, 11,041–11,046.

  47. Niebur, E. (2007). Generation of synthetic spike trains with defined pairwise correlations. Neural Computation, 19(7), 1720–1738.

  48. Nirenberg, S., Carcieri, S., Jacobs, A., & Latham, P. (2002). Retinal ganglion cells act largely as independent encoders. Nature, 411, 698–701.

  49. Nykamp, D. (2005). Revealing pairwise coupling in linear-nonlinear networks. SIAM Journal on Applied Mathematics, 65, 2005–2032.

  50. Nykamp, D. (2008). Exploiting history-dependent effects to infer network connectivity. SIAM Journal on Applied Mathematics, 68(2), 354–391.

  51. Nykamp, D. (2009). A stimulus-dependent connectivity analysis of neuronal networks. Journal of Mathematical Biology, 59(2), 147–173.

  52. Ohki, K., Chung, S., Kara, P., Hübener, M., Bonhoeffer, T., & Reid, R. (2006). Highly ordered arrangement of single neurons in orientation pinwheels. Nature, 442(7105), 925–928.

  53. Okatan, M., Wilson, M., & Brown, E. (2005). Analyzing functional connectivity using a network likelihood model of ensemble neural spiking activity. Neural Computation, 17, 1927–1961.

  54. Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15, 243–262.

  55. Paninski, L. (2005). Log-concavity results on Gaussian process methods for supervised and unsupervised learning. Advances in Neural Information Processing Systems 17.

  56. Paninski, L., Pillow, J., & Simoncelli, E. (2004). Maximum likelihood estimation of a stochastic integrate-and-fire neural model. Neural Computation, 16, 2533–2561.

  57. Paninski, L., Ahmadian, Y., Ferreira, D., Koyama, S., Rahnama Rad, K., Vidne, M., et al. (2010) A new look at state-space models for neural data. Journal of Computational Neuroscience, 29, 107–126.

  58. Pillow, J., & Latham, P. (2007). Neural characterization in partially observed populations of spiking neurons. In NIPS.

  59. Pillow, J., Paninski, L., Uzzell, V., Simoncelli, E., & Chichilnisky, E. (2005). Prediction and decoding of retinal ganglion cell responses with a probabilistic spiking model. Journal of Neuroscience, 25, 11,003–11,013.

  60. Pillow, J., Shlens, J., Paninski, L., Sher, A., Litke, A., Chichilnisky, E., et al. (2008). Spatiotemporal correlations and visual signaling in a complete neuronal population. Nature, 454, 995–999.

  61. Pillow, J., Ahmadian, Y., & Paninski, L. (2011). Model-based decoding, information estimation, and change-point detection in multi-neuron spike trains. Neural Computation, 23, 1–45.

  62. Rigat, F., de Gunst, M., & van Pelt, J. (2006). Bayesian modelling and analysis of spatio-temporal neuronal networks. Bayesian Analysis, 1, 733–764.

  63. de la Rocha, J., Doiron, B., Shea-Brown, E., Josic, K., & Reyes, A. (2007). Correlation between neural spike trains increases with firing rate. Nature, 448, 802–806.

  64. Rybicki, G., & Hummer, D. (1991). An accelerated lambda iteration method for multilevel radiative transfer, appendix b: Fast solution for the diagonal elements of the inverse of a tridiagonal matrix. Astronomy and Astrophysics, 245, 171.

  65. Schneidman, E., Bialek, W., & Berry, M. (2003). Synergy, redundancy, and independence in population codes. The Journal of Neuroscience, 23(37), 11,539.

  66. Schneidman, E., Berry, M., Segev, R., & Bialek, W. (2006). Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440, 1007–1012.

  67. Schnitzer, M., & Meister, M. (2003). Multineuronal firing patterns in the signal from eye to brain. Neuron, 37, 499–511.

  68. Segev, R., Goodhouse, J., Puchalla, J., & Berry, M. (2004). Recording spikes from a large fraction of the ganglion cells in a retinal patch. Nature Neuroscience, 7, 1154–1161.

  69. Shlens, J., Field, G., Gauthier, J., Grivich, M., Petrusca, D., Sher, A., et al. (2006). The structure of multi-neuron firing patterns in primate retina. The Journal of Neuroscience, 26(32), 8254.

  70. Shlens, J., Field, G. D., Gauthier, J. L., Greschner, M., Sher, A., Litke, A. M., et al. (2009). The structure of large-scale synchronized firing in primate retina. Journal of Neuroscience, 29(15), 5022–5031. doi:10.1523/JNEUROSCI.5187-08.2009

  71. Smith, A., & Brown, E. (2003). Estimating a state-space model from point process observations. Neural Computation, 15, 965–991.

  72. Stein, R., Weber, D., Aoyagi, Y., Prochazka, A., Wagenaar, J., Shoham, S., et al. (2004). Coding of position by simultaneously recorded sensory neurones in the cat dorsal root ganglion. The Journal of Physiology, 560(3), 883–896.

  73. Stevenson, I., Rebesco, J., Miller, L., & Körding, K. (2008). Inferring functional connections between neurons. Current Opinion in Neurobiology, 18(6), 582–588.

  74. Stevenson, I., Rebesco, J., Hatsopoulos, N., Haga, Z., Miller, L., & Körding, K. (2009). Bayesian inference of functional connectivity and network structure from spikes. IEEE Transactions On Neural Systems And Rehabilitation Engineering, 17(3), 203.

  75. Trong, P., & Rieke, F. (2008). Origin of correlated activity between parasol retinal ganglion cells. Nature Neuroscience, 11(11), 1343–1351.

  76. Truccolo, W., Eden, U., Fellows, M., Donoghue, J., & Brown, E. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble and extrinsic covariate effects. Journal of Neurophysiology, 93, 1074–1089.

  77. Usrey, W., & Reid, R. (1999). Synchronous activity in the visual system. Annual Review of Physiology, 61(1), 435–456.

  78. Utikal, K. (1997). A new method for detecting neural interconnectivity. Biological Cybernetics, 76, 459–470.

  79. Van Pelt, J., Vajda, I., Wolters, P., Corner, M., & Ramakers, G. (2005). Dynamics and plasticity in developing neuronal networks in vitro. Progress in Brain Research, 147, 173–188.

  80. Vidne, M., Kulkarni, J., Ahmadian, Y., Pillow, J., Shlens, J., Chichilnisky, E., et al. (2009). Inferring functional connectivity in an ensemble of retinal ganglion cells sharing a common input. In Frontiers in systems neuroscience conference abstract: Computational and systems neuroscience 2009.

  81. Warland, D., Reinagel, P., & Meister, M. (1997). Decoding visual information from a population of retinal ganglion cells. Journal of Neurophysiology, 78, 2336–2350.

  82. Wilson, J. M., Dombeck, D. A., Diaz-Rios, M., Harris-Warrick, R. M., & Brownstone, R. M. (2007). Two-photon calcium imaging of network activity in XFP-expressing neurons in the mouse. Journal of Neurophysiology, 97(4), 3118–3125.

  83. Wu, W., Kulkarni, J., Hatsopoulos, N., & Paninski, L. (2008). Neural decoding of goal-directed movements using a linear state-space model with hidden states. In Computational and systems neuroscience meeting.

  84. Yu, B., Afshar, A., Santhanam, G., Ryu, S., Shenoy, K., & Sahani, M. (2006). Extracting dynamical structure embedded in neural activity. In NIPS.

  85. Yu, B., Cunningham, J., Santhanam, G., Ryu, S., Shenoy, K., & Sahani, M. (2009). Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. In NIPS.

  86. Zhang, K., Ginzburg, I., McNaughton, B., & Sejnowski, T. (1998). Interpreting neuronal population activity by reconstruction: Unified framework with application to hippocampal place cells. Journal of Neurophysiology, 79, 1017–1044.

Acknowledgements


We would like to thank F. Rieke for helpful suggestions and insight; K. Masmoudi for bringing the analogy to the dithering process to our attention; O. Barak, X. Pitkow and M. Greschner for comments on the manuscript; G. D. Field, J. L. Gauthier, and A. Sher for experimental assistance. A preliminary version of this work was presented in Vidne et al. (2009). In addition, an early version of Fig. 2 appeared previously in the review paper (Paninski et al. 2010), and Fig. 11(D) is a reproduction of a schematic figure from (Shlens et al. 2006). This work was supported by: the Gatsby Foundation (M.V.); a Robert Leet and Clara Guthrie Patterson Trust Postdoctoral Fellowship (Y.A.); an NSF Integrative Graduate Education and Research Traineeship Training Grant DGE-0333451 and a Miller Institute Fellowship for Basic Research in Science, UC Berkeley (J.S.); US National Science Foundation grant PHY-0417175 (A.M.L.); NIH Grant EY017736 (E.J.C.); HHMI (E.P.S.); NEI grant EY018003 (E.J.C., L.P. and E.P.S.); and a McKnight Scholar award (L.P.). We also gratefully acknowledge the use of the Hotfoot shared cluster computer at Columbia University.

Author information



Corresponding author

Correspondence to Michael Vidne.

Additional information

Action Editor: Brent Doiron


Appendix A: O(T) optimization for parameter estimation and stimulus decoding for models with common noise effects

The common-noise model was formulated as a set of conditionally independent cells, with no cross-coupling filters, interacting only through the covariance structure of the common noise inputs. The model therefore lends itself naturally to parallelization of the computationally expensive stage, the maximum likelihood estimation of the model parameters. For each cell, we need to estimate the model parameters and the common noise the cell receives. The common noise to the cell on a given trial is a time series of the same length as the experiment itself (T discrete bins), and finding the maximum a posteriori (MAP) estimate of the common noise is a computationally intensive task because of its high dimensionality, but it can be performed independently for each cell. We used the Newton-Raphson method for optimization over the joint vector, ν = [Θ,q], of the model cell's parameters and the common-noise estimate. Each iteration of the Newton-Raphson method requires us to find the new stepping direction, δ, by solving the linear system \(H\delta = \nabla\), where \(\nabla\) denotes the gradient and H the Hessian matrix (the matrix of second derivatives) of the objective function F. We need to solve this matrix equation for δ at every step of the optimization. In general, this requires \(O(T^3)\) operations, which renders naive approaches inapplicable to long experiments such as the one discussed here. Instead, we used an O(T) method for computing the MAP estimate, developed in Koyama and Paninski (2010).
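The structure of this computation can be sketched for a single AR(1) noise trace driving Poisson observations: each Newton step solves a tridiagonal system, so the MAP path costs O(T) per iteration. This is a simplified stand-in for the full GLM of the paper (no stimulus or spike-history terms), with all parameters invented for the example:

```python
import numpy as np
from scipy.linalg import solveh_banded

rng = np.random.default_rng(4)
T, alpha, sig, b = 2000, 0.95, 0.5, -1.0    # illustrative parameters

# Simulate an AR(1) common-noise path and Poisson spikes driven by it.
q_true = np.zeros(T)
for t in range(1, T):
    q_true[t] = alpha * q_true[t - 1] + sig * rng.standard_normal()
y = rng.poisson(np.exp(q_true + b))

# Tridiagonal AR(1) prior precision (stationary form), stored in the
# symmetric banded layout used by solveh_banded (row 0: superdiagonal).
main = np.full(T, (1 + alpha ** 2) / sig ** 2)
main[0] = main[-1] = 1 / sig ** 2
off = np.full(T - 1, -alpha / sig ** 2)

# Newton iterations for the MAP path: each step is a tridiagonal solve,
# so the cost per iteration is O(T).
q = np.zeros(T)
for _ in range(50):
    lam = np.exp(q + b)
    grad = main * q                          # gradient of the neg. log prior
    grad[:-1] += off * q[1:]
    grad[1:] += off * q[:-1]
    grad -= y - lam                          # minus log-likelihood gradient
    ab = np.zeros((2, T))
    ab[0, 1:] = off
    ab[1] = main + lam                       # Hessian: prior + diag(lambda)
    step = solveh_banded(ab, grad)
    q -= step
    if np.max(np.abs(step)) < 1e-9:
        break

# The MAP estimate should track the true common-noise path.
assert np.corrcoef(q, q_true)[0, 1] > 0.5
```

Here `solveh_banded` exploits the symmetric banded (tridiagonal) structure of the Hessian; for higher-order AR priors the bandwidth, and hence the cost per step, grows with the order d, as discussed below.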

Fig. 11

PSTH-based method. (A) Approximate spatial covariance matrix of the inferred common noise, composed of the first two spatial singular vectors, \(\tilde{C}_{i,j} = \sum_{k=1}^2 \sigma_k U_k \) (the vectors are reshaped into matrix form). Note the four distinct regions corresponding to the ON–ON, OFF–OFF, ON–OFF, and OFF–ON pairs. (B) First two temporal singular vectors, corresponding to the temporal correlations of the 'same-type' (ON–ON and OFF–OFF) and 'different-type' (ON–OFF) pairs. Note the asymmetry of the ON–OFF temporal correlations. (C) Relative power of all the singular values. The first two singular values are well separated from the rest, indicating that the correlation structure is well approximated by the first two singular vectors. (D) Receptive field centers and the numbering scheme. Top, in red: ON cells. Bottom, in blue: OFF cells. The numbering scheme starts at the top left corner and proceeds column-wise to the right. The ON cells are numbered 1 through 104 and the OFF cells 105 through 277; as a result, cells that are physically close are usually closely numbered
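The low-rank structure summarized in panels (A)-(C) can be mimicked on a synthetic covariance matrix: build a matrix dominated by two "cell-type" components plus small noise, and check that the first two singular values carry most of the power. The component patterns and magnitudes below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60

# Two orthonormal "cell-type" patterns (ON/OFF sign structure), with
# invented component magnitudes, plus a small noise perturbation.
v1 = np.r_[np.ones(n // 2), -np.ones(n // 2)] / np.sqrt(n)
v2 = np.ones(n) / np.sqrt(n)
C = 5.0 * np.outer(v1, v1) + 2.0 * np.outer(v2, v2)
C += 0.001 * rng.standard_normal((n, n))
C = (C + C.T) / 2                       # keep the matrix symmetric

U, s, Vt = np.linalg.svd(C)
C_rank2 = (U[:, :2] * s[:2]) @ Vt[:2]   # rank-2 reconstruction

# The first two singular values dominate, so rank 2 suffices.
assert s[:2].sum() / s.sum() > 0.9
assert np.linalg.norm(C - C_rank2) / np.linalg.norm(C) < 0.1
```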

Because of the autoregressive structure of the common noise, q, the log posterior density of \(q_t\) can be written as:

$$\begin{array}{lll} F &=& \log p(\{q_t\}|\{y_t\}) = \log p(q_0) \\[4pt] && + \sum\limits_{t = 1}^T \log p(q_{t}|q_{t-1}) + \sum\limits_{t = 1}^T \log p(y_t|q_t) + const, \end{array} $$

where we denote the spike train as {y t }, and for simplicity we have suppressed the dependence on all other model parameters and taken q to be a first order autoregressive process. Since all terms are concave in q, the entire expression is concave in q.

The Hessian, H, has the form:

$$ \mathbf{H} = \left[ \begin{array}{cc} \dfrac{ \partial ^2F}{ \partial ^2 q} & \dfrac{ \partial ^2F}{ \partial q \partial \Theta} \\[12pt] \dfrac{ \partial ^2F}{ \partial \Theta \partial q} & \dfrac{ \partial ^2F}{ \partial ^2 \Theta} \end{array} \right]; $$

note that the dimension of \(J = \frac{ \partial ^2F}{ \partial ^2 q} \) is T ×T (in our case we have a 9.6 min recording and we use 0.8 ms bins, i.e. T = 720,000), while the dimension of \(H_{\Theta\Theta} = \frac{ \partial ^2F}{ \partial ^2 \Theta} \) is just N ×N, where N is the number of parameters in the model (N < 20 for all the models considered here). Using the Schur complement of H we get:

$$ \begin{array}{lll} \mathbf{H}^{-1} &=& \left[ \begin{array}{cc} J & H_{q\Theta} \\[2pt] H_{\Theta q} & H_{\Theta\Theta} \end{array} \right]^{-1}\\[4pt] &=& \left[ \begin{array}{cc} J^{-1}+ J^{-1}H_{q \Theta}S^{-1}H_{ \Theta q}J^{-1} & -J^{-1}H_{q \Theta}S^{-1} \\[2pt] -S^{-1}H_{\Theta q}J^{-1} & S^{-1} \end{array} \right] \end{array} $$

where \(S = H_{\Theta\Theta} - H_{\Theta q} J^{-1}H_{q \Theta}\) denotes the Schur complement of the block J in H. The dimensions of \(H_{\Theta\Theta} = \frac{ \partial ^2F}{ \partial ^2 \Theta} \) are small (the number of model parameters, N), and because of the autoregressive structure of the common noise, \(J = \frac{ \partial ^2F}{ \partial ^2 q}\) is a tridiagonal matrix:

$$ \mathbf{J} = \left[ \begin{array}{cccccc} D_1 & B_{1,2} & 0 & \ldots & 0 &0 \\[2pt] B_{1,2}^T & D _2 & B_{2,3} & \ldots & 0 &0 \\[2pt] 0 & B_{2,3}^T & D_3 & \ddots & 0 &0 \\[2pt] \vdots & \vdots & \ddots & \ddots & \vdots &0 \\[2pt] 0 & 0 & 0 & B_{T-2,T-1}^T & D_{T-1} &B_{T-1,T} \\[2pt] 0 & 0 & 0 & \ldots & B_{T-1,T}^T & D_T \end{array} \right] $$


$$\begin{array}{lll} D_t &=& \frac{\partial^2}{\partial q_t^2} \log p(y_t|q_t)+ \frac{\partial^2}{\partial q_t^2} \log p(q_t|q_{t-1})\\[4pt] && + \frac{\partial^2}{\partial q_t^2} \log p(q_{t+1}|q_{t}) \end{array} $$


$$ B_{t,t+1} = \frac{\partial^2}{\partial q_t \partial q_{t+1}} \log p(q_{t+1}|q_t). $$

If we use higher-order autoregressive processes, \(D_t\) and \(B_{t,t+1}\) are replaced by d × d blocks, where d is the order of the AR process. The tridiagonal structure of J allows us to obtain each Newton step direction, δ, in \(O(d^3 T)\) time using standard methods (Rybicki and Hummer 1991; Paninski et al. 2010).
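A minimal block-tridiagonal solver (the block Thomas algorithm) realizing this \(O(d^3 T)\) scaling is sketched below and checked against a dense solve. The blocks here are random, well-conditioned stand-ins for \(D_t\) and \(B_{t,t+1}\), not the actual Hessian blocks:

```python
import numpy as np

def block_thomas(D, B, r):
    """Solve J x = r, where J is symmetric block-tridiagonal with diagonal
    blocks D[t] and superdiagonal blocks B[t] (each d x d), in O(d^3 T)
    time via block Gaussian elimination (the block Thomas algorithm)."""
    T = len(D)
    Dp, rp = [D[0].copy()], [r[0].copy()]
    for t in range(1, T):                 # forward elimination
        W = np.linalg.solve(Dp[t - 1], B[t - 1]).T   # B^T Dp^{-1}
        Dp.append(D[t] - W @ B[t - 1])
        rp.append(r[t] - W @ rp[t - 1])
    x = [None] * T
    x[-1] = np.linalg.solve(Dp[-1], rp[-1])
    for t in range(T - 2, -1, -1):        # back substitution
        x[t] = np.linalg.solve(Dp[t], rp[t] - B[t] @ x[t + 1])
    return np.array(x)

# Toy check against a dense solve with diagonally dominant random blocks.
rng = np.random.default_rng(6)
T, d = 50, 2
B = [0.1 * rng.standard_normal((d, d)) for _ in range(T - 1)]
D = [np.eye(d) * 3 + 0.1 * rng.standard_normal((d, d)) for _ in range(T)]
D = [(A + A.T) / 2 for A in D]            # symmetric diagonal blocks

r = [rng.standard_normal(d) for _ in range(T)]
x = block_thomas(D, B, r)

# Assemble the dense J and compare.
J = np.zeros((T * d, T * d))
for t in range(T):
    J[t*d:(t+1)*d, t*d:(t+1)*d] = D[t]
for t in range(T - 1):
    J[t*d:(t+1)*d, (t+1)*d:(t+2)*d] = B[t]
    J[(t+1)*d:(t+2)*d, t*d:(t+1)*d] = B[t].T
x_dense = np.linalg.solve(J, np.concatenate(r))
assert np.allclose(x.ravel(), x_dense)
```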

We can also exploit similar bandedness properties for stimulus decoding. Namely, we can carry out the Newton-Raphson optimization over the joint vector ν = (u,q) of Eq. (11) in O(T) computational time. Eq. (11), repeated here for convenience, reads

$$ \hat{{{\mathbf{u}}}}({\mathbf{y}}) = \arg\max\limits_{{{\mathbf{u}}},{\mathbf{q}}}\, [ \log p({{\mathbf{u}}}) + \log p({\mathbf{q}}|{\mathbf{\Theta}}) + \log p({\mathbf{y}}|{{\mathbf{u}}},{\mathbf{q}},{\mathbf{\Theta}})]. $$

The Hessian of the log-posterior, J (i.e., the matrix of second partial derivatives of the log-posterior with respect to the components of ν = (u,q)), is:

$${\mathbf{J}} = \frac{\partial^2 \log p({{\mathbf{u}}}) }{\partial^2\nu} + \frac{\partial^2\log p({\mathbf{q}}|{\mathbf{\Theta}}) }{\partial^2\nu} + \frac{\partial^2 \log p({\mathbf{y}}|{{\mathbf{u}}},{\mathbf{q}},{\mathbf{\Theta}}) }{\partial^2\nu}. $$

Thus, the Hessian of the log-posterior is the sum of the Hessian of the log-prior for the filtered stimulus, \({\mathbf{A}} = \frac{\partial^2}{\partial^2\nu} \log p({{\mathbf{u}}}) \), the Hessian, \({\mathbf{B}} = \frac{\partial^2 }{\partial^2\nu} \log p({\mathbf{q}}|{\mathbf{\Theta}}) \), of the log-prior for the common noise, and the Hessian of the log-likelihood, \({\mathbf{D}} = \frac{\partial^2 }{\partial^2\nu} \log p({\mathbf{y}}|{{\mathbf{u}}},{\mathbf{q}},{\mathbf{\Theta}})\). Since we took the stimulus and common noise priors to be Gaussian with zero mean, A and B are constant matrices, independent of the decoded spike train.

We order the components of ν = (u,q) according to \((u^1_1,u^2_1,q_1,u^1_2,u^2_2,q_2,\cdots )\), where subscripts denote the time step, i.e., such that components corresponding to the same time step are adjacent. With this ordering, D is block-diagonal with 3×3 blocks

$$\begin{array}{rll} {\mathbf{D}} &=&\begin{pmatrix} {\mathbf{d}}_{1,1} & \mathbf{0} & \cdots \\ \mathbf{0} & {\mathbf{d}}_{2,2} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}, \qquad\qquad\\[4pt] {\mathbf{d}}_{t,t} &=& \begin{pmatrix} \dfrac{\partial\mathcal{L}}{\partial u^1_{t} \partial u^1_{t}} & \dfrac{\partial\mathcal{L}}{\partial u^1_{t} \partial u^2_{t}} & \dfrac{\partial\mathcal{L}}{\partial u^1_{t} \partial q_{t}} \\[10pt] \dfrac{\partial\mathcal{L}}{\partial u^2_{t} \partial u^1_{t}} & \dfrac{\partial\mathcal{L}}{\partial u^2_{t} \partial u^2_{t}} & \dfrac{\partial\mathcal{L}}{\partial u^2_{t} \partial q_{t}} \\[10pt] \dfrac{\partial\mathcal{L}}{\partial q_{t} \partial u^1_{t}} & \dfrac{\partial\mathcal{L}}{\partial q_{t} \partial u^2_{t}} & \dfrac{\partial\mathcal{L}}{\partial q_{t} \partial q_{t}} \end{pmatrix}, \end{array} $$

where \(\mathcal{L}\equiv\log p({\mathbf{y}}|{{\mathbf{u}}},{\mathbf{q}},{\mathbf{\Theta}})\) is the GLM log-likelihood. The contribution of the log-prior for q is given by

$${\mathbf{B}} = \begin{pmatrix} {\mathbf{b}}_{1,1} & {\mathbf{b}}_{1,2} & \cdots \\ {\mathbf{b}}_{2,1} & {\mathbf{b}}_{2,2} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}, \qquad {\mathbf{b}}_{t_1,t_2} =\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & B_{t_1,t_2} \end{pmatrix}, $$

where, as above, the matrix \(\left [ B_{t_1,t_2}\right ]\) is the Hessian corresponding to the autoregressive process describing q (the zero entries in \({\mathbf{b}}_{t_1,t_2}\) correspond to partial derivatives with respect to the components of u, which vanish because the log-prior for q is independent of u). Since \(\left [ B_{t_1,t_2}\right ]\) is banded, B is also banded. Finally, the contribution of the log-prior term for u is given by

$$ {\mathbf{A}} =\begin{pmatrix} {\mathbf{a}}_{1,1} & {\mathbf{a}}_{1,2} & \cdots \\ {\mathbf{a}}_{2,1} & {\mathbf{a}}_{2,2} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}, \quad {\mathbf{a}}_{t_1,t_2} =\begin{pmatrix} A_{t_1,t_2}^{1,1} & A_{t_1,t_2}^{1,2} & 0 \\ A_{t_1,t_2}^{2,1} & A_{t_1,t_2}^{2,2} & 0 \\ 0 & 0 & 0 \end{pmatrix}, $$

where the matrix \(A\) (formed by excluding from A all the zero rows and columns corresponding to partial differentiation with respect to the common noise) is the inverse covariance matrix of the \(u^i\). The covariance matrix of the \(u^i\) is given by \(C_{{{\mathbf{u}}}}^{ij} = {\mathbf{k}}^i C_{{\mathbf{x}}} {\mathbf{k}}^{j\,^{\mathrm{T}}}\), where \(C_{{\mathbf{x}}}\) is the covariance of the spatio-temporally fluctuating visual stimulus, x, and \({\mathbf{k}}^{j\,^{\mathrm{T}}}\) is cell j's receptive field transposed. Since a white-noise stimulus was used in the experiment, we use \(C_{{\mathbf{x}}} = c^2 \mathbf{I}\), where I is the identity matrix and c is the stimulus contrast. Hence we have \(C_{{{\mathbf{u}}}}^{ij} = c^2 {\mathbf{k}}^i {\mathbf{k}}^{j\,^{\mathrm{T}}}\), or more explicitly

$$\left [ C_{{{\mathbf{u}}}}\right ]_{t_1,t_2}^{i,j} \equiv \mathrm{Cov}[u_{t_1}^i ,u^j_{t_2}] = c^2 \sum\limits_{t,n} k^i(t_1-t,n)k^j(t_2 - t,n). $$

Notice that since the experimentally fit \({\mathbf{k}}^i\) have a finite temporal duration \(T_k\), the covariance matrix \(C_{{{\mathbf{u}}}}\) is banded: it vanishes when \(|t_1 - t_2| \geq 2T_k - 1\). However, the inverse of \(C_{{{\mathbf{u}}}}\), and therefore \(A\), are not banded in general. This complicates the direct use of banded matrix methods to solve the set of linear equations, \({\mathbf{J}} \delta= \nabla\), in each Newton-Raphson step. Still, as we now show, we can exploit the bandedness of \(C_{{{\mathbf{u}}}}\) (as well as of B and D) to obtain the desired O(T) scaling.
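As a concrete check of this bandedness, here is a minimal numerical sketch (the filter sizes and contrast below are hypothetical) that builds \(C_{{{\mathbf{u}}}}\) entrywise from two finite-duration filters and verifies that entries vanish outside the band:

```python
import numpy as np

rng = np.random.default_rng(0)
T, T_k, n_pix, c = 50, 6, 4, 0.5   # hypothetical: time steps, filter length, pixels, contrast

# two random finite-duration receptive fields k^i(t, n), t = 0 .. T_k - 1
k_i = rng.standard_normal((T_k, n_pix))
k_j = rng.standard_normal((T_k, n_pix))

# [C_u]_{t1,t2}^{i,j} = c^2 * sum_{t,n} k^i(t1 - t, n) k^j(t2 - t, n)
C_u = np.zeros((T, T))
for t1 in range(T):
    for t2 in range(T):
        acc = 0.0
        for t in range(T):
            if 0 <= t1 - t < T_k and 0 <= t2 - t < T_k:
                acc += np.dot(k_i[t1 - t], k_j[t2 - t])
        C_u[t1, t2] = c**2 * acc

# bandedness: entries vanish once |t1 - t2| >= T_k (well inside the 2*T_k - 1 bound)
band = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
assert np.allclose(C_u[band >= T_k], 0.0)
```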

To solve each Newton-Raphson step efficiently, we recast the problem in an auxiliary space in which all our matrices are banded. Below we show how such a rotation can be accomplished. First, we calculate the (lower triangular) Cholesky decomposition, L, of \(C_{{{\mathbf{u}}}}\), satisfying \(L L^{^{\mathrm{T}}} =C_{{{\mathbf{u}}}}\). Since \(C_{{{\mathbf{u}}}}\) is banded, L is itself banded, and its calculation can be performed in O(T) operations (and is performed only once, because \(C_{{{\mathbf{u}}}}\) is fixed and does not depend on the vector (u,q)). Next, we form the 3T×3T matrix L

$$ {\mathbf{L}} =\begin{pmatrix} {\mathbf{l}}_{1,1} & \mathbf{0} & \cdots \\ {\mathbf{l}}_{2,1} & {\mathbf{l}}_{2,2} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}, \qquad\qquad {\mathbf{l}}_{t_1,t_2} =\begin{pmatrix} L_{t_1,t_2}^{1,1} & L_{t_1,t_2}^{1,2} & 0 \\ L_{t_1,t_2}^{2,1} & L_{t_1,t_2}^{2,2} & 0 \\ 0 & 0 & 1 \end{pmatrix}. $$

Clearly \({\mathbf{L}}\) is banded because L is banded. Also notice that when \({\mathbf{L}}\) acts on a state vector it does not affect its q part: it corresponds to the T×T identity matrix in the q-subspace. Let us define

$${\mathbf{G}} \equiv {\mathbf{L}}^{^{\mathrm{T}}} {\mathbf{J}}{\mathbf{L}}. $$

Using the definitions of L and A, and \(L^{^{\mathrm{T}}} A L = I\) (which follows from \(A = C_{{{\mathbf{u}}}}^{-1}\) and the definition of L), we then obtain

$$ {\mathbf{G}} = \mathbf{I}_{{{\mathbf{u}}}} + {\mathbf{B}} + {\mathbf{L}}^{^{\mathrm{T}}} {\mathbf{D}} {\mathbf{L}}, $$

where we defined

$$ \mathbf{I}_{{{\mathbf{u}}}} =\begin{pmatrix} {\mathbf{i}}_{{{\mathbf{u}}}} & \mathbf{0} & \cdots \\ \mathbf{0} & {\mathbf{i}}_{{{\mathbf{u}}}} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}, \qquad\qquad {\mathbf{i}}_{{{\mathbf{u}}}} =\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. $$

\(\mathbf{I}_{{{\mathbf{u}}}}\) is diagonal and B is banded, and since D is block-diagonal and L is banded, the third term in Eq. (27) is banded as well. Thus G is banded.

Now it is easy to see from Eq. (26) that the solution, δ, of the Newton-Raphson equation, \({\mathbf{J}} \delta= \nabla\), can be written as \(\delta = {\mathbf{L}} \tilde{\delta}\) where \(\tilde{\delta}\) is the solution of the auxiliary equation \({\mathbf{G}}\tilde{\delta} = {\mathbf{L}}{^{^{\mathrm{T}}}} \nabla\). Since G is banded, this equation can be solved in O(T) time, and since L is banded, the required matrix multiplications by L and \({\mathbf{L}}{^{^{\mathrm{T}}}}\) can also be performed in linear computational time. Thus the Newton-Raphson algorithm for solving Eq. 11 can be performed in computational time scaling only linearly with T.
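To make the O(T) claim concrete, here is a minimal pure-NumPy sketch of the solve in one Newton-Raphson step for the tridiagonal case (the true G has a wider band, but the principle is identical); the matrix entries and sizes are hypothetical:

```python
import numpy as np

def thomas_solve(sub, main, sup, rhs):
    """Solve a tridiagonal system in O(T) operations (Thomas algorithm).
    sub: subdiagonal (T-1), main: diagonal (T), sup: superdiagonal (T-1)."""
    T = len(main)
    c, d = np.empty(T - 1), np.empty(T)
    c[0] = sup[0] / main[0]
    d[0] = rhs[0] / main[0]
    for t in range(1, T):                       # forward elimination
        denom = main[t] - sub[t - 1] * c[t - 1]
        if t < T - 1:
            c[t] = sup[t] / denom
        d[t] = (rhs[t] - sub[t - 1] * d[t - 1]) / denom
    x = np.empty(T)
    x[-1] = d[-1]
    for t in range(T - 2, -1, -1):              # back substitution
        x[t] = d[t] - c[t] * x[t + 1]
    return x

rng = np.random.default_rng(1)
T = 200                                         # hypothetical number of time steps
main = 4.0 + rng.random(T)                      # diagonally dominant, so well-posed
off = rng.random(T - 1)
grad = rng.standard_normal(T)

delta = thomas_solve(off, main, off, grad)      # O(T), never forms the dense matrix
G = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
assert np.allclose(G @ delta, grad)
```

Each Newton step thus costs a constant amount of work per time bin, in contrast to the O(T³) cost of a dense solve.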

Appendix B: Method of moments

The identification of the correlation structure of a latent variable from the spike trains of many neurons, and the closely related converse problem of generating spike trains with a desired correlation structure (Niebur 2007; Krumin and Shoham 2009; Macke et al. 2009; Gutnisky and Josic 2010), has received much attention lately. Krumin and Shoham (2009) generate spike trains with the desired statistical structure by nonlinearly transforming the underlying nonnegative rate process to a Gaussian process using a few commonly used link functions. Macke et al. (2009) and Gutnisky and Josic (2010) both use thresholding of a Gaussian process with the desired statistical structure, though these works differ in the way the authors sample the resulting processes. Finally, Dorn and Ringach (2003) proposed a method to find the correlation structure of an underlying Gaussian process in a model where spikes are generated by simple threshold crossing. Here we take a similar route to find the correlation structure between our latent variables, the common noise inputs. However, the correlations in the spike trains in our model stem both from the receptive field overlap and from the correlations among the latent variables (the common noise inputs), which we have to estimate from the observed spiking data. Moreover, the GLM framework used here affords some additional flexibility, since the spike train, including history effects, is not restricted to be a Poisson process given the latent variable.

Our model for the full cell population can be written as:

$$ \mathbf{\lambda} = \exp( \mathbf{k} \cdot \mathbf{x}+\mathbf{h} \cdot \mathbf{y}+Q), $$

where \(Q \sim \mathcal{N}(0,\mathbf{C}_{\tau}\otimes\mathbf{C}_s)\), since we assume the spatio-temporal covariance structure of the common noise is separable (as discussed in Section 2.2). Here, we separate the covariance of the common noise inputs into two terms: the temporal correlation of each of the common noise sources, \(\mathbf{C}_{\tau}\), which imposes the autoregressive structure, and the spatial correlations between the different common inputs to each cell, \(\mathbf{C}_s\). Correlations between cells have two sources in this model: first, the spatio-temporal filters overlap; second, the correlation matrix \(\mathbf{C}_s\) accounts for the common noise correlation. The spatio-temporal filters' overlap is insufficient to account for the observed correlation structure, as discussed in Pillow et al. (2008); the RF overlap does not account for the large sharp peak at zero lag in the cross-correlation function computed from the spike trains of neighboring cells. Therefore, we need to capture this fast remaining correlation through \(\mathbf{C}_s\).
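Because of this separability, the common noise can be sampled without ever forming the full covariance explicitly. A minimal sketch (small hypothetical sizes, and an assumed AR(1)-like temporal correlation), using the Kronecker mixed-product identity \((A\otimes B)(A\otimes B)^{\mathrm{T}} = (AA^{\mathrm{T}})\otimes(BB^{\mathrm{T}})\):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 3, 5                          # hypothetical: 3 cells, 5 time steps

# small SPD spatial and temporal covariances (assumed forms, for illustration only)
A = rng.standard_normal((N, N))
C_s = A @ A.T + N * np.eye(N)        # spatial covariance of the common inputs
dist = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
C_t = 0.9 ** dist                    # AR(1)-like temporal correlation

# Cholesky factors of each piece; their Kronecker product factors C_t (x) C_s
L_s = np.linalg.cholesky(C_s)
L_t = np.linalg.cholesky(C_t)
L = np.kron(L_t, L_s)
assert np.allclose(L @ L.T, np.kron(C_t, C_s))

# a draw of the common noise with the desired separable covariance
q = L @ rng.standard_normal(N * T)
```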

We approximated our model as \(\mathbf{y}^i \sim Poisson(\exp(a^i\mathbf{z}^i+\mathbf{q}^i)dt)\), where \(\mathbf{z}^i = \mathbf{k}^i \cdot \mathbf{x}^i + \mathbf{h}^i \cdot \mathbf{y}^i\). Since the exponential nonlinearity is a convex, log-concave, increasing function of z, so is \(E_{q}\left[\exp(a^i\mathbf{z}^i+\mathbf{q}^i)\right]\) (Paninski 2005). This guarantees that if the distribution of the covariate is elliptically symmetric, then we may consistently estimate the model parameters via the standard GLM maximum likelihood estimator, even if the incorrect nonlinearity is used to compute the likelihood (Paninski 2004), i.e., even when the correct values of the correlation matrix, \(\mathbf{C}_s\), are unknown. Consistency of the model parameter estimation holds up to a scalar constant; therefore, we leave a scalar degree of freedom, \(a^i\), in front of \(\mathbf{z}^i\). We will now write the moments of the distribution analytically, and afterwards we will equate them to the observed moments of the spike trains.

The average firing rate can be written as:

$$ \begin{array}{lll} E[\mathbf{y}^i ] &= E_{z,\mathbf{q}}\left[\exp(a^i\mathbf{z}^i+\mathbf{q}^i)\right] \\ &=E_{z}\left[\exp(a^i\mathbf{z}^i)\right] E_{\mathbf{q}}\left[\exp(\mathbf{q}^i)\right]\\ &= G^i(a^i)r^i \end{array} $$


$$ \begin{array}{lll} E\left[\mathbf{z}^i\mathbf{y}^i\right] &= E_{z,\mathbf{q}}\left[z\exp(a^i\mathbf{z}^i+q^i)\right] \\ &=E_{z}\left[z\exp(a^i\mathbf{z}^i)\right] E_{q}\left[\exp(q^i)\right]\\ &= \frac{d}{da^i}G^i(a^i)r^i \end{array} $$

where \( G^i(a^i) = E[\exp(a^i \mathbf{z}^i)]\) is the moment generating function (Bickel and Doksum 2001) of \(\mathbf{z}^i\) evaluated at \(a^i\), and \( r^i = E[\exp(q^i)] = \exp(\mu^i+\frac{1}{2}(C_s)_{ii})\) may be computed analytically here (by solving the Gaussian integral over \(\exp(q)\)). Therefore we have:

$$ \frac{E[\mathbf{z}^i \mathbf{y}^i]}{E[\mathbf{y}^i]} = \frac{\frac{d}{da^i}G^i(a^i) r^i}{G^i(a^i)r^i} = \frac{d}{da^i}\log G^i(a^i) $$

and we can solve for \(a_{opt}^i\).
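As a sketch of this step (synthetic samples of z and a hypothetical true \(a^i\)): since \(\frac{d}{da}\log G(a)\) is increasing in a, the moment condition can be solved by simple bisection on the empirical moment generating function:

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.standard_normal(10_000)     # hypothetical samples of z^i = k.x + h.y

true_a = 0.7
# the observed moment ratio E[z y]/E[y], generated here from the true a for the test
target = np.mean(z * np.exp(true_a * z)) / np.mean(np.exp(true_a * z))

def tilted_mean(a):
    # d/da log G(a) = E[z e^{az}] / E[e^{az}], with G the empirical MGF of z
    w = np.exp(a * z)
    return np.mean(z * w) / np.mean(w)

# tilted_mean is increasing in a (its derivative is a tilted variance), so bisect
lo, hi = -5.0, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if tilted_mean(mid) < target:
        lo = mid
    else:
        hi = mid
a_opt = 0.5 * (lo + hi)
assert abs(a_opt - true_a) < 1e-6
```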

Now we can proceed and solve for \(\mathbf{C}_s\). We first decompose the observed spike-train covariance, \(\hat{\mathbf{C}}_s\), using the "law of total variance", which states that the total variance of a random variable is the sum of the expected conditional variance and the variance of the conditional expectation:

$$ \begin{array}{lll} \hat{\mathbf{C}}_s &= E\left[ \mathbf{y}^i\mathbf{y}^j\right] - E\left[ \mathbf{y}^i\right]E\left[ \mathbf{y}^j\right]\\ & = E_{z,\mathbf{q}}[Cov(\mathbf{y}|z,\mathbf{q})] + Cov_{z,\mathbf{q}}(E[\mathbf{y}|z,\mathbf{q}]). \end{array} $$

The first term in the right hand side can be written as:

$$\begin{array}{lll} E_{z,\mathbf{q}}\left[Cov(\mathbf{y}|z,\mathbf{q})\right] &=& E_{z,\mathbf{q}}\left[diag[\exp(a^i\mathbf{z}^i+\mathbf{q}^i)]\right] \\ &=& diag[E[\mathbf{y}^i]], \end{array} $$

since the variance of a Poisson process equals its mean. The second term in the right hand side of Eq. (33) is

$$\begin{array}{lll} Cov_{z,\mathbf{q}}(E[\mathbf{y}|z,\mathbf{q}]) &=& E_{z,\mathbf{q}}\left[ E[\mathbf{y}^i|\mathbf{z}^i,\mathbf{q}^i] E[\mathbf{y}^j|\mathbf{z}^j,\mathbf{q}^j] \right] - E_{z,\mathbf{q}}\left[ E[\mathbf{y}^i|\mathbf{z}^i,\mathbf{q}^i]\right] E_{z,\mathbf{q}}\left[ E[\mathbf{y}^j|\mathbf{z}^j,\mathbf{q}^j] \right]\\ &=& E_{z,\mathbf{q}}\left[ \exp(a^i\mathbf{z}^i+\mathbf{q}^i) \exp(a^j\mathbf{z}^j+\mathbf{q}^j) \right] - E_{z,\mathbf{q}}\left[ \exp(a^i\mathbf{z}^i+\mathbf{q}^i)\right] E_{z,\mathbf{q}}\left[ \exp(a^j\mathbf{z}^j+\mathbf{q}^j) \right]\\ &=& E_{z}\left[ \exp(a^i\mathbf{z}^i+a^j\mathbf{z}^j) \right] E_{\mathbf{q}}\left[ \exp(\mathbf{q}^i+\mathbf{q}^j) \right] - E\left[ \mathbf{y}^i \right] E\left[ \mathbf{y}^j \right]\\ &=& G^{i,j}(a^i,a^j) \exp \left (\frac{1}{2} \left( C_s^{ii}+C_s^{jj}+2C_s^{ij} \right) \right) - G^i(a^i)r^iG^j(a^j)r^j. \end{array} $$

Putting them back together, we have

$$ \begin{array}{lll} \hat{\mathbf{C}}_s = & ~ diag\left[E\left[\mathbf{y}^i\right]\right] \\ &+ r^ir^j\Big(G^{i,j}(a^i,a^j)\exp \big( (\mathbf{C}_s)_{i,j}\big)-G^i(a^i)G^j(a^j) \Big). \end{array} $$

Now we have all the terms for the estimation of \(\mathbf{C}_s\), and we can uniquely invert the relationship to obtain:

$$ \begin{array}{lll} (\mathbf{C}_s)_{i,j} = \log \left( \frac{(\hat{\mathbf{C}_s})_{i,j} - E[\mathbf{y}^i]}{r^ir^j G^{i,j}(a^i,a^j) } + \frac{G^i(a^i)G^j(a^j)}{G^{i,j}(a^i,a^j)} \right) \text{, } \end{array} $$

where we set \(G^{i,j}(a^i,a^j) = E[\exp(a^i\mathbf{z}^i+a^j\mathbf{z}^j)]\), and we denote the observed covariance of the time-series as \(\hat{\mathbf{C}}_s\).
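As a self-consistency check of this inversion (synthetic covariates; the values of \(a^i\), \(a^j\), and the common-noise covariance entries are hypothetical), pushing an off-diagonal covariance entry through the forward moment relation and back through the log formula recovers it:

```python
import numpy as np

rng = np.random.default_rng(4)
z_i = rng.standard_normal(50_000)
z_j = 0.5 * z_i + rng.standard_normal(50_000)   # correlated covariates (hypothetical)
a_i, a_j = 0.4, 0.3
mu_i = mu_j = 0.0
C_ii, C_jj, C_ij_true = 0.3, 0.2, 0.1           # common-noise covariance entries

# empirical MGFs and the analytic lognormal means r^i = exp(mu + C_ii/2)
G_i = np.mean(np.exp(a_i * z_i))
G_j = np.mean(np.exp(a_j * z_j))
G_ij = np.mean(np.exp(a_i * z_i + a_j * z_j))
r_i = np.exp(mu_i + 0.5 * C_ii)
r_j = np.exp(mu_j + 0.5 * C_jj)

# forward moment relation for an off-diagonal spike-covariance entry
# (on the diagonal an extra Poisson term E[y^i] is added and must be subtracted first)
C_hat_ij = r_i * r_j * (G_ij * np.exp(C_ij_true) - G_i * G_j)

# moment inversion via the log formula above
C_ij = np.log(C_hat_ij / (r_i * r_j * G_ij) + G_i * G_j / G_ij)
assert abs(C_ij - C_ij_true) < 1e-12
```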

As a consequence of the law of total variance (recall Eq. (33) above), the Poisson mixture model (a Poisson model whose underlying intensity is itself stochastic) constrains the variance of the spike count to be larger than its mean, while in fact our data are under-dispersed (i.e., the variance is smaller than the mean). This often results in negative eigenvalues in the estimate of \(\mathbf{C}_s\). We therefore replace the estimate with its closest positive semidefinite approximant: the positive semidefinite matrix minimizing the 2-norm distance to \(\mathbf{C}_s\), a common approach in the literature, though other matrix norms are also possible. Concretely, so that the smallest eigenvalue is just above zero, we add the absolute value of the minimum (negative) eigenvalue to the diagonal, \(\mathbf{C}_s^{psd} = \mathbf{C}_s+|\min(eig(\mathbf{C}_s))|\cdot \mathbf{I}\). For more details see Higham (1988); a similar problem was discussed in Macke et al. (2009). Finally, note that our estimate of the mixing matrix, M, depends only on the estimated mean firing rates and correlations in the network. Estimating these quantities accurately does not require exceptionally long data samples; we found that the parameter estimates were stable, approaching their final values with as little as half the data.
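A minimal sketch of the eigenvalue shift (the matrix values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
# a symmetric estimate of C_s that may have gone indefinite (hypothetical values)
A = rng.standard_normal((5, 5))
C_s = (A + A.T) / 2

lam_min = np.linalg.eigvalsh(C_s).min()
if lam_min < 0:
    # shift the spectrum up so the smallest eigenvalue sits just above zero;
    # in the spectral (2-)norm this is as close to C_s as any PSD matrix can be
    C_s_psd = C_s + (abs(lam_min) + 1e-9) * np.eye(5)
else:
    C_s_psd = C_s

assert np.linalg.eigvalsh(C_s_psd).min() >= 0
```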

Appendix C: PSTH-based method

A different method to obtain the covariance structure of the common noise involves analyzing the covariations of the residual activity in the cells once the PSTHs are subtracted (Brody 1999). Let \(y_r^i(t)\) be the spike train of neuron i on repeat r, where the ensemble of cells is presented R times with a stimulus of duration T. \(s^i(t) = E_r[y_r^i(t)]\) is the PSTH of neuron i. Let

$$ \delta y_r^i(t) = y_r^i(t) - E_r[y_r^i(t)] \text{; i.e.,} $$

\(\delta y_r^i(t)\) is each neuron’s deviation from the PSTH on each trial. This deviation is unrelated to the stimulus (since we removed the ‘signal’, the PSTH). We next form a matrix of cross-correlations between the deviations of every pair of neurons,

$$\begin{array}{rll} C_{i,j}(\tau) & =& E_r\left[\left(y_r^i\left(t\right) - E_r\left[y_r^i\left(t\right)\right] \right) \right.\\ && \cdot \left.\left(y_r^j\left(t+\tau\right) - E_r\left[y_r^j\left(t+\tau\right)\right] \right) \right] \\ & =& E_r\left[\delta y_r^i(t) \cdot \delta y_r^j(t+\tau)\right] \text{ ; } \end{array} $$

we can cast this three-index array into a two-dimensional matrix, \(C_k(\tau)\), by denoting k = i ·N + j. The matrix \(C_k(\tau)\) contains the trial-averaged cross-correlation functions between all cells; it carries both the spatial and the temporal information about the covariations.
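A toy sketch of this construction (two simulated Poisson cells over repeated trials with a hypothetical injected shared noise term; all sizes are illustrative) shows the shared input appearing as a positive zero-lag cross-covariance in \(C_k(\tau)\):

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, R = 2, 1000, 50                 # hypothetical: 2 cells, 1000 bins, 50 repeats
lags = np.arange(-5, 6)

# repeated trials: a fixed stimulus drive plus shared trial-to-trial noise
stim = rng.random((N, T)) * 0.2
y = np.empty((R, N, T))
for r in range(R):
    shared = 0.1 * rng.standard_normal(T)          # common noise, same for both cells
    for i in range(N):
        y[r, i] = rng.poisson(np.clip(stim[i] + 0.1 + shared, 0.0, None))

psth = y.mean(axis=0)                              # s^i(t) = E_r[y_r^i(t)]
dev = y - psth                                     # trial deviations delta y_r^i(t)

# C_{i,j}(tau) = E_r[dy^i(t) dy^j(t+tau)], averaged over t; cast to C_k, k = i*N + j
C_k = np.zeros((N * N, len(lags)))
for i in range(N):
    for j in range(N):
        for li, tau in enumerate(lags):
            a = dev[:, i, max(0, -tau):T - max(0, tau)]
            b = dev[:, j, max(0, tau):T - max(0, -tau)]
            C_k[i * N + j, li] = np.mean(a * b)

# the shared noise shows up as a positive zero-lag cross-covariance between cells
zero = int(np.where(lags == 0)[0][0])
assert C_k[0 * N + 1, zero] > 0
```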

Using the singular value decomposition, one can always decompose the matrix as \(C_k(\tau) = U\Sigma V^{\mathrm{T}}\). Therefore, we can rewrite our matrix of cross-covariations as:

$$ \begin{array}{lll} C_k(\tau) = \sum\limits_i \sigma_i U_i \cdot V_i ^ T, \end{array} $$

where \(U_i\) is the i th singular vector in the matrix U, \(V_i\) is the i th singular vector in the matrix V, and \(\sigma_i\) are the singular values. Each term in the sum, \(\sigma_i U_i \cdot V_i ^ T\), is a spatio-temporally separable matrix. Examining the singular values, \(\sigma_i\), in Fig. 11(C), we see a clear separation between the first two singular values and the rest; this provides some additional support for the separable nature of the common noise in our model. The first two singular values capture most of the structure of \(C_k(\tau)\), which means that we can approximate the matrix as \(C_k(\tau) \approx \tilde{C}_{k} = \sum_{i = 1}^2 \sigma_i U_i \cdot V_i^T \) (Haykin 2001). In Fig. 11(A), we show the spatial part of the matrix of cross-covariations composed of the first two singular vectors reshaped into matrix form, and in Fig. 11(B) we show their first two temporal counterparts. Note the distinct separation of the different subpopulations (ON–ON, OFF–OFF, ON–OFF, and OFF–ON) composing the matrix of the entire population in Fig. 11(A).
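The rank-2 separability argument can be illustrated on a synthetic \(C_k(\tau)\) built from two space-time components plus a small non-separable residue (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)
N, n_tau = 6, 41                       # hypothetical: 6 cells (36 pairs), 41 lags

# a nearly separable cross-correlation matrix: two separable space-time components
spatial = [rng.standard_normal(N * N) for _ in range(2)]
temporal = [np.exp(-0.5 * ((np.arange(n_tau) - c0) / s) ** 2)
            for c0, s in ((20, 2.0), (12, 5.0))]
C_k = 3.0 * np.outer(spatial[0], temporal[0]) + 1.0 * np.outer(spatial[1], temporal[1])
C_k += 0.01 * rng.standard_normal((N * N, n_tau))   # small non-separable residue

U, S, Vt = np.linalg.svd(C_k, full_matrices=False)
C_k_rank2 = (S[:2] * U[:, :2]) @ Vt[:2]             # keep the two leading components

# the two leading singular values dominate, and the rank-2 approximation is accurate
assert S[1] > 10 * S[2]
rel_err = np.linalg.norm(C_k - C_k_rank2) / np.linalg.norm(C_k)
assert rel_err < 0.05
```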


Vidne, M., Ahmadian, Y., Shlens, J. et al. Modeling the impact of common noise inputs on the network activity of retinal ganglion cells. J Comput Neurosci 33, 97–121 (2012).



  • Retina
  • Generalized linear model
  • State-space model
  • Multielectrode recording
  • Random-effects model