First Connectomics Challenge: From Imaging to Connectivity

Abstract

We organized a challenge to unravel the connectivity of simulated neuronal networks. The provided data was based solely on fluorescence time series of spontaneous activity in a network of 1000 neurons. The task of the participants was to compute the effective connectivity between neurons, with the goal of reconstructing the ground-truth topology of the network as accurately as possible. The dataset is similar to those obtained from in vivo and in vitro recordings of calcium fluorescence imaging, and therefore the algorithms developed by the participants may contribute substantially in the future to unraveling major topological features of living neuronal networks from recorded data alone, without the need for slow, painstaking experimental connectivity-labeling methods. Among 143 entrants, 16 teams participated in the final round of the challenge to compete for prizes. The winners significantly outperformed the baseline method provided by the organizers. To measure influences between neurons, the participants used an array of diverse methods, including transfer entropy, regression algorithms, correlation, deep learning, and network deconvolution. The development of "connectivity reconstruction" techniques is a major step in brain science, with many ramifications for the comprehension of neuronal computation, as well as for the understanding of network dysfunctions in neuropathologies.

Editors: Demian Battaglia, Isabelle Guyon, Vincent Lemaire, Javier Orlandi, Bisakha Ray, Jordi Soriano

The original form of this article appears in JMLR W&CP Volume 46.

Notes

  1. Understood as the "average clustering coefficient" in network theory, i.e. the number of triangles a neuron forms with its neighbors divided by the total number of triangles it could form given its connectivity.

  2. The AUC is computed by integrating the ROC curve, i.e. the true-positive rate as a function of the false-positive rate.

  3. http://tinyurl.com/connectomicsDatasheet.

References

  • Lionel Barnett, Adam B. Barrett, and Anil K. Seth. Granger causality and transfer entropy are equivalent for Gaussian variables. Physical Review Letters, 103(23):238701, 2009.

  • P. Bonifazi, M. Goldin, M. A. Picardo, I. Jorquera, A. Cattani, G. Bianconi, A. Represa, Y. Ben-Ari, and R. Cossart. GABAergic hub neurons orchestrate synchrony in developing hippocampal networks. Science, 326(5958):1419–1424, December 2009.

  • Y-Lan Boureau, Jean Ponce, and Yann LeCun. A theoretical analysis of feature pooling in visual recognition. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 111–118, 2010.

  • Andrew P. Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7):1145–1159, 1997.

  • Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.

  • Wojciech M. Czarnecki and Rafal Jozefowicz. Neural connectivity reconstruction from calcium imaging signal using random forest with topological features. JMLR, proceedings track, this volume, 2014.

  • Ildefons Magrans de Abril and Ann Nowé. Supervised neural network structure recovery. JMLR, proceedings track, this volume, 2014.

  • Alberto De La Fuente, Nan Bing, Ina Hoeschele, and Pedro Mendes. Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics, 20(18):3565–3574, 2004.

  • J. Eckmann, O. Feinerman, L. Gruendlinger, E. Moses, J. Soriano, and T. Tlusty. The physics of living neural networks. Physics Reports, 449(1–3):54–76, September 2007.

  • Soheil Feizi, Daniel Marbach, Muriel Médard, and Manolis Kellis. Network deconvolution as a general method to distinguish direct dependencies in networks. Nature Biotechnology, 2013.

  • Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory, pages 23–37. Springer, 1995.

  • Marc-Oliver Gewaltig and Markus Diesmann. NEST (Neural Simulation Tool). Scholarpedia, 2(4):1430, 2007.

  • Benjamin F. Grewe, Dominik Langer, Hansjörg Kasper, Björn M. Kampa, and Fritjof Helmchen. High-speed in vivo calcium imaging reveals neuronal network activity with near-millisecond precision. Nature Methods, 7(5):399–405, May 2010.

  • Christine Grienberger and Arthur Konnerth. Imaging calcium in neurons. Neuron, 73(5):862–885, 2012.

  • Sten Grillner. Megascience efforts and the brain. Neuron, 82(6):1209–1211, June 2014.

  • Isabelle Guyon, Demian Battaglia, Alice Guyon, Vincent Lemaire, Javier G. Orlandi, Mehreen Saeed, Jordi Soriano, Alexander Statnikov, Olav Stetter, and Bisakha Ray. Design of the first neuronal connectomics challenge: From imaging to connectivity. In Neural Networks (IJCNN), 2014 International Joint Conference on, pages 2600–2607, July 2014.

  • Eric R. Kandel, Henry Markram, Paul M. Matthews, Rafael Yuste, and Christof Koch. Neuroscience thinks big (and collaboratively). Nature Reviews Neuroscience, 14(9):659–664, September 2013.

  • Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

  • Andy Liaw and Matthew Wiener. Classification and regression by randomForest. R News, 2(3):18–22, 2002.

  • Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814, 2010.

  • Kenichi Ohki, Sooyoung Chung, Yeang H. Ch'ng, Prakash Kara, and R. Clay Reid. Functional imaging with cellular resolution reveals precise micro-architecture in visual cortex. Nature, 433(7026):597–603, February 2005.

  • Javier G. Orlandi, Jordi Soriano, Enrique Alvarez-Lacalle, Sara Teller, and Jaume Casademunt. Noise focusing and the emergence of coherent activity in neuronal cultures. Nature Physics, 9(9):582–590, 2013.

  • Javier G. Orlandi, Olav Stetter, Jordi Soriano, Theo Geisel, and Demian Battaglia. Transfer entropy reconstruction and labeling of neuronal connections from simulated calcium imaging. PLoS ONE, 9(6):e98842, 2014.

  • Thomas Panier, Sebastián A. Romano, Raphaël Olive, Thomas Pietri, Germán Sumbre, Raphaël Candelier, and Georges Debrégeas. Fast functional imaging of multiple brain regions in intact zebrafish larvae using Selective Plane Illumination Microscopy. Frontiers in Neural Circuits, 7:65, 2013.

  • Boris Teodorovich Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5):1–17, 1964.

  • Greg Ridgeway. Generalized boosted regression models. Documentation on the R package gbm, version 1.5-7, 2006.

  • Lukasz Romaszko. Signal correlation prediction using convolutional neural networks. JMLR, proceedings track, this volume, 2014.

  • Srikanth Ryali, Tianwen Chen, Kaustubh Supekar, and Vinod Menon. Estimation of functional connectivity in fMRI data using stability selection-based sparse partial correlation with elastic net penalty. NeuroImage, 59(4):3852–3861, 2012.

  • Juliane Schäfer and Korbinian Strimmer. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1), 2005.

  • Thomas Schreiber. Measuring information transfer. Physical Review Letters, 85(2):461, 2000.

  • Micha E. Spira and Aviad Hai. Multi-electrode array technologies for neuroscience and cardiology. Nature Nanotechnology, 8(2):83–94, February 2013.

  • Olav Stetter, Demian Battaglia, Jordi Soriano, and Theo Geisel. Model-free reconstruction of excitatory neuronal connectivity from calcium imaging signals. PLoS Computational Biology, 8(8):e1002653, 2012.

  • Antonio Sutera, Arnaud Joly, Vincent François-Lavet, Zixiao Aaron Qiu, Gilles Louppe, Damien Ernst, and Pierre Geurts. Simple connectome inference from partial correlation statistics in calcium imaging. JMLR, proceedings track, this volume, 2014.

  • Chenyang Tao, Wei Lin, and Jianfeng Feng. Reconstruction of excitatory neuronal connectivity via metric score pooling and regularization. JMLR, proceedings track, this volume, 2014.

  • Elisenda Tibau, Miguel Valencia, and Jordi Soriano. Identification of neuronal network properties from the spectral analysis of calcium imaging signals in neuronal cultures. Frontiers in Neural Circuits, 7:199, 2013.

  • Joshua T. Vogelstein. OOPSI: A family of optimal optical spike inference algorithms for inferring neural connectivity from population calcium imaging. PhD thesis, The Johns Hopkins University, 2009.

  • Joshua T. Vogelstein, Brendon O. Watson, Adam M. Packer, Rafael Yuste, Bruno Jedynak, and Liam Paninski. Spike inference from calcium imaging using sequential Monte Carlo methods. Biophysical Journal, 97(2):636–655, 2009.

  • Duncan J. Watts and Steven H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440–442, June 1998.

  • Jason Weston, Frédéric Ratle, Hossein Mobahi, and Ronan Collobert. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pages 639–655. Springer, 2012.

  • B. C. Wheeler and G. J. Brewer. Designing neural networks in culture. Proceedings of the IEEE, 98(3), 2010.

  • Rafael Yuste and George M. Church. The new century of the brain. Scientific American, 310(3):38–45, February 2014.

Acknowledgements

This challenge is the result of the collaboration of many people. We are particularly grateful to our advisors and beta-testers who contributed to the challenge website and/or reviewed this manuscript: Gavin Cawley, Gideon Dror, Hugo-Jair Escalante, Alice Guyon, Sisi Ma, Eric Peskin, Florin Popescu, and Joshua Vogelstein. The challenge was part of the WCCI 2014 and ECML 2014 competition programs. Prizes were donated by Microsoft. The challenge was implemented on the Kaggle platform, with funds provided by the EU FP7 research program "Marie Curie Actions". This work used computing resources at the High Performance Computing Facility of the Center for Health Informatics and Bioinformatics at the NYU Langone Medical Center.

Author information

Correspondence to Javier Orlandi or Bisakha Ray.

Appendices

Appendix A. Challenge Verification Results

  1. Winners prize #1 (first place, verified): 500 USD and 1000 USD travel award + Award certificate

    AAAGV

The code from the winning team AAAGV, publicly available at https://github.com/asutera/kaggle-connectomics, was run successfully on a desktop PC; it used 7 GB of RAM and took 30 h per dataset in single-core mode on a 3 GHz i7 CPU. The code is written in Python and uses only standard dependencies. There was an issue with a specific library version, but it has been resolved, and a single script (main.py) runs the whole computation. We obtained an AUC of 0.9426 on the valid dataset and 0.9416 on the test dataset, the same values as those reported on Kaggle.

  2. Winners prize #2 (third place, verified): 250 USD and 750 USD travel award + Award certificate

    Ildefons

The code from the Ildefons team is publicly available at https://github.com/ildefons/connectomics and consists of 6 separate scripts. The main challenges were installing the required R package gbm and running his script makeFeatures.R, which needed 128 GB of RAM; this R script started a MATLAB server in the SGE (Sun Grid Engine) background. We had to execute makeFeatures.R separately for the normal-1, normal-2, valid, and test datasets. His code was executed on the standard compute nodes of the cluster, each with 2 Intel CPUs, 16 processing cores, and 128 GB of RAM. The time and memory statistics for the execution of his code can be found in Table 4.

The code passed verification successfully. The AUC for the Kaggle submission we generated is 0.94066, which is better than his leaderboard score of 0.93900 by 0.00166.

  3. Winners prize #3 (fourth place, verified): 100 USD and 400 USD travel award + Award certificate

    Lukasz Romaszko

The code from the Lukasz Romaszko team can be obtained at https://github.com/lr292358/connectomics; details can be found in Table 5. His solution involved predicting the outcome eight different times and averaging the results. All of his code passed verification successfully. The bottlenecks were installing Theano (a Python module) on the GPU units and gaining access to them. We have 5 cluster nodes with GPU accelerators, each node holding one NVIDIA Tesla K20 (Kepler) accelerator with 2496 cores.

After merging, his score is 0.93931, slightly better than his leaderboard score of 0.93920; the difference of 0.00011 is negligible.

Table 4 Memory requirements and time for Ildefons’ code
Table 5 Memory requirements and time for Lukasz’s code

Appendix B. Description of Sample Methods and Sample Code

Matlab: We provide Matlab sample code to:

  • read the data

  • prepare a sample submission

  • visualize data

  • compute the GTE coefficient (Stetter et al. 2012) and a few other causal direction coefficients

  • train and test a predictor based on such coefficients.

The Matlab sample code is suitable for getting started. We provide a script (challengeFastBaseline) that computes a solution to the challenge (the big "valid" and "test" datasets) in a few minutes on a regular laptop computer. It uses Pearson's correlation coefficient (Correlation benchmark, AUC = 0.87322 on the public leaderboard). The data are first discretized with a simple method; using more elaborate discretization methods such as OOPSI may work better. The other network reconstruction methods, including GTE, are not optimized: they are slow and require a lot of memory.
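As a rough illustration (in Python rather than the provided Matlab), the sketch below shows the kind of computation such a correlation baseline performs. The discretization threshold and array shapes are illustrative assumptions, not the values used in challengeFastBaseline:

import numpy as np

def correlation_baseline(fluorescence, threshold=0.12):
    # fluorescence: (T, N) matrix of calcium traces, one column per neuron.
    # Crude discretization: threshold the frame-to-frame increase,
    # standing in for the simple method mentioned above.
    diff = np.diff(fluorescence, axis=0)
    spikes = (diff > threshold).astype(float)

    # Pearson correlation between all pairs of discretized traces.
    # (Neurons that never cross the threshold yield NaN rows and would
    # need special handling in a real run.)
    scores = np.corrcoef(spikes.T)        # (N, N), symmetric
    np.fill_diagonal(scores, 0.0)         # ignore self-connections
    return scores

Each entry scores[i, j] is then submitted as the predicted strength of the connection from neuron i to neuron j; since correlation is symmetric, this kind of baseline cannot resolve directionality.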

C++: Network-reconstruction.org provides C++ code that helps participants to:

  • read the data

  • prepare a sample submission

  • compute the GTE coefficient and a few other causal direction coefficients.

Note: The fluorescence matrices for small networks have dimensions 179498 × 100, and those for large networks 179500 × 1000. Even though the GTE code is "optimized", it is still slow and requires 10–12 h of computation for the big 1000-neuron networks on a compute cluster.

Python: We are providing scripts that:

  • read the data

  • discretize the data

  • prepare a sample submission using correlation.

One participant also made Python code available.

The baseline network reconstruction method, which we implemented, is described in detail in Stetter et al. (2012). It is based on Generalized Transfer Entropy (GTE), an extension of Transfer Entropy, first introduced by Schreiber (2000), a measure that quantifies predictive information flow between stationary systems evolving in time. It is given by the Kullback–Leibler divergence between two models of a given time series, conditioned on a given dynamical state of the system, which in the case of fluorescence signals corresponds to the population average. Transfer Entropy captures linear and non-linear interactions between any pair of neurons in the network and is model-free, i.e. it does not require any a priori knowledge of the type of interaction between neurons.

Apart from GTE, we have also provided implementations of cross-correlation and of two information gain (IG) measures, based on entropy and on Gini impurity, for network reconstruction. Cross-correlation gives its best results at zero time delay, which reduces it to a simple correlation coefficient; in effect, all these methods treat the data as independent instances rather than as time series. Another module that we have added to our software kit is a supervised learner, which extracts features from a network whose ground truth is known and builds a simple linear classifier to predict whether a connection is present between two neurons. Currently, the extracted features are GTE, correlation, and the two information gain measures (Gini impurity and entropy).
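To make the quantity concrete, the following Python sketch computes an order-1 transfer entropy between two binary spike trains from joint histograms. It is a didactic illustration of the measure behind GTE, not the optimized challenge code; in particular, GTE additionally conditions on the network-average fluorescence and operates on discretized fluorescence rather than on true spikes:

import numpy as np

def transfer_entropy(x, y):
    # x, y: binary (0/1) integer spike trains of equal length.
    # Estimate the joint distribution p(x_{t+1}, x_t, y_t) by counting.
    counts = np.zeros((2, 2, 2))
    for a, b, c in zip(x[1:], x[:-1], y[:-1]):
        counts[a, b, c] += 1
    p_xyz = counts / counts.sum()

    p_x1x = p_xyz.sum(axis=2, keepdims=True)     # p(x_{t+1}, x_t)
    p_xy = p_xyz.sum(axis=0, keepdims=True)      # p(x_t, y_t)
    p_x = p_xyz.sum(axis=(0, 2), keepdims=True)  # p(x_t)

    # TE(Y -> X) = sum p(x', x, y) * log[ p(x' | x, y) / p(x' | x) ]
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = (p_xyz * p_x) / (p_x1x * p_xy)
        terms = np.where(p_xyz > 0, p_xyz * np.log2(ratio), 0.0)
    return terms.sum()

Scoring every ordered pair (i, j) with such a measure yields a directed score matrix, which is what distinguishes transfer-entropy-based methods from the symmetric correlation measures above.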

Appendix C. Description of the Algorithms of the Winners

We provide a high-level description of the methods of the top-ranking participants, as reported in their fact sheets.

Team: AAAGV

The key point is building an undirected network through partial correlations, estimated via the inverse covariance matrix. As preprocessing, they use a combination of low- and high-pass filters and try to filter out bursts, i.e. peaks of network-wide activity. They stress that their main contribution is the preprocessing of the data. The calcium fluorescence signal is generally very noisy due to light-scattering artifacts. In the first step, a low-pass filter is used to smooth the signal and remove high-frequency noise. To retain only the fast transients around spikes, the time series is then transformed into its backward difference. A hard-threshold filter is next applied to eliminate small variations and negative values. In a final step, another function is applied to magnify spikes that occur during periods of low global activity.
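A minimal Python sketch of this preprocessing chain follows; the filter width, threshold, and weighting exponent are illustrative guesses, not the team's tuned values:

import numpy as np

def preprocess(F, w=3, tau=0.11, k=0.9):
    # F: (T, N) fluorescence matrix, one column per neuron.
    # 1. Low-pass filter: a moving average along time smooths the signal.
    kernel = np.ones(w) / w
    smooth = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, F)

    # 2. Backward difference retains only the fast rises around spikes.
    d = np.diff(smooth, axis=0, prepend=smooth[:1])

    # 3. Hard threshold eliminates small variations and negative values.
    d[d < tau] = 0.0

    # 4. Down-weight frames of high global activity, thereby magnifying
    #    spikes that occur when the rest of the network is quiet
    #    (a simple stand-in for the team's final magnification function).
    g = d.mean(axis=1, keepdims=True)     # per-frame network activity
    return d * (1.0 + g) ** (-k)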

For inference, this team assumed that the fluorescence values of the neurons at each time point can be modeled as random variables independently drawn from the same time-invariant joint probability distribution. They then used partial correlation to detect direct associations between neurons and filter out spurious ones. Partial correlation measures the conditional dependence between variables and has been used for inference in gene regulatory networks (De La Fuente et al. 2004; Schäfer and Strimmer 2005).

As the partial correlation matrix is symmetric, this method could not detect the directionality of connections. Some improvement was obtained by choosing an appropriate number of principal components, although the method was sensitive to the choice of filter parameters.
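The inference step itself is compact. Assuming a preprocessed signal matrix, a minimal sketch of partial correlations obtained via the precision (inverse covariance) matrix is:

import numpy as np

def partial_correlations(X):
    # X: (T, N) preprocessed signal, one column per neuron.
    cov = np.cov(X, rowvar=False)
    prec = np.linalg.pinv(cov)            # pseudo-inverse for stability
    d = np.sqrt(np.diag(prec))
    # Standard mapping from precision matrix to partial correlations:
    # rho_ij = -P_ij / sqrt(P_ii * P_jj)
    pcorr = -prec / np.outer(d, d)
    np.fill_diagonal(pcorr, 0.0)          # discard self-connections
    return pcorr                          # symmetric edge scores

This sketch omits the PCA-based regularization mentioned above; retaining a limited number of principal components makes the covariance inversion better conditioned.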

Team: Matthias Ossadnik

He uses multivariate logistic regression on inferred spike trains (thresholded derivative signals). The scores of the regression model are then fed into a modified AdaBoost classifier (Freund and Schapire 1995), together with other information such as neuronal firing rates.
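A hedged sketch of such a two-stage scheme, using standard scikit-learn components (plain AdaBoost rather than his modified variant, and with guessed features), might look like:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import AdaBoostClassifier

def two_stage_scores(spikes, labels):
    # spikes: (T, N) binary spike matrix from a training network;
    # labels: (N, N) 0/1 ground-truth connectivity for that network.
    T, N = spikes.shape

    # Stage 1: for each target neuron j, regress its next-step activity
    # on all neurons' current activity; coefficient i scores edge i -> j.
    # (Neurons that never spike would need to be skipped in a real run.)
    W = np.zeros((N, N))
    for j in range(N):
        lr = LogisticRegression(max_iter=1000)
        lr.fit(spikes[:-1], spikes[1:, j])
        W[:, j] = lr.coef_.ravel()

    # Stage 2: combine regression scores with side information (here,
    # firing rates) and boost a classifier on the edge labels.
    rates = spikes.mean(axis=0)
    feats = np.column_stack([
        W.ravel(),                  # edge (i, j) at flat index i * N + j
        np.repeat(rates, N),        # presynaptic firing rate
        np.tile(rates, N),          # postsynaptic firing rate
    ])
    clf = AdaBoostClassifier(n_estimators=100).fit(feats, labels.ravel())
    return clf.predict_proba(feats)[:, 1].reshape(N, N)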

Team: Ildefons Magrans

Ildefons designed a feature engineering pipeline based on information about connectivity between neurons and optimized for a particular noise level and firing rate. Instead of using a single connectivity indicator, he optimizes several indicators. As a first step, his spike inference module uses OOPSI, which is based on sequential Monte Carlo methods. Spikes below a noise level are treated as background noise and removed. After that, time steps with spiking activity above the synchronization rate are removed, as inter-burst recordings are more informative for topology reconstruction. As connectivity indicator, he used plain correlation, which however does not provide any directionality information. To remove the contribution of indirect paths, he used network deconvolution (Feizi et al. 2013), which operates on the entire connectivity matrix. The classifiers he uses with the correlation-derived features are Random Forests (Liaw and Wiener 2002) and Gradient Boosting Machines (Ridgeway 2006).

This method also could not identify the direction of connections, and the singular value decomposition step of network deconvolution had an extremely high computational complexity.
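Despite its cost, the deconvolution step itself reduces to an eigendecomposition. A minimal sketch following Feizi et al. (2013), with the rescaling of the observed matrix left out, is:

import numpy as np

def network_deconvolution(G):
    # G: symmetric observed similarity matrix (e.g. correlations),
    # assumed rescaled so that the geometric series below converges.
    # Model: G_obs = G_dir + G_dir^2 + ... = G_dir (I - G_dir)^{-1},
    # hence G_dir = G_obs (I + G_obs)^{-1}.
    vals, vecs = np.linalg.eigh(G)        # the expensive step noted above
    vals_dir = vals / (1.0 + vals)        # deconvolve each eigenvalue
    return vecs @ np.diag(vals_dir) @ vecs.T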

Team: Lukasz8000

Convolutional Neural Networks (CNNs) go beyond feed-forward neural networks in their ability to identify spatial dependencies and recognize patterns. CNNs detect small patterns, encoded as feature maps, in early layers and generalize to more complex patterns in subsequent layers. Each convolutional layer is defined by the number and shapes of its filters, along with its ability to learn patterns. In addition, max pooling (Boureau et al. 2010) is used to reduce the size of the generated feature maps.

He uses a deep convolutional neural network (LeCun et al. 1998) to learn features of pairs of time series that hint at the existence of a connection. He also introduces an additional input: the average activity of the network. Lukasz used preprocessing to retain regions of activity above a particular threshold; these active regions help to detect interdependencies. The other important choice that influenced the results was that of the activation function: he used tanh in the first convolutional layer, followed by rectified linear units (Nair and Hinton 2010) in the next two layers. To improve the network structure, he used max pooling. Gradient descent was combined with momentum (Polyak 1964), which helped to navigate past local extrema.
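A rough sketch of such an architecture, written here in PyTorch rather than the original Theano and with guessed layer sizes and filter widths, is:

import torch
import torch.nn as nn

class PairCNN(nn.Module):
    # Input: (batch, 3, window) windows holding the two candidate
    # neurons' traces plus the network-average activity channel.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(3, 16, kernel_size=9), nn.Tanh(),   # tanh first
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=9), nn.ReLU(),  # then ReLU
            nn.MaxPool1d(2),
            nn.Conv1d(32, 32, kernel_size=9), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1), nn.Flatten(),
            nn.Linear(32, 1),             # score for the candidate edge
        )

    def forward(self, x):
        return self.net(x)

model = PairCNN()
# Gradient descent with momentum (Polyak 1964), as in the write-up:
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)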

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Orlandi, J. et al. (2017). First Connectomics Challenge: From Imaging to Connectivity. In: Battaglia, D., Guyon, I., Lemaire, V., Orlandi, J., Ray, B., Soriano, J. (eds) Neural Connectomics Challenge. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-53070-3_1

  • DOI: https://doi.org/10.1007/978-3-319-53070-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53069-7

  • Online ISBN: 978-3-319-53070-3
