Abstract
Massive amounts of data have been, and are increasingly being, generated in various fields, such as bioinformatics, neuroscience and social networks. Many of these big datasets were generated to answer specific research questions, and were analysed accordingly. However, the information contained in these datasets can usually answer much broader questions than originally intended. Moreover, many existing big datasets are related to each other but differ in their detailed specifications, and the mutual information that can be extracted from them collectively has not commonly been considered. To bridge this gap between the fast pace of data generation and the slower pace of data analysis, and to exploit the massive amounts of existing data, we suggest employing data-driven explorations to analyse collections of related big datasets. This approach aims to extract field-specific novel findings that the data can reveal without being driven by specific questions or hypotheses. To realise this paradigm, we introduced the binarisation of consensus partition matrices (Bi-CoPaM) method, which can analyse collections of heterogeneous big datasets to identify clusters of consistently correlated objects. We demonstrate the power of data-driven explorations by applying the Bi-CoPaM to two collections of big datasets from two distinct fields, namely bioinformatics and neuroscience. In the first application, the collective analysis of forty yeast gene expression datasets identified a novel cluster of genes and some new biological hypotheses regarding their function and regulation. In the second application, the analysis of 1,856 big fMRI datasets identified three functionally connected neural networks related to the visual, reward and auditory systems during affective processing.
These experiments reveal the broad applicability of this paradigm to various fields, and thus encourage exploring the large amounts of partially exploited existing datasets, preferably as collections of related datasets, with a similar approach.
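The idea behind the method's name can be illustrated with a minimal sketch. The abstract does not specify the algorithm, so everything below is an assumption made for illustration: the partition matrices are taken to be already aligned (row c means the same cluster in every partition), the consensus is a plain average, and the binarisation rule (keep an object only if its best membership beats the runner-up by at least `delta`) is one plausible choice rather than the authors' stated procedure. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows = clusters, columns = objects


def consensus_partition_matrix(partitions: List[Matrix]) -> Matrix:
    """Average several partition matrices element-wise.

    Assumption: all matrices have the same shape and their cluster
    rows are already aligned with each other.
    """
    n = len(partitions)
    k = len(partitions[0])
    m = len(partitions[0][0])
    return [
        [sum(p[c][j] for p in partitions) / n for j in range(m)]
        for c in range(k)
    ]


def binarise(cpm: Matrix, delta: float = 0.0) -> List[List[int]]:
    """Turn fuzzy consensus memberships into 0/1 memberships.

    Hypothetical rule: an object joins its best cluster only if that
    membership exceeds the second best by at least `delta`; otherwise
    it is left unassigned. Requires at least two clusters.
    """
    k = len(cpm)
    m = len(cpm[0])
    binary = [[0] * m for _ in range(k)]
    for j in range(m):
        column = [cpm[c][j] for c in range(k)]
        ranked = sorted(range(k), key=lambda c: column[c], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        if column[best] - column[runner_up] >= delta:
            binary[best][j] = 1
    return binary


# Three toy partitions of four objects into two clusters.
toy_partitions = [
    [[1, 1, 0, 0], [0, 0, 1, 1]],
    [[1, 1, 1, 0], [0, 0, 0, 1]],
    [[1, 0, 0, 0], [0, 1, 1, 1]],
]
toy_cpm = consensus_partition_matrix(toy_partitions)
tight_clusters = binarise(toy_cpm, delta=0.5)  # only unanimous objects stay
loose_clusters = binarise(toy_cpm, delta=0.0)  # every object is assigned
```

In this toy case a larger `delta` keeps only the objects on which the individual partitions agree, which mirrors the abstract's notion of clusters of consistently correlated objects; a smaller `delta` trades that stringency for coverage.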
Professor Nandi is a Distinguished Visiting Professor at Tongji University, Shanghai, China. This work was partly supported by the National Science Foundation of China grant number 61520106006 and the National Science Foundation of Shanghai grant number 16JC1401300.
© 2017 Springer Nature Singapore Pte Ltd.
Cite this paper
Abu-Jamous, B., Liu, C., Roberts, D.J., Brattico, E., Nandi, A.K. (2017). Data-Driven Analysis of Collections of Big Datasets by the Bi-CoPaM Method Yields Field-Specific Novel Insights. In: Prabaharan, S., Thalmann, N., Kanchana Bhaaskaran, V. (eds) Frontiers in Electronic Technologies. Lecture Notes in Electrical Engineering, vol 433. Springer, Singapore. https://doi.org/10.1007/978-981-10-4235-5_2
DOI: https://doi.org/10.1007/978-981-10-4235-5_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-4234-8
Online ISBN: 978-981-10-4235-5
eBook Packages: Engineering, Engineering (R0)