
Data-Driven Analysis of Collections of Big Datasets by the Bi-CoPaM Method Yields Field-Specific Novel Insights

  • Conference paper
Frontiers in Electronic Technologies

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 433)

Abstract

Massive amounts of data have recently been, and are increasingly being, generated in various fields, such as bioinformatics, neuroscience and social networks. Many of these big datasets were generated to answer specific research questions and were analysed accordingly. However, the information contained in these datasets can usually answer much broader questions than those originally intended. Moreover, many existing big datasets are related to each other but differ in their detailed specifications, and the mutual information that can be extracted from them collectively has not commonly been considered. To bridge the gap between the fast pace of data generation and the slower pace of data analysis, and to exploit the massive amounts of existing data, we suggest employing data-driven explorations to analyse collections of related big datasets. This approach aims to extract field-specific novel findings that can be revealed from the data without being driven by specific questions or hypotheses. To realise this paradigm, we introduced the binarisation of consensus partition matrices (Bi-CoPaM) method, which can analyse collections of heterogeneous big datasets to identify clusters of consistently correlated objects. We demonstrate the power of data-driven explorations by applying the Bi-CoPaM method to two collections of big datasets from two distinct fields, namely bioinformatics and neuroscience. In the first application, the collective analysis of forty yeast gene expression datasets identified a novel cluster of genes and generated new biological hypotheses regarding their function and regulation. In the second application, the analysis of 1,856 big fMRI datasets identified three functionally connected neural networks related to the visual, reward and auditory systems during affective processing.
These experiments reveal the broad applicability of this paradigm to various fields, and thus encourage exploring the large amounts of partially exploited existing datasets, preferably as collections of related datasets, with a similar approach.
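The abstract names the core idea only through the method's title: several partitions of the same objects are combined into a consensus partition matrix, which is then binarised. The following is a minimal Python/NumPy sketch of that consensus-then-binarise idea under stated assumptions. It assumes each dataset (or clustering run) yields a hard partition of the same objects into K clusters, aligns clusters to the first partition by brute-force permutation matching, averages the aligned partition matrices, and binarises with a single value threshold `delta`. The function names (`to_partition_matrix`, `align`, `bi_copam`) and the brute-force alignment are hypothetical simplifications, not the published Bi-CoPaM implementation, which is described as tunable and is more general.

```python
from itertools import permutations

import numpy as np


def to_partition_matrix(labels, k):
    """Convert a label vector (values 0..k-1) into a K x M binary partition matrix."""
    labels = np.asarray(labels)
    u = np.zeros((k, labels.size))
    u[labels, np.arange(labels.size)] = 1.0
    return u


def align(reference, u):
    """Reorder the rows (clusters) of u to maximise overlap with reference.

    Assumption: brute force over all K! row orders, so only suitable for small K.
    This stands in for the cluster-relabelling step; it is not the published one.
    """
    k = reference.shape[0]
    best = max(permutations(range(k)),
               key=lambda p: np.sum(reference * u[list(p)]))
    return u[list(best)]


def bi_copam(label_sets, k, delta):
    """Average aligned partitions into a consensus matrix, then binarise it.

    Hypothetical simplification: object j joins cluster i only if its consensus
    membership copam[i, j] >= delta. With delta = 1.0 an object must fall in the
    same cluster in every partition; objects failing the test stay unassigned.
    """
    mats = [to_partition_matrix(labels, k) for labels in label_sets]
    ref = mats[0]
    aligned = [ref] + [align(ref, m) for m in mats[1:]]
    copam = np.mean(aligned, axis=0)          # fuzzy consensus memberships in [0, 1]
    binary = (copam >= delta).astype(int)     # value-threshold binarisation
    return copam, binary


if __name__ == "__main__":
    # Three partitions of six objects; the second uses swapped labels and the
    # third places object 2 in the other cluster.
    partitions = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1]]
    copam, strict = bi_copam(partitions, k=2, delta=1.0)
    print(np.round(copam, 2))
    print(strict)
```

With `delta=1.0`, object 2 is left out of both clusters because it is not consistently co-clustered across the three partitions; lowering `delta` to 0.5 assigns it by majority. Raising the threshold trades coverage for consistency, which is the sense in which a consensus-then-binarise scheme can be tuned.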

Professor Nandi is a Distinguished Visiting Professor at Tongji University, Shanghai, China. This work was partly supported by the National Science Foundation of China grant number 61520106006 and the National Science Foundation of Shanghai grant number 16JC1401300.

Author information

Corresponding author

Correspondence to Asoke K. Nandi.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Abu-Jamous, B., Liu, C., Roberts, D.J., Brattico, E., Nandi, A.K. (2017). Data-Driven Analysis of Collections of Big Datasets by the Bi-CoPaM Method Yields Field-Specific Novel Insights. In: Prabaharan, S., Thalmann, N., Kanchana Bhaaskaran, V. (eds) Frontiers in Electronic Technologies. Lecture Notes in Electrical Engineering, vol 433. Springer, Singapore. https://doi.org/10.1007/978-981-10-4235-5_2

  • DOI: https://doi.org/10.1007/978-981-10-4235-5_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-4234-8

  • Online ISBN: 978-981-10-4235-5

  • eBook Packages: Engineering, Engineering (R0)
