
Biophysical Reviews, Volume 11, Issue 1, pp 41–50

Stem cells, big data and compendium-based analyses for identifying cell types, signalling pathways and gene regulatory networks

  • Md Humayun Kabir
  • Michael D. O’Connor
Review

Abstract

Identification of new drug and cell therapy targets for disease treatment will be facilitated by a detailed molecular understanding of normal and disease development. Human pluripotent stem cells can provide a large in vitro source of human cell types and, in a growing number of instances, three-dimensional multicellular tissues called organoids. Applying stem cell technology to the discovery and development of new therapies will be aided by detailed molecular characterisation of cell identity, cell signalling pathways and target gene networks. Big data or ‘omics’ techniques, particularly transcriptomics and proteomics, facilitate cell and tissue characterisation using thousands to tens of thousands of genes or proteins. These gene and protein profiles are analysed using existing and/or emerging bioinformatics methods, including a growing number of methods that compare sample profiles against compendia of reference samples. This review assesses how compendium-based analyses can aid the application of stem cell technology to new therapy development. Such analyses can robustly define the identity of cells differentiated from stem cells, and can help elucidate the complex signalling pathways and target gene networks involved in normal and diseased states.

Keywords

Pluripotent stem cell · Bioinformatics · Compendium · Signalling · Growth factor · Pathway · Gene regulatory network

Notes

Author contributions

M.H.K. drafted the manuscript. M.H.K. and M.D.O’C. revised and approved the manuscript.

Funding

M.H.K. was supported by WSU Postgraduate Research Awards. M.D.O’C. was supported by The Medical Advances Without Animals Trust.

Compliance with ethical standards

Conflict of interest

Md Humayun Kabir declares that he has no conflict of interest. Michael D. O’Connor declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human or animal subjects performed by any of the authors.

Copyright information

© International Union for Pure and Applied Biophysics (IUPAB) and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. School of Medicine, Western Sydney University, Campbelltown, Australia
  2. Department of Computer Science and Engineering, University of Rajshahi, Rajshahi, Bangladesh
  3. Medical Sciences Research Group, Western Sydney University, Campbelltown, Australia
