Genome-Scale Analysis of Data from High-Throughput Technologies

Sarah J. Wheelan
Part of the Applied Bioinformatics and Biostatistics in Cancer Research book series (ABB)


Few technical advances have excited as broad a spectrum of basic and clinical scientists as high-throughput technologies (microarrays and sequencing). Having learned in training that the key to nearly any phenotype lies somewhere in the genome, scientists are rapidly joining the movement to lower the cost of these technologies and broaden access to them. Generating enormous amounts of high-dimensional data brings its own challenges, and many researchers are moving even further from their training to collaborate with computer scientists and biostatisticians, who are equally eager to analyze these promising datasets. As new, truly interdisciplinary teams form, we are seeing major advances, and the current environment is exciting for all involved. Technology has brought entire scientific fields to the brink of discovery before and will do so again; the overall enthusiasm must therefore be tempered by the recognition that new technology brings new problems and previously unseen artifacts. We can circumvent some of these by paying careful attention to experimental design, by staying mindful of the complexities of the underlying biology, and by soliciting assistance from analysts versed in high-dimensional data.


Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, USA