© 2005

Bioinformatics and Computational Biology Solutions Using R and Bioconductor

  • Robert Gentleman
  • Vincent J. Carey
  • Wolfgang Huber
  • Rafael A. Irizarry
  • Sandrine Dudoit


  • Describes R-based packages for bioinformatics and computational biology (BCB) created in the Bioconductor project


Part of the Statistics for Biology and Health book series (SBH)

Table of contents

  1. Front Matter
    Pages i-xix
  2. Preprocessing data from genomic experiments

    1. W. Huber, R. A. Irizarry, R. Gentleman
      Pages 3-12
    2. B. M. Bolstad, R. A. Irizarry, L. Gautier, Z. Wu
      Pages 13-32
    3. B. M. Bolstad, F. Collin, J. Brettschneider, K. Simpson, L. Cope, R. A. Irizarry et al.
      Pages 33-47
    4. Y. H. Yang, A. C. Paquet
      Pages 49-69
    5. W. Huber, F. Hahne
      Pages 71-90
    6. X. Li, R. Gentleman, X. Lu, Q. Shi, J.D. Iglehart, L. Harris et al.
      Pages 91-109
  3. Meta-data: biological annotation and visualization

    1. R. Gentleman, V. J. Carey, J. Zhang
      Pages 113-133
    2. V. J. Carey, D. Temple Lang, J. Gentry, J. Zhang, R. Gentleman
      Pages 135-146
    3. C. A. Smith, W. Huber, R. Gentleman
      Pages 147-160
    4. W. Huber, X. Li, R. Gentleman
      Pages 161-179
  4. Statistical analysis for genomic experiments

    1. V. J. Carey, R. Gentleman
      Pages 183-187
    2. R. Gentleman, B. Ding, S. Dudoit, J. Ibrahim
      Pages 189-208
    3. K. S. Pollard, M. J. van der Laan
      Pages 209-228
    4. D. Scholtens, A. von Heydebreck
      Pages 229-248
    5. K. S. Pollard, S. Dudoit, M. J. van der Laan
      Pages 249-271
    6. T. Hothorn, M. Dettling, P. Bühlmann
      Pages 293-311
  5. Graphs and networks

    1. R. Gentleman, W. Huber, V. J. Carey
      Pages 329-336

About this book


Bioconductor is a widely used open source and open development software project for the analysis and comprehension of data arising from high-throughput experimentation in genomics and molecular biology. Bioconductor is rooted in the open source statistical computing environment R. This volume's coverage is broad and ranges across most of the key capabilities of the Bioconductor project, including:

  • importation and preprocessing of high-throughput data from microarray, proteomic, and flow cytometry platforms;
  • curation and delivery of biological metadata for use in statistical modeling and interpretation;
  • statistical analysis of high-throughput data, including machine learning and visualization;
  • modeling and visualization of graphs and networks.

The developers of the software, who are in many cases leading academic researchers, jointly authored chapters. All methods are illustrated with publicly available data, and a major section of the book is devoted to exposition of fully worked case studies.

This book is more than a static collection of descriptive text, figures, and code examples that were run by the authors to produce the text; it is a dynamic document. Code underlying all of the computations that are shown is made available on a companion website, and readers can reproduce every number, figure, and table on their own computers.

Robert Gentleman is Head of the Program in Computational Biology at the Fred Hutchinson Cancer Research Center in Seattle. He is one of the two authors of the original R system and a leading member of the R core team.

Vincent Carey is Associate Professor of Medicine (Biostatistics), Channing Laboratory, Brigham and Women's Hospital, Harvard Medical School. Gentleman and Carey are co-founders of the Bioconductor project.

Wolfgang Huber is Group Leader in the European Molecular Biology Laboratory at the European Bioinformatics Institute in Cambridge. He has made influential contributions to the error modeling of microarray data.

Rafael Irizarry is Associate Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health in Baltimore. He is co-developer of RMA and GCRMA, two of the most popular methodologies for preprocessing high-density oligonucleotide arrays.

Sandrine Dudoit is Assistant Professor in the Department of Biostatistics at the University of California, Berkeley. She has made seminal discoveries in the fields of multiple testing and generalized cross-validation and spearheaded the deployment of these findings in applied genomic science.


Keywords

Annotation DNA Processing bioinformatics biology calculus classification cluster analysis data analysis genes genome genomics machine learning microarray protein

Editors and affiliations

  • Robert Gentleman (1)
  • Vincent J. Carey (2)
  • Wolfgang Huber (3)
  • Rafael A. Irizarry (4)
  • Sandrine Dudoit (5)

  1. Program in Computational Biology, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, USA
  2. Channing Laboratory, Brigham and Women’s Hospital, Harvard Medical School, Boston, USA
  3. European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
  4. Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
  5. Division of Biostatistics, School of Public Health, University of California Berkeley, Berkeley, USA

From the reviews:

"The book has several nice touches that readers will appreciate. First, the liberal use of color shows the full capabilities of Bioconductor packages and brings the material to life. Second, color figures are dispersed throughout the text rather than being relegated to a central section of color plates. Third, the index indicates whether a term references a package, function or class. This book is an excellent resource... In summary, this book is a must have for any Bioconductor user." (J. Wade Davis, Journal of the American Statistical Association, Vol. 102, No. 477, 2007)

"This book is solid evidence of the influence that quantitative researchers can have on biological investigations. Organized into separate chapters of shared authorship, the book provides a valuable overview of the impact that the authors and their colleagues have had on the analysis of genomic data." (R.W. Doerge, Biostatistics, December 2006)

"This book provides an in-depth demonstration of the potential of the Bioconductor project, through a varied mixture of descriptions, figures and examples. … The book … is an exciting opportunity for researchers to learn directly from the software developers themselves. The range of material covered by the book is diverse and well structured. An abundance of fully worked case studies illustrate the methods in practice. … it should be a must for any researcher considering getting started with the software … ." (Rebecca Walls, Journal of Applied Statistics, Vol. 34 (3), 2007)

"The book provides an extensive overview over the most important tasks in analyzing genomic data with Bioconductor. … The book is well written and communicates hands-on experience of the developers of the respective Bioconductor packages themselves. … The book is targeted to a broad range of researchers interested in genomic data analysis, including biologists, bioinformaticians, and statisticians. … It is a very valuable resource for modern genomic data analysis. There is no comparable book on the market." (Jörg Rahnenführer, Statistical Papers, Vol. 50, 2009)