MOSAiCS-HMM: A Model-Based Approach for Detecting Regions of Histone Modifications from ChIP-Seq Data

Chung, Dongjun; Zhang, Qi; Keleş, Sündüz

doi:10.1007/978-3-319-07212-8_14

Part of the book series: Frontiers in Probability and the Statistical Sciences (FROPROSTAS)

Abstract

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiments are routinely used to study the epigenomics of transcriptional regulation. We review some of the important statistical issues in the analysis of these experiments and extend MOSAiCS, our previous model for the analysis of transcription factor ChIP-seq data, with a hidden Markov model architecture (MOSAiCS-HMM). MOSAiCS-HMM provides a model-based approach for modeling read counts in histone modification ChIP-seq experiments and accounts for the spatial dependence in their ChIP-seq profiles. In addition, its R package implementation provides extensive functionality for summarizing these data and generating files that can be uploaded directly to the UCSC Genome Browser.
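To illustrate why a hidden Markov architecture helps with broad histone modification signals, the following is a minimal sketch of Viterbi decoding for a two-state HMM (background vs. enriched) over binned read counts. It is not the MOSAiCS-HMM model or its R package API: the Poisson emissions, the fixed rates, the `stay` probability, and the function names are all assumptions chosen for illustration, since this preview does not give the chapter's emission model.

```python
import math


def poisson_logpmf(k, lam):
    """Log probability of observing count k under a Poisson(lam) distribution."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_two_state(counts, rates=(2.0, 10.0), stay=0.9, init=(0.5, 0.5)):
    """Most likely state path for binned read counts.

    State 0 = background, state 1 = enriched. `rates` are hypothetical
    Poisson means per state; `stay` is the assumed probability of remaining
    in the same state between adjacent bins (the spatial dependence).
    """
    if not counts:
        return []
    n_states = 2
    log_trans = [
        [math.log(stay if i == j else 1.0 - stay) for j in range(n_states)]
        for i in range(n_states)
    ]
    # Initialise with the first bin.
    score = [
        math.log(init[s]) + poisson_logpmf(counts[0], rates[s])
        for s in range(n_states)
    ]
    back = []  # back[t][j] = best predecessor of state j at bin t + 1
    for c in counts[1:]:
        new_score, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            ptr.append(best_i)
            new_score.append(
                score[best_i] + log_trans[best_i][j] + poisson_logpmf(c, rates[j])
            )
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical bin counts: an enriched stretch with one low bin in the middle.
counts = [1, 2, 0, 1, 12, 9, 3, 11, 10, 1, 0, 2]
print(viterbi_two_state(counts))
```

With these assumed parameters, the bin with count 3 would be labeled background if each bin were classified independently, but the transition structure keeps it inside the enriched region, which is the kind of spatial smoothing the abstract refers to.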

Acknowledgements

We thank Professor Colin Dewey of University of Wisconsin, Madison, for providing us with an RSEM-processed version of ENCODE GM12878 RNA-seq data. This research was supported by National Institutes of Health Grants HG007019 and HG003747 to S.K.

Author information

Authors and Affiliations

Department of Biostatistics, Yale School of Public Health, Yale University, 60 College Street, New Haven, CT, 06520, USA
Dongjun Chung
Department of Biostatistics and Medical Informatics, School of Public Health and Medicine, University of Wisconsin, 2130C Genetics/Biotechnology Center, 425 Henry Mall, Madison, WI, 53706, USA
Qi Zhang
Departments of Statistics and of Biostatistics and Medical Informatics, University of Wisconsin, 2124 Genetics/Biotechnology Center, 425 Henry Mall, Madison, WI, 53706, USA
Sündüz Keleş

Corresponding author

Correspondence to Dongjun Chung.

Editor information

Editors and Affiliations

Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, Kentucky, USA
Somnath Datta
Department of Statistics, Iowa State University, Ames, Iowa, USA
Dan Nettleton

About this chapter

Cite this chapter

Chung, D., Zhang, Q., Keleş, S. (2014). MOSAiCS-HMM: A Model-Based Approach for Detecting Regions of Histone Modifications from ChIP-Seq Data. In: Datta, S., Nettleton, D. (eds) Statistical Analysis of Next Generation Sequencing Data. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-07212-8_14

DOI: https://doi.org/10.1007/978-3-319-07212-8_14
Published: 17 June 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07211-1
Online ISBN: 978-3-319-07212-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
