Abstract
Environmental and clinical settings can host a wide variety of bacterial species and strains in a single colony, but identifying the organisms accurately is difficult. We describe BIB, a probabilistic method for estimating the relative abundances of species or strains in mixed samples analyzed by short-read high-throughput sequencing. By grouping closely related strains into clusters, the BIB pipeline estimates the relative abundance of each cluster in a sequencing sample.
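To give a feel for what estimating relative abundances from ambiguously assigned reads involves, the following is a minimal sketch of a generic expectation-maximization update for mixture proportions. It is an illustration only and not the BIB implementation: the function name `estimate_abundances`, the input layout (one row of per-cluster likelihoods for each read), and the toy numbers are all assumptions made for this example.

```python
def estimate_abundances(read_likelihoods, n_iter=500, tol=1e-12):
    """Estimate mixture proportions of clusters with a plain EM loop.

    read_likelihoods: list of rows, one per read; row[k] is a
    (hypothetical) likelihood of that read under reference cluster k.
    Assumes every read has a nonzero likelihood for at least one cluster.
    """
    n_reads = len(read_likelihoods)
    n_clusters = len(read_likelihoods[0])
    # Start from equal abundances for all clusters.
    theta = [1.0 / n_clusters] * n_clusters
    for _ in range(n_iter):
        totals = [0.0] * n_clusters
        for row in read_likelihoods:
            # E-step: share each read among clusters in proportion to
            # current abundance times the read's likelihood.
            weighted = [t * lik for t, lik in zip(theta, row)]
            norm = sum(weighted)
            for k, w in enumerate(weighted):
                totals[k] += w / norm
        # M-step: new abundance is the average share per read.
        new_theta = [t / n_reads for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_theta, theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


# Toy input: two reads fit only cluster 0, one fits only cluster 1,
# and one fits both clusters equally well.
toy_reads = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(estimate_abundances(toy_reads))
```

In this toy input the ambiguous read is split between the two clusters according to the current abundance estimate, so the result depends on all reads jointly rather than on a per-read best match.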
Acknowledgements
This work was supported by the Academy of Finland [259440 to A.H., 251170 to J.C.].
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Mäklin, T., Corander, J., Honkela, A. (2018). Identifying Bacterial Strains from Sequencing Data. In: Mamitsuka, H. (eds) Data Mining for Systems Biology. Methods in Molecular Biology, vol 1807. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8561-6_1
DOI: https://doi.org/10.1007/978-1-4939-8561-6_1
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8560-9
Online ISBN: 978-1-4939-8561-6
eBook Packages: Springer Protocols