Identifying Bacterial Strains from Sequencing Data

Mäklin, Tommi; Corander, Jukka; Honkela, Antti

doi:10.1007/978-1-4939-8561-6_1

Part of the book series: Methods in Molecular Biology (MIMB, volume 1807)

Abstract

Environmental and clinical settings can host a wide variety of both bacterial species and strains in a single colony, but accurate identification of the organisms is difficult. We describe BIB, a probabilistic method for estimating the relative abundances of species or strains contained in mixed samples analyzed by short-read high-throughput sequencing. By grouping closely related strains into clusters, the BIB pipeline estimates the relative abundance of each cluster present in a sequencing sample.
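
The idea of estimating relative abundances from reads that may be compatible with several clusters can be illustrated with a small mixture model. The sketch below is not the BIB implementation: it swaps in a plain expectation-maximization update for the mixing proportions, and the function name `estimate_abundances` and its input format (one row per read, one column per cluster, each entry an assumed probability of the read given the cluster) are hypothetical choices made for illustration.

```python
def estimate_abundances(read_likelihoods, n_iter=500, tol=1e-10):
    """Estimate cluster proportions with a plain EM update (illustrative only).

    read_likelihoods[i][k] is an assumed value of P(read i | cluster k),
    for example derived from alignment scores against reference genomes.
    Returns a list of proportions that sums to 1.
    """
    # Reads that match no cluster carry no information about the proportions.
    rows = [[float(p) for p in row] for row in read_likelihoods if sum(row) > 0]
    if not rows:
        raise ValueError("no read is compatible with any cluster")

    n_clusters = len(rows[0])
    theta = [1.0 / n_clusters] * n_clusters  # start from uniform proportions

    for _ in range(n_iter):
        # E-step: share each read among clusters in proportion to
        # (current proportion) x (likelihood of the read under that cluster).
        expected_counts = [0.0] * n_clusters
        for row in rows:
            weighted = [t * p for t, p in zip(theta, row)]
            total = sum(weighted)
            for k in range(n_clusters):
                expected_counts[k] += weighted[k] / total

        # M-step: the new proportions are the normalized expected read counts.
        new_theta = [c / len(rows) for c in expected_counts]
        converged = max(abs(a - b) for a, b in zip(new_theta, theta)) < tol
        theta = new_theta
        if converged:
            break

    return theta


if __name__ == "__main__":
    # Two reads match only cluster 0, one matches only cluster 1,
    # and one is equally compatible with both.
    toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(estimate_abundances(toy))
```

In this toy input the ambiguous read is split between the two clusters according to the current estimates, so the unambiguous reads determine how it is apportioned. A full Bayesian treatment would additionally report uncertainty in the proportions rather than a single point estimate.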

Acknowledgements

This work was supported by the Academy of Finland [259440 to A.H., 251170 to J.C.].

Author information

Authors and Affiliations

Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
Tommi Mäklin
Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
Jukka Corander
Department of Biostatistics, University of Oslo, Oslo, Norway
Jukka Corander
Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
Antti Honkela
Department of Public Health, University of Helsinki, Helsinki, Finland
Antti Honkela

Corresponding author

Correspondence to Antti Honkela.

Editor information

Editors and Affiliations

Bioinformatics Center, Kyoto University, Uji, Kyoto, Japan
Hiroshi Mamitsuka

About this protocol

Cite this protocol

Mäklin, T., Corander, J., Honkela, A. (2018). Identifying Bacterial Strains from Sequencing Data. In: Mamitsuka, H. (eds) Data Mining for Systems Biology. Methods in Molecular Biology, vol 1807. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8561-6_1

DOI: https://doi.org/10.1007/978-1-4939-8561-6_1
Published: 21 July 2018
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8560-9
Online ISBN: 978-1-4939-8561-6
eBook Packages: Springer Protocols
