Identifying Bacterial Strains from Sequencing Data

  • Protocol
Data Mining for Systems Biology

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1807))

Abstract

Environmental and clinical settings can host a wide variety of both bacterial species and strains in a single colony, but accurate identification of the organisms is difficult. We describe BIB, a probabilistic method for estimating the relative abundances of species or strains contained in mixed samples analyzed by short-read high-throughput sequencing. The BIB pipeline groups closely related strains into clusters and estimates the relative abundance of each cluster in a sequencing sample.
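To make the idea of estimating relative abundances from reads concrete, the following is a minimal sketch and not the BIB implementation. It assumes a simple mixture model in which each read comes from exactly one cluster, and it swaps in plain expectation-maximization for whatever inference BIB itself performs. The function name `estimate_abundances` and the input `read_likelihoods` (the probability of each read under each cluster reference, which in practice might be derived from alignment scores) are hypothetical names chosen for illustration.

```python
def estimate_abundances(read_likelihoods, n_iter=200, tol=1e-10):
    """Estimate mixture proportions of clusters with plain EM.

    read_likelihoods[i][k] is an assumed, precomputed value for
    P(read i | cluster k). Returns a list of proportions summing to 1.
    """
    n_clusters = len(read_likelihoods[0])
    theta = [1.0 / n_clusters] * n_clusters  # start from uniform proportions

    for _ in range(n_iter):
        counts = [0.0] * n_clusters
        # E-step: share each read among clusters in proportion to
        # (current abundance) x (likelihood of the read under the cluster).
        for lik in read_likelihoods:
            weighted = [t * l for t, l in zip(theta, lik)]
            total = sum(weighted)
            if total == 0.0:
                continue  # read is incompatible with every cluster; skip it
            for k, w in enumerate(weighted):
                counts[k] += w / total

        assigned = sum(counts)
        if assigned == 0.0:
            raise ValueError("no read is compatible with any cluster")

        # M-step: new proportions are the normalized expected read counts.
        new_theta = [c / assigned for c in counts]
        change = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if change < tol:
            break
    return theta


# Toy input: three reads unique to cluster 0, one unique to cluster 1,
# and two reads that match both clusters equally well.
toy_reads = [
    [1.0, 0.0], [1.0, 0.0], [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5], [0.5, 0.5],
]
proportions = estimate_abundances(toy_reads)
```

In this toy input the ambiguous reads carry no information of their own, so they end up divided according to the proportions implied by the uniquely assigned reads. A fully Bayesian treatment would additionally report uncertainty around the estimates, which this sketch does not attempt.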

Acknowledgements

This work was supported by the Academy of Finland [259440 to A.H., 251170 to J.C.].

Author information

Corresponding author

Correspondence to Antti Honkela.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Mäklin, T., Corander, J., Honkela, A. (2018). Identifying Bacterial Strains from Sequencing Data. In: Mamitsuka, H. (eds) Data Mining for Systems Biology. Methods in Molecular Biology, vol 1807. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8561-6_1

  • DOI: https://doi.org/10.1007/978-1-4939-8561-6_1

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8560-9

  • Online ISBN: 978-1-4939-8561-6

  • eBook Packages: Springer Protocols
