Phenome-ing Microbes

  • Klaus Hornischer
  • Susanne Häussler
Part of the Springer Protocols Handbooks book series (SPH)


One of the burning questions in bacterial genomics is how the phenotype of a bacterial strain correlates with its genotype. Some phenotypes of a given organism’s isolate arise through simple sequence variations such as single nucleotide polymorphisms (SNPs) or small insertions/deletions (InDels). For some phenotypes, however, the underlying mechanism cannot be explained by simple genomic differences; rather, most of them are the result of more complex sequence variations. Insight into complex phenotypes such as bacterial pathogenicity or resistance traits, and into their molecular background, requires comprehensive data obtained in large-scale projects and involves statistical methods. With the increasing use of next-generation sequencing (NGS) and other “-omics” techniques in molecular biology, projects that provide such a data foundation are now feasible. Big data, however, not only offers new opportunities but also requires extensive data management systems. A coupled system of a relational database, a web interface, and statistical methods provides substantial support for phenotype-genotype correlation studies that aim to unravel the molecular mechanisms underlying complex phenotypes and to identify biomarkers.
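As a rough illustration of how a relational database and a statistical test can be coupled for a genotype-phenotype association, the following minimal sketch stores variant calls and phenotype measurements per isolate in an in-memory SQLite database, derives a 2x2 contingency table with a join, and applies a two-sided Fisher's exact test. The table names, column names, variant and trait labels, and the toy data are all hypothetical, and Fisher's exact test is only one possible choice of method; none of this is taken from the chapter itself.

```python
import sqlite3
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x isolates in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def contingency(conn, variant, trait):
    """Return (a, b, c, d): variant present/absent vs. trait positive/negative."""
    rows = conn.execute(
        """
        SELECT v.present, p.positive, COUNT(*)
        FROM variant_call AS v
        JOIN phenotype AS p ON p.isolate_id = v.isolate_id
        WHERE v.variant = ? AND p.trait = ?
        GROUP BY v.present, p.positive
        """,
        (variant, trait),
    ).fetchall()
    counts = {(present, positive): n for present, positive, n in rows}
    return tuple(counts.get(key, 0) for key in [(1, 1), (1, 0), (0, 1), (0, 0)])


# Hypothetical schema: one row per isolate, variant call, and phenotype value.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE isolate (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE variant_call (
        isolate_id INTEGER REFERENCES isolate(id), variant TEXT, present INTEGER);
    CREATE TABLE phenotype (
        isolate_id INTEGER REFERENCES isolate(id), trait TEXT, positive INTEGER);
    """
)

# Toy data, invented for illustration: 8 isolates, one variant, one trait.
carriers = {1, 2, 3, 4}    # isolates carrying the hypothetical variant
resistant = {1, 2, 3, 5}   # isolates showing the hypothetical trait
for i in range(1, 9):
    conn.execute("INSERT INTO isolate VALUES (?, ?)", (i, f"isolate_{i}"))
    conn.execute("INSERT INTO variant_call VALUES (?, ?, ?)",
                 (i, "snp_A", int(i in carriers)))
    conn.execute("INSERT INTO phenotype VALUES (?, ?, ?)",
                 (i, "resistance", int(i in resistant)))

table = contingency(conn, "snp_A", "resistance")
p_value = fisher_exact_two_sided(*table)
print(table, round(p_value, 4))
```

In a real study the same query would typically be run over many variants, in which case a correction for multiple testing would also be needed.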


Association study; Biomarker identification; Genotype-phenotype correlation



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Helmholtz Centre for Infection Research, Department of Molecular Bacteriology, Braunschweig, Germany
  2. TWINCORE, Centre for Experimental and Clinical Infection Research, Institute for Molecular Bacteriology, Hannover, Germany
