Bioinformatics Approaches in Studying Microbial Diversity

Chapter in: Management of Microbial Resources in the Environment

Abstract

A proper understanding of the molecular sequences, identification and phylogenetics of microorganisms is important in many branches of biological science. Genomic DNA sequence data generated from different organisms, including microbes, require bioinformatics tools for their analysis. Tools such as BLAST and multiple sequence alignment programs are used to analyze nucleic acid and amino acid sequences for phylogenetic affiliation. The emerging fields of comparative genomics and phylogenomics require substantial knowledge and understanding of computational methods to handle the large-scale data involved. The introduction of comparative rRNA sequence analysis represents a major milestone in the history of microbiology. Single-gene-based phylogenetic inference and alternative global markers, including elongation and initiation factors, RNA polymerase subunits, DNA gyrases, heat shock proteins and RecA proteins, are also of immense importance. The analysis of sequence data involves four general steps: (i) selection of a suitable molecule or molecules, (ii) acquisition of molecular sequences, (iii) multiple sequence alignment and (iv) phylogenetic evaluation. This chapter explains in detail how raw molecular sequence data from any microbe may be used for the detection and identification of microorganisms using computer-based bioinformatics tools.

Corresponding author

Correspondence to Mohammad Tabish.

Glossary

Homology Searches: BLAST & FASTA

Background Information: The three BLAST programs most commonly used are BLASTN, BLASTP and BLASTX. BLASTN compares a DNA sequence with all the DNA sequences in the nucleotide collection (nt/nr). BLASTP compares a protein sequence with all the protein sequences in the non-redundant protein database (nr). In BLASTX, the nucleotide sequence of interest is translated in all six reading frames and the products are compared with the nr protein database. A tutorial is also available at NCBI.
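The six-frame translation that BLASTX applies to a nucleotide query can be illustrated with a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1) and an unambiguous A/C/G/T sequence; the function names are hypothetical and are not part of BLAST itself.

```python
# Standard genetic code (NCBI translation table 1), with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X', stops '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate three forward (+1..+3) and three reverse (-1..-3) frames."""
    forward = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCC").items():
        print(name, protein)
```

Each of the six products would then be compared with the protein database, which is why a BLASTX search can find a coding region regardless of strand or reading frame.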

BLAST Homepage – (NCBI)

Found at http://blast.ncbi.nlm.nih.gov/Blast.cgi. BLAST stands for Basic Local Alignment Search Tool and is widely used for homology searches. The homepage lists a number of organism-specific and query-specific BLAST services.

Nucleotide BLAST (BLASTN)

N.B. The default database is “human genomic and transcript”, not “nucleotide collection (nt/nr)”.

Protein BLAST (BLASTP)

This program is also coupled with a motif search. If you suspect that your protein shows only weak sequence similarity to other proteins, consider using the PSI-BLAST (Position-Specific Iterated BLAST) feature. NCBI also provides a PSI-BLAST tutorial. PSI-BLAST searches can yield better delineation of true and false positives.

Translated BLAST (BLASTX)

TBLASTX searches translated nucleotide databases using a translated nucleotide query, while TBLASTN searches translated nucleotide databases using a protein query. These are useful resources if you are interested in homologs in unfinished genomes. Under “Databases”, select “genomic survey sequences”, “High throughput genomic sequences” or “whole-genome shotgun reads”.

BLAST with Microbial Genomes (BLASTN, TBLASTN, TBLASTX etc.)

It permits one to compare a nucleic acid or protein sequence against finished archaeal and bacterial genomes. Depending upon the time of day, results may appear almost immediately, or the search may be delayed or not accepted at all. For PSI-BLAST and other searches, it is often useful to enter a restriction in the “Entrez Query” section, e.g. Escherichia coli [organism] or Viruses [organism], to see “hits” specifically to E. coli or to viruses/bacteriophages. It is advisable to always select “Show results in a new window”.
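As an illustration of the “Entrez Query” restriction, the following Python sketch builds (but does not send) a search URL carrying an organism filter. The parameter names CMD, PROGRAM, DATABASE, QUERY and ENTREZ_QUERY are taken from the NCBI BLAST URL API as commonly documented and should be checked against the current NCBI documentation before use; the helper name `build_blast_request` is hypothetical.

```python
from urllib.parse import urlencode

# Base address of the NCBI BLAST service (https form of the URL given above).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_request(sequence, program="blastn", database="nt", organism=None):
    """Build a BLAST submission URL, optionally limited to one organism.

    Assumption: the parameter names follow the NCBI BLAST URL API; verify
    them against the current documentation before relying on this.
    """
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": sequence,
    }
    if organism:
        # Same syntax as typing "Escherichia coli [organism]" in the web form.
        params["ENTREZ_QUERY"] = f"{organism}[organism]"
    return f"{BLAST_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_blast_request("ACGTACGTACGT", organism="Escherichia coli"))
```

Only the URL construction is shown here; submitting the request and polling for results are left out deliberately, since NCBI applies usage limits to automated queries.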

EMB BLAST (European Molecular Biology network)

Very convenient, since it permits one to search specific databases such as prokaryote, bacteriophage, fungal and 16S rRNA using BLASTN, and specific bacterial genomes or SwissProt using BLASTX or BLASTN.

ParAlign

It employs a heuristic method for sequence alignment. In essence, ParAlign is about as sensitive as Smith-Waterman but runs at the speed of BLAST.

GTOP

Sequence Homology Search (Laboratory for Gene-Product Informatics, National Institute of Genetics, Japan) – offers BLASTP search capability against individual archaea, bacteria, eukaryota, and viruses.

T4-like Phage NCBI MegaBLAST (Tulane Univ., New Orleans, U.S.A. & CNRS, Toulouse, France)

This includes a growing list of T4-like completed phage sequences as well as those in the draft and contig stages of completion.

WU-BLAST (Washington University BLAST)

The emphasis of this tool is on finding regions of sequence similarity quickly, with minimum loss of sensitivity. Such regions can yield functional and evolutionary clues about the structure and function of a novel sequence.

Batch BLAST (Greengene web server; University of Massachusetts, Lowell, U.S.A.)

Batch BLAST was developed by Michael V. Graves for DNA or protein BLAST sequence analysis against the NCBI databases. It allows one to submit a file that contains multiple sequences and then organizes the results by each individual sequence contained in the file.

HHpred: homology detection & structure prediction

HHpred is a method for database searching and structure prediction that is as easy to use as BLAST but is much more sensitive in finding remote homologs. HHpred is the first server that is based on the pairwise comparison of profile hidden Markov models (HMMs). Whereas most conventional sequence search methods search sequence databases such as UniProt or the NR, HHpred searches alignment databases such as Pfam or SMART. This greatly simplifies the list of hits to a number of sequence families instead of a clutter of single sequences. HHpred accepts a single query sequence or a multiple alignment as input.

PSI-BLAST or PHI-BLAST search

Position-Specific Iterated BLAST (PSI-BLAST) creates a profile from the hits of the initial search and uses that profile as the query in subsequent iterations.
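The idea of a profile can be sketched in Python as a table of per-column log-odds scores derived from aligned hits. This is a deliberately simplified illustration: it assumes a uniform background frequency and a fixed pseudocount, and it omits the sequence weighting and composition-based statistics that PSI-BLAST actually applies. The function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_seqs, pseudocount=1.0):
    """Return one {residue: log-odds score} dictionary per alignment column.

    Assumptions: equal-length gap-aligned sequences, a uniform background
    (1/20 per residue) and a simple additive pseudocount.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        # Gap characters and unknown residues are ignored in the counts.
        counts = Counter(seq[col] for seq in aligned_seqs if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, seq):
    """Sum the column scores of a sequence laid against the profile."""
    return sum(column.get(res, 0.0) for column, res in zip(profile, seq))


if __name__ == "__main__":
    hits = ["MKV", "MKI", "MRV", "MKV"]
    profile = build_profile(hits)
    print(round(score(profile, "MKV"), 2), round(score(profile, "AAA"), 2))
```

Residues that are conserved in a column receive positive scores and rarely seen residues receive negative ones, which is what lets a profile-based search detect weaker similarity than a single-sequence query.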

BLAST 2

Aligns two sequences against one another using BLASTN, BLASTP, BLASTX, TBLASTN or TBLASTX.

Gene Context Tool

It is a tool for visualizing the genome context of a gene or group of genes.

TC-BLAST

It scans the transport protein database (TC-DB), producing alignments and phylogenetic trees. The TC-DB details a comprehensive classification system for membrane transport proteins known as the Transporter Classification (TC) system.

MEROPS BLAST

This permits one to screen protein sequences against an extensive database of characterized peptidases.

SEARCHGTr

It is web-based software for the analysis of glycosyltransferases involved in the biosynthesis of a variety of pharmaceutically important compounds like adriamycin, erythromycin, vancomycin etc. This software has been developed based on a comprehensive analysis of sequence/structural features of 102 GTrs of known specificity from 52 natural product biosynthetic gene clusters.

PipeAlign

It offers an integrated approach to protein family analysis through a cascade of different sequence analysis programs (BALLAST, the DbClustal multiple alignment program and Rascal alignment analysis). Sequences that do not belong to the protein family are removed using NorMD, and the remaining sequences are clustered into potential functional subfamilies using Secator or DPC.

MPsrch (EMBL-EBI)

This sequence comparison tool implements the full Smith-Waterman algorithm, identifying hits in cases where BLAST and FASTA fail, and it also reports fewer false positives. The software provides information on Match %; % Query Match (% of the query sequence matched); Conservative changes; Mismatches; Indels; and Gaps.
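The Smith-Waterman local alignment score can be computed with a short dynamic-programming routine. The Python sketch below is a minimal illustration that assumes a simple match/mismatch scheme and a linear gap penalty; the scoring parameters are illustrative defaults, not the ones used by MPsrch, and only the best score is returned rather than the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Assumptions: linear gap penalty and a flat match/mismatch score;
    real tools use substitution matrices and affine gap costs.
    """
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: this is what makes the alignment local.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACACACTA", "AGCACACA"))
```

Because every cell of the matrix is evaluated, the method is exhaustive where BLAST and FASTA rely on heuristics, at the cost of running time proportional to the product of the two sequence lengths.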

GOAnno

This web tool automatically annotates proteins according to the Gene Ontology using hierarchised multiple alignments. Positioning the query protein in its aligned functional subfamily represents a key step to obtain highly reliable predicted GO annotation based on the GOAnno algorithm.

COMPASS

This is a profile-based method for the detection of remote sequence similarity and the prediction of protein structure. The server features three major developments: (i) improved statistical accuracy; (ii) increased speed from parallel implementation; and (iii) new functional features facilitating structure prediction. These features include visualization tools that allow the user to quickly and effectively analyze specific local structural region predictions suggested by COMPASS alignments.

MineBlast

It performs BLASTP searches in UniProt to identify names and synonyms based on homologous proteins and subsequently queries PubMed, using combined search terms in order to find and present relevant literature.

Comparison of homology between two small genomes:

SCAN2 (Softberry.com)

It provides a colour-coded graphical alignment of genome-length DNAs in Java. In the top panel, regions of high sequence identity are presented in red. By highlighting the grey, yellow, green or black boxes, one can select specific regions for examination of the sequence alignment.

Advanced PipMaker

It aligns two DNA sequences and returns a percent identity plot of that alignment, together with a traditional textual form of the alignment. Additional software may need to be downloaded for viewing and manipulating the output of pairwise alignment programs such as PipMaker.

JDotter: A Java Dot Plot Viewer

A dot matrix plotter written in Java (Viral Bioinformatics Resource Center, University of Victoria, Canada). It produces diagrams similar to those of the above-mentioned programs, but with better control over the output.
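The principle behind a dot plot can be shown with a small Python sketch: a cell is marked wherever a window from one sequence exactly matches a window from the other, so shared regions appear as diagonals. This is a text-only illustration with hypothetical function names and exact matching; it does not reproduce JDotter's scoring or display options.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return a matrix of booleans: True where windows of both sequences match.

    Rows index start positions in seq_a, columns start positions in seq_b.
    """
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [
        [seq_a[i:i + window] == seq_b[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def render(matrix):
    """Draw the matrix as text, with '*' for a match and '.' otherwise."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


if __name__ == "__main__":
    print(render(dot_plot("GATTACAGATTACA", "GATTACA", window=3)))
```

Increasing `window` suppresses chance single-base matches, which is the usual way to reduce background noise when comparing longer genomic sequences.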

multi-zPicture: multiple sequence alignment tool

multi-zPicture provides dot-plot graphs and dynamic visualizations. If simple gene locations are provided (e.g. “>2,000–5,000 RNA_polymerase” indicates that the RNA polymerase gene is found on the plus strand between bases 2,000 and 5,000), these data will be added to the dynamic visualization. zPicture alignments can be automatically submitted to rVista to identify conserved transcription factor binding sites.

GeneOrder 3.0

This is ideal for comparing small GenBank genomes (up to 2 Mb). Each gene from the Query sequence is compared to all of the genes from the Reference sequence using BLASTP. There are two display formats: graphical and tabular.

CoreGenes

This programme is designed to analyze two to five genomes simultaneously, generating a table of related genes (orthologs and putative orthologs). These entries are linked to their GenBank data. The original version has a limit of 0.35 Mb, while the newer version, CoreGenes 2.0, extends the limit to approximately 2.0 Mb. If data is not present in GenBank, using this site will be very helpful.

CoreGenes 3.0

This is the latest member in the CoreGenes family of tools. It determines unique genes contained in a pair of proteomes. This has proved extremely useful in determining unique genes in comparisons between large Myoviridae.

Copyright information

© 2013 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Tabish, M., Azim, S., Hussain, M.A., Rehman, S.U., Sarwar, T., Ishqi, H.M. (2013). Bioinformatics Approaches in Studying Microbial Diversity. In: Malik, A., Grohmann, E., Alves, M. (eds) Management of Microbial Resources in the Environment. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-5931-2_6