Base-By-Base version 2: single nucleotide-level analysis of whole viral genome alignments
Base-By-Base is a Java-based multiple sequence alignment editor. It is capable of working with protein and DNA molecules, but many of its unique features relate to the manipulation of the genomes of large DNA viruses such as poxviruses, herpesviruses, baculoviruses and asfarviruses (1-400 kb). The tool was built to serve as a platform for comparative genomics at the level of individual nucleotides.
In version 2, BBB-v2, of Base-By-Base we have added a series of new features aimed at providing the bench virologist with a better platform to view, annotate and analyze these complex genomes. Although a poxvirus genome, for example, may be less than 200 kb, it probably encodes close to 200 proteins using multiple classes of promoters with frequent overlapping of promoters and coding sequences and even some overlapping of genes. The new features allow users to 1) add primer annotations or other data sets in batch mode, 2) export differences between sequences to other genome browsers, 3) compare multiple genomes at a single nucleotide level of detail, 4) create new alignments from subsets/subsequences of a very large master alignment and 5) allow display of summaries of deep RNA sequencing data sets on a genome sequence.
BBB-v2 significantly improves the ability of virologists to work with genome sequences and provides a platform with which they can use a multiple sequence alignment as the basis for their own editable documents. Also, a .bbb document, with a variety of annotations in addition to the basic coding regions, can be shared among collaborators or made available to an entire research community. The program is available via Virology.ca using Java Web Start and is platform independent; the Java 1.5 virtual machine is required.
Keywords: Multiple Sequence Alignment; Viral Genome; Multiple Sequence Alignment Tool; Bench Scientist; Poxvirus Genome
The original version of Base-By-Base (BBB) was developed by the Virus Bioinformatics Resource Center primarily because of a need for a customizable, platform-independent (Java), multiple sequence alignment (MSA) editor that could be used for the comparison, annotation and analysis of viral genomes, and also integrated directly with our MySQL database. There are now several somewhat related MSA tools; however, each has its own set of distinctive features that make it especially valuable in its niche environment. For example, JalView and STRAP focus on features for display and analysis of proteins, whereas SeaView adds phylogenetic analyses to a nucleic acid alignment editor. Key distinctive features of the original BBB were its ability to read annotations (CDS positions) from GenBank files, an explicit display of differences between sequences and a tool to compare genomes, including the ability to summarize the consequences of all nucleotide changes on encoded proteins throughout a genome. These basic features have been retained and frequently improved upon in Base-By-Base version 2 (BBB-v2), and a series of new features have been added.
Throughout the development of BBB-v2, our focus has been on the user experience. We have endeavoured to make the tool intuitive and to add features that are useful to molecular biologists working daily with large viral genomes (up to 300 kb) that contain approximately 1 gene/kb of sequence. To this end, we also link BBB-v2 to our graphical database interface so that it can be used to select genomes, genes and proteins, which are then opened in BBB; this provides much easier access to data.
Results and Discussion
Many of the features of version 1 of BBB were intended to optimize the visualization of MSAs; for example, using a novel display of differences between sequences, collection and summary of all differences between pairs of sequences, providing a summary view on complete MSAs and the display of the results of motif searches through the sequences. In BBB-v2, the addition of a variety of new features creates a more interactive workspace/notebook for the bench scientist.
User annotation of sequences
Finally, we became aware that it would be very useful to attach a variety of information to particular MSAs that were being used for multiple analyses. Examples include the name of the program used for alignment with any deviations from default settings, manual edits made to the MSA, and results of previous searches and analyses. To this end, we added a new module to BBB-v2 to allow textual information (up to 500 lines) to be incorporated into the .bbb file. This text is edited and saved together with the MSA as a single file, so there is no extra work involved or chance that important notes will be lost (see menu: Edit/Edit MSA Notes).
Manipulation of sequences within a MSA
After aligning and viewing sequences/genomes retrieved from web-based resources, the researcher may still face a number of problems. These large sets of sequences often contain multiple copies of identical sequences from different virus isolates. BBB-v2 offers the ability to remove these extra sequences (see menu: Edit/Remove identical sequence(s)). Likewise, sequences from these resources often have complex names containing Accession, Locus and Version information, which obscures the more human-intelligible names. To simplify the process of changing names, BBB-v2 has an easy-to-use tool to edit the sequence names (see menu: Edit/Edit Sequence Names).
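The removal of identical sequences described above can be pictured with a short sketch. This is not BBB-v2's code: the function name `remove_identical` and the rule of ignoring gaps and letter case when comparing are assumptions made for illustration only.

```python
def remove_identical(records):
    """Keep the first record for each distinct sequence.

    records: list of (name, sequence) tuples.
    Assumption: gaps are ignored and case is folded when comparing;
    BBB-v2 may apply a different rule.
    """
    seen = set()
    kept = []
    for name, seq in records:
        key = seq.replace("-", "").upper()
        if key not in seen:
            seen.add(key)
            kept.append((name, seq))
    return kept


# Hypothetical isolates; isolate_B carries the same bases as isolate_A.
records = [
    ("isolate_A", "ATG-CCGTA"),
    ("isolate_B", "ATGCCGTA-"),
    ("isolate_C", "ATGTCCGTA"),
]
unique = remove_identical(records)
```

Keeping the first occurrence preserves the original order of the alignment, which matters when one row serves as the reference.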
Furthermore, viewing and maneuvering through very large MSAs is often slow and usually tedious. One way to deal with this problem is to save a copy of the alignment and remove some sequences from the MSA (see menu: File/Remove Sequences); however, this does not overcome problems with long genomic sequences. Therefore, in BBB-v2 we have incorporated several simple methods to create new alignments from subsets of the sequences and also to use sub-sequences. After selecting the required subset of sequences in the MSA, the researcher can also select a region of the sequences before 1) saving to a .bbb file (see menu: File/Save Selected Regions to BBB file), 2) opening in a new alignment window (see menu: File/Open Selected Regions to new BBB file), or 3) sending to a new text file (see menu: File/Export Selected Regions to fasta). These features allow users to maintain a master MSA for reference purposes, but to work on different subsets of the data when expeditious; user annotations are also exported with the subsequences.
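As a rough illustration of extracting a sub-alignment from a master MSA, the sketch below selects a subset of rows and a column range, then writes the result as FASTA text. The names `sub_alignment` and `to_fasta`, the dictionary representation and the example sequences are all hypothetical; annotations, which BBB-v2 also exports, are not handled here.

```python
def sub_alignment(alignment, names, start, end):
    """Return the chosen rows cut to columns start..end (1-based, inclusive)."""
    if not 1 <= start <= end:
        raise ValueError("expected 1 <= start <= end")
    return {name: alignment[name][start - 1:end] for name in names}


def to_fasta(alignment, width=60):
    """Format an alignment dictionary as FASTA text, wrapping at `width`."""
    lines = []
    for name, seq in alignment.items():
        lines.append(">" + name)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Hypothetical master alignment (gapped rows of equal length).
master = {
    "VACV": "ATGGC-TTAGCA",
    "CPXV": "ATGGCATTAGCA",
    "MPXV": "ATGAC-TTGGCA",
}
subset = sub_alignment(master, ["VACV", "MPXV"], 4, 9)
fasta_text = to_fasta(subset)
```

Because the master dictionary is never modified, it can remain the reference copy while several smaller working alignments are derived from it.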
Display of sequence variability within a MSA
If the length of the sequences and/or the number of differences between them is large, the Visual Summary display may be inadequate for resolving single events; therefore the differences can be exported from a BBB-v2 alignment to a text file suitable for importing into our genome browser, the Viral Genome Organizer (see menu: Tools/Export Differences to VGO). A user may require this feature to get an overall picture of the distribution of differences along two genomes being compared.
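The idea of exporting differences between two aligned genomes can be sketched as follows. The file format that the Viral Genome Organizer actually reads is not described here, so the tab-separated layout below is purely an assumption, as are the function names `list_differences` and `to_tsv`.

```python
def list_differences(ref, other):
    """Return (column, ref_char, other_char) for each differing column (1-based)."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    for col, (a, b) in enumerate(zip(ref.upper(), other.upper()), start=1):
        if a != b:
            diffs.append((col, a, b))
    return diffs


def to_tsv(diffs):
    """Render differences as tab-separated text (illustrative layout only)."""
    rows = ["column\tref\tother"]
    rows.extend("{}\t{}\t{}".format(col, a, b) for col, a, b in diffs)
    return "\n".join(rows) + "\n"


# Hypothetical aligned pair: one substitution and one insertion/deletion.
diffs = list_differences("ATGGC-TTAGCA", "ATGACATTAGCA")
report = to_tsv(diffs)
```

A gap in one row against a base in the other is reported like any other difference, so insertions and deletions appear alongside substitutions.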
Alternatively, a new feature that displays identity plots for DNA sequences and similarity plots for protein sequences (PAM250 matrix) can be used to show the relationship between multiple sequences in an alignment (see menu: Reports/Sequence Similarity Graph). This feature was inspired by SimPlot and allows the user to change the size of the Sliding Window and the Step Size as well as to choose which sequence from the MSA should be used as the reference (Figure 3b). A user can also Zoom into a region of the plot.
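The sliding-window calculation behind a DNA identity plot can be sketched as below. This is a minimal illustration, not the BBB-v2 implementation: the function name `window_identity`, the default window and step values, and the choice to skip columns where both sequences have a gap are assumptions. Protein similarity scoring with PAM250 is not covered.

```python
def window_identity(ref, query, window=200, step=20):
    """Percent identity of query to ref in sliding windows.

    Returns a list of (window_start_column, percent_identity) with
    1-based columns. Assumption: columns where both sequences have a
    gap are skipped.
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    points = []
    last_start = max(len(ref) - window, 0)
    for start in range(0, last_start + 1, step):
        pairs = [
            (a, b)
            for a, b in zip(ref[start:start + window], query[start:start + window])
            if not (a == "-" and b == "-")
        ]
        if pairs:
            matches = sum(a == b for a, b in pairs)
            points.append((start + 1, 100.0 * matches / len(pairs)))
    return points


# Tiny hypothetical example: one mismatch in the second window.
points = window_identity("ATGCATGCATGC", "ATGCATGAATGC", window=4, step=4)
```

Plotting one such series per non-reference sequence against the window position gives the kind of graph described; a smaller window resolves local events at the cost of a noisier curve.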
One of the unique features of BBB was the ability to use annotations from a sequence to identify differences in genes within 2 genomes being compared; this was intended for assessing variation between very similar genomes. BBB-v2 now can summarize the differences among multiple genomes and was used in a comparison of varicella zoster genomes from vaccinated individuals [7, 8].
Display of deep RNA sequencing data
A significant number of new features have been added to the original BBB alignment editor. Although at the heart of this program is the ability to display and edit MSAs, it is unique in that many of its functions focus on the analysis and annotation of viral genomes, sequences in the 10-300 kb range. BBB-v2 also aims to present the user, a bench scientist, with a MSA that can be further analyzed/searched and annotated by the user. In BBB-v2, we have added 2 new categories of annotation in addition to the sequence comments that could be attached to these regions of sequences in BBB. The ability to save genome-linked primer sequences with a variety of annotations, together with an editable text page allows the research scientist to use a particular MSA .bbb document as an extension of their lab notebook.
The addition of a module to view MochiView summary data provides virologists with a familiar, intuitive way to examine RNA sequencing data, which is critical for the development of new, more accurate transcription maps of viral genomes.
Essentially, BBB-v2 builds directly upon the original BBB Java code, which was chosen to enable support of Mac OS X, MS Windows and LINUX computer platforms. A user opens the application (client) using Java Web Start from a Virology.ca web page. This approach, which automatically downloads the application from the host server computer whenever a new version is available, greatly simplifies the distribution of updates and ensures users are taking advantage of the latest version of the software.
BBB was developed to be used as an editor for MSAs generated from the Virus Orthologous Clusters (VOCs) database at Virology.ca, with sequences being selected in the VOCs GUI, sent to an alignment program (ClustalW, T-Coffee or MUSCLE) and finally displayed in BBB. However, the program can also save/load alignments and sequences from the user's local computer; these can be in FASTA, GenBank (read only) or the native file format .bbb, which is based on the XML Bioinformatic Sequence Markup Language (BSML) standard. The BBB file format is required if the user needs to store information other than the sequence alignment, such as CDS features, DNA primers and user-added annotations for the sequences. The XML format of BBB files also simplifies the sharing of this information between tools.
Relatively small alignments can be performed within BBB, but because users may have favourite tools or specific needs in the alignment of large DNA sequences, it is expected that large alignments from tools such as DIALIGN or MAFFT will be loaded as FASTA alignments (output files of these programs). Importantly, BBB-v2 permits the user to add genome annotations, for example from a GenBank file, back to a gapped sequence within a MSA after alignment of FASTA formatted sequences in another program.
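Adding annotations back to a gapped sequence amounts to translating coordinates on the ungapped genome into alignment columns. The sketch below shows one way this mapping could be done; the function names and the example sequence are hypothetical and do not reflect how BBB-v2 stores features internally.

```python
def ungapped_to_column_map(gapped):
    """Entry i holds the 1-based alignment column of ungapped base i + 1."""
    return [col for col, ch in enumerate(gapped, start=1) if ch != "-"]


def map_feature(gapped, start, end):
    """Map a 1-based, inclusive feature on the ungapped sequence to columns."""
    columns = ungapped_to_column_map(gapped)
    if not 1 <= start <= end <= len(columns):
        raise ValueError("feature lies outside the sequence")
    return columns[start - 1], columns[end - 1]


# Hypothetical gapped row; a feature at ungapped positions 3..7.
gapped = "AT--GGC-TTAG"
cds = map_feature(gapped, 3, 7)
```

The mapped interval may span gap columns, so its length in the alignment can exceed the length of the feature on the original genome.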
Availability and requirements
Project name: Base-By-Base
Project home page: http://athena.bioc.uvic.ca/tools/BaseByBase
Operating system(s): Platform independent
Programming language: Java
Other requirements: Java 1.5 or higher; on Apple OS X this requires at least system 10.5.
License: GNU General Public License
Any restrictions to use by non-academics: Contact authors
This work was funded by a Canadian NSERC Discovery grant. We would like to thank all the University of Victoria Co-op students that have contributed to the Virology.ca Bioinformatics Resource. We are grateful to Dr. B. Moss for the MochiView summary of VACV RNA sequencing data.
6. Lole KS, Bollinger RC, Paranjape RS, Gadkari D, Kulkarni SS, Novak NG, Ingersoll R, Sheppard HW, Ray SC: Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J. Virol. 1999, 73: 152-160.
7. Peters GA, Tyler SD, Grose C, Severini A, Gray MJ, Upton C, Tipples GA: A full-genome phylogenetic analysis of varicella-zoster virus reveals a novel origin of replication-based genotyping scheme and evidence of recombination between major circulating clades. J. Virol. 2006, 80: 9850-9860. 10.1128/JVI.00715-06.
8. Tyler SD, Peters GA, Grose C, Severini A, Gray MJ, Upton C, Tipples GA: Genomic cartography of varicella-zoster virus: a complete genome-based analysis of strain variability with implications for attenuation and phenotypic differences. Virology. 2007, 359: 447-458. 10.1016/j.virol.2006.09.037.
13. Thompson JD, Gibson TJ, Higgins DG: Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinformatics. 2002, Chapter 2: Unit 2.3.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.