Comparative genomics of eight Lactobacillus buchneri strains isolated from food spoilage
Lactobacillus buchneri is a lactic acid bacterium frequently associated with food bioprocessing and fermentation and has been found to be either beneficial or detrimental to industrial food processes depending on the application. The ability to metabolize lactic acid into acetic acid and 1,2-propandiol makes L. buchneri invaluable to the ensiling process, however, this metabolic activity leads to spoilage in other applications, and is especially damaging to the cucumber fermentation industry. This study aims to augment our genomic understanding of L. buchneri in order to make better use of the species in a wide range of applicable industrial settings.
Whole-genome sequencing (WGS) was performed on seven phenotypically diverse strains isolated from spoiled, fermented cucumber and the ATCC type strain for L. buchneri, ATCC 4005. Here, we present our findings from the comparison of eight newly-sequenced and assembled genomes against two publicly available closed reference genomes, L. buchneri CD034 and NRRL B-30929. Overall, we see ~ 50% of all coding sequences are conserved across these ten strains. When these coding sequences are clustered by functional description, the strains appear to be enriched in mobile genetic elements, namely transposons. All isolates harbor at least one CRISPR-Cas system, and many contain putative prophage regions, some of which are targeted by the host’s own DNA-encoded spacer sequences.
Our findings provide new insights into the genomics of L. buchneri through whole genome sequencing and subsequent characterization of genomic features, building a platform for future studies and identifying elements for potential strain manipulation or engineering.
KeywordsLactobacillus buchneri Comparative genomics Lactic acid bacteria CRISPR-Cas systems Fermentation Spoilage Food microbiology
BLAST Ring Image Generator
Clusters of Orthologous Groups
clustered regularly interspaced short palindromic repeat
horizontal gene transfer
open reading frame
Rapid Annotation using Subsystem Technology
Lactobacillus buchneri is a lactic acid bacterium naturally found in varying ecological niches and is typically associated with food production and fermentation processes [1, 2]. This species has been isolated from a variety of environments, including fermented cucumber spoilage [3, 4], grass silage , a bioethanol production plant [6, 7], the human intestine and oral cavity [8, 9], cheese [10, 11], and in beer wort [12, 13]. It is a gram-positive, facultative anaerobe, and obligate heterofermenter producing lactic acid, acetic acid, ethanol, and carbon dioxide . L. buchneri strains are morphologically and metabolically diverse, displaying an array of different colony phenotypes and can metabolize a wide range of carbohydrates . Previous genomic characterization of L. buchneri CD034 revealed the presence of enzymes required to convert lactic acid to acetic acid and CO2 in the presence of oxygen, or 1,2-propanediol anaerobically, a unique metabolic feature protecting against acidification of the cytoplasm in the presence of large amounts of lactate . This ability to convert lactic acid to acetic acid under both aerobic and anaerobic conditions makes L. buchneri useful in the aerobic stabilization of silage, effectively inhibiting spoilage organisms [15, 16]. While this feature is useful in certain bioprocessing environments, it can be detrimental to the cucumber fermentation process. L. buchneri’s metabolism of lactate leads to a rise in pH, enabling metabolic activity of less acid-resistant microbes, ultimately leading to the production of undesirable compounds that spoil the fermentation [4, 17].
It has been previously reported that lactic acid bacteria are highly adapted to specific ecological niches, and have small genomes compared to other bacteria as a consequence of a process called genome reduction, resulting in the maintenance of a minimal number of essential genes required for niche-specific survival . Although the genome of L. buchneri is relatively small, it must retain the ability to quickly and continually evolve with its requisite environment, presumptively through horizontal gene transfer (HGT) of conjugative or mobilizable plasmids and transduction through bacteriophage (phage) infection . Additionally, to survive and successfully propagate in a changing and highly-specific environment, the organism must balance the maintenance of robust defense systems against predatory phage and invasive plasmids with the genomic diversity created through the uptake of exogenous plasmids and other transmissible DNA elements. Alternatively, intra-species diversity can be generated through genomic duplication events propagated by the DNA-copying action of transposases [19, 20].
Although L. buchneri is reportedly diverse in isolation source, phenotype, and metabolic characterization, a mere 14 publicly-available draft genomes exist to date, only two of which are closed: NRRL B-30929 (NC_015428.1), isolated from an ethanol production plant , and CD034 (NC_018610.1), isolated from stable grass silage . To elucidate genomic features, including the genetic flexibility of L. buchneri, we sequenced and assembled draft genomes of eight phenotypically distinct strains previously identified by Daughtry et al.  isolated from spoiled, fermented cucumber brine (LA1175D, LA1181, LA1184, LA1147), anaerobic reproduction of cucumber spoilage (LA1161B, LA1161C, LA1167), and tomato pulp (ATCC 4005). To generate an overview of the strains’ genomic similarity, we aligned the newly-assembled draft genomes with the two publicly-available closed reference genomes, NRRL B-30929 and CD034. Core- and pan-genomes across the eight isolates and two reference genomes were then determined, showing a marked level of genomic conservation. Annotated genes were each assigned a Clusters of Orthologous Groups (COG) designation for high-level functional assignment, indicating a significant number of non-conserved transposons and transposon-related sequences across the pan-genome.
Clustered regularly interspaced short palindromic repeat (CRISPR) and associated genes (cas) systems constitute the prokaryotic adaptive immune system and provide defense against phage and invasive plasmids through targeted nucleolytic cleavage [22, 23, 24, 25, 26, 27, 28, 29, 30]. CRISPR-Cas systems copy a short segment of DNA from the invading nucleic acid sequence and integrate it into the CRISPR locus as a template to prevent future attacks, called a spacer. This locus effectively serves as a “vaccination” record, storing infection events (spacers) chronologically [27, 31, 32]. Detailing and comparing these loci across strains provides insight into the ecological interplay between the isolates and invasive genetic elements, and can be used as a mechanism of strain genotyping [33, 34, 35]. CRISPR loci for the eight isolates and two reference strains were detected and repeats and spacers were identified and subsequently used to search for their genomic sequence of origin, called the protospacer. We show a surprising number of spacers target non-CRISPR regions of lactobacilli in areas containing putative prophage-related genes, as well as invasive plasmids.
Despite the wide range of phenotypes observed across these strains, we found that they share significant identity in terms of protein coding potential, as well as a high degree of similarity across their CRISPR-Cas systems, revealing identical repeat sequences and unique genotypic signatures constructed through the presence of shared ancestral spacers.
Whole-genome assembly statistics for each of the eight sequenced Lactobacillus buchneri isolates
Genome Size (bp)
Max Contig Size (bp)
Reduced NaCl fermented cucumber spoilage
Anaerobic reproduction of commercial fermented cucumber spoilage
Anaerobic reproduction of commercial fermented cucumber spoilage
Anaerobic reproduction of commercial fermented cucumber spoilage
Reduced NaCl fermented cucumber spoilage
Reduced NaCl fermented cucumber spoilage
Reduced NaCl fermented cucumber spoilage
Illumina HiSeq + PacBio
Spacers from all CRISPR loci were extracted and aligned, positioning ancestral spacers on the right and more recent acquisition events on the left (Fig. 4B). With the exception of CD034, we see 100% identity across at least the first and second ancestral spacers from each strain’s type II-A CRISPR 1: a powerful confirmation of evolutionary homology . Within this alignment, two groups with identical spacer sequences were easily identified. The first group contains LA1147 and LA1175D while the second contains LA1161B, LA1161C, and LA1167, consistent with the predicted core-genome based clades from the previous phylogenetic tree (Fig. 2B). While CD034 does have a type II-A locus, none of the identified spacers share significant identity with any type II-A spacer sequences from the other isolates.
The protospacers identified within four L. buchneri genomes were further explored. The three identified protospacers in the genome of Lactobacillus parabuchneri FAM21731 are clustered within a ~ 23 kb putative prophage region (Fig. 5C). LA1184 spacer 13 targets an uncharacterized conserved protein with a phage Mu gpF-like domain while LA1184 spacer 6 targets a phage baseplate J/gp47 family protein. The remaining spacer, LA1184 spacer 15, targets a hypothetical protein. Curiously, the two spacers found to match sequences in the LA1184 genome are self-targeting: they are encoded by LA1184’s own CRISPR 1 locus. Again, we see a protospacer match for LA1184 spacer 13, this time targeting a phage minor head protein with 95.92% similarity to the phage Mu gpF-like protein also targeted in L. parabuchneri FAM21731. The second self-targeting spacer, LA1184 spacer 6, targets a phage baseplate J/gp47 family protein in LA1184, identical to the targeted protein in L. parabuchneri FAM21731. Of the two protospacers found in the genome of NRRL B-30929, one is self-targeting: encoded by NRRL B-30929 spacer 9, and one is encoded by LA1184 spacer 15. NRRL B-30929 spacer 9 and LA1184 spacer 15 each target uncharacterized proteins within a ~ 8 kb region encoding several phage-related genes. The protospacers found in the genome of CD034 are matched by LA1181 spacer 25, which targets a phage tape measure protein, and LA1167 spacer 9 which targets a hypothetical protein ~ 20 kb upstream.
Due to the proposed lethal nature of self-targeting spacers, protospacer/spacer homology and associated PAM sequences for self-targeting spacers were investigated. LA1184 spacer 6 shows 100% identity with the matching protospacer sequence, but a single nucleotide polymorphism (SNP) exists in the PAM: AGAA. The protospacer matching LA1184 spacer 13 has the proper PAM (AAAA) but contains three consecutive SNPs on the 3′ end of the protospacer sequence in what is called the seed sequence . Regarding NRRL B-30929 spacer 9, there are two SNPs in the middle of the protospacer sequence, as well as a single SNP in the PAM: ATAA.
Additional plasmid-based protospacer hits were also identified. LA1181 spacer 20 was found to target a type IV secretion system protein, TraC, in Lactobacillus brevis CD0817 plasmid pCD0817–1. LA1181 spacer 24 matches a sequence within a plasmid recombination enzyme mob 141 in Lactobacillus plantarum plasmid p141.
Given the expanse of colony morphologies and metabolic capabilities displayed by L. buchneri, as well as its prevalence in the food industry, there is a relatively low number of publicly available genome sequences. We sequenced and assembled draft genomes for eight phenotypically diverse strains of L. buchneri, a significant addition to the number of genomes available in the NCBI Genbank . The range of assembled genome sizes, from 2.49 Mb to 2.76 Mb, are typical of the 1.8 to 3.3 Mb range reportedly found in lactic acid bacteria . Hybrid genome assembly of LA1184 revealed 2 detectable plasmid sequences, consistent with multiple plasmids found in both reference strains NRRL B-30929 and CD034. Lactic acid bacteria are known to be highly specialized to their ecological niche, a hypothesis further supported by the presence of accessory plasmids that could quickly be acquired and transferred during times of rapid environmental change. We compared our draft genomes to the complete reference genomes of NRRL B-30929 and CD034 and note that in general, the eight strains share a higher percent identity with NRRL B-30929 than with CD034. Besides identifying two putative prophages, this comparison highlighted two genomic islands based on their divergent base compositions, a hallmark of HGT.
With only half of identified coding sequences conserved across all ten strains, there is a considerable level of genomic diversity represented in the accessory-genome. When each coding sequence is assigned to a COG, we see a large number of pan-genome sequences assigned to the ‘mobilome: prophages, transposons’ category versus the number in the core-genome, with more than half identified as transposons. This abundance and maintenance of diverse mobile elements suggest that transposons are likely an important genetic feature of L. buchneri. Interestingly, we see that ~ 9% of the core-genome is comprised of sequences with a currently unknown function, providing candidates for future functional studies. Additionally, we see that L. buchneri possesses a small set of putative secondary metabolite sequences. As a novel bacteriocin has been previously derived from L. buchneri , the potential for novel antimicrobials and other secondary metabolites should continue to be explored, given the variety of ecological niches occupied by this species and large number of coding sequences with no currently known function.
In all strains analyzed, a type II-A CRISPR-Cas system was found. In LA1167, we see a secondary type II-A locus (CRISPR 2) with only two spacers and a full set of cas genes. The presence of transposases on the 5′ and 3′ ends of this locus could signify a past duplication event in which the intact CRISPR 1 locus was copied locally. The small number of unique spacers and conserved repeat sequences in CRISPR 2, along with the differences in amino acid identity between the cas sequences of the two loci seem to indicate this locus remained functional post-duplication and has begun diverging from CRISPR 1 over evolutionary time. This view into a specific genetic duplication event can facilitate our understanding of transposon-mediated adaptation and the role it could play in bacteriophage and invasive plasmid defense as well as rapid niche-specific evolution. Across our eight strains and reference strain NRRL B-30929, we note the presence of at least two broadly shared ancestral spacers, evidence of a shared common ancestry and subsequent evolutionary divergence. The differences in spacer content displayed by CD034 and presence of three additional type I-E loci are additional signals of a more distant relationship to the eight strains than that of NRRL B-30929. Notably, the observation of spacers targeting similar or identical proteins in putative prophage regions across various L. buchneri genomes suggests multi-strain predation by a single phage or by several closely-related phages. Three spacers targeting the host’s own genome were identified. Assuming a fully functional CRISPR-Cas system, the expression of self-targeting spacers should result in host eradication, however, a variety of escape mechanisms have been observed [48, 49]. Two such escape mechanisms include mutating of the protospacer sequence or mutation of the PAM, which can lower binding efficiency of the CRISPR-Cas machinery to the target, reducing or inhibiting nucleic acid cleavage . In the self-targeting spacers described here, we saw SNPs in the PAM sequence, SNPs in the protospacer, or both. SNPs in the PAM could prevent the initial binding of the Cas9 effector complex, while mismatches between the protospacer and RNA guide could prevent the conformational change required by the CRISPR-Cas effector complex for target cleavage . Although individual phages may escape CRISPR-Cas targeting through random mutation, ultimately, they cannot avoid the random acquisition of spacers accrued by all cells of a bacterial population [51, 52]. An alternative phage defense mechanism against CRISPR-Cas targeting has been observed: the expression of CRISPR-Cas inhibitory proteins: anti-CRISPRs. These small phage-encoded proteins have been implicated in many cases in which self-targeting spacers have been observed, effectively inhibiting death of the host cell by prohibiting DNA cleavage by the CRISPR-Cas effector complex [53, 54, 55]. Anti-CRISPR proteins could play a role in the interaction between the host prophage regions and CRISPR-Cas machinery and is an attractive area for further study.
Indeed, food spoilage due to L. buchneri contamination and in some cases, potential pathogenicity, is of major concern to the food industry [3, 11]. The identity and activity of these prophages coupled with the knowledge of L. buchneri’s CRISPR-Cas systems could be exploited in the food biotechnology sector to treat and reduce contamination and spoilage by this organism, potentially preventing industrial loss and promoting robust bioprocessing. The CRISPR-Cas systems described here should undergo functional testing to explore their utility in engineering endogenous phage resistance into strains to protect starter cultures and promote the ensiling process. Additionally, in vivo characterization of detected prophages should be undertaken in an effort to develop novel tools for modulating this bacterial population and preventing food spoilage due to L. buchneri contamination in relevant industrial environments.
The aim of this study was to increase the available body of knowledge on Lactobacillus buchneri, a microbe shown to have both beneficial and adverse effects in various industrial food settings. Whole-genome sequencing of 7 phenotypically diverse strains found in spoiled, fermented cucumber in concert with the ATCC type strain, ATCC 4005, were deposited in the NCBI Genbank, significantly increasing the number of publicly available L. buchneri strains. Further analyses revealed that these strains are highly enriched in mobile genetic elements, specifically transposons. A single type II-A CRISPR-Cas system was found in each of these strains, with the exception of LA1167, which was found to contain a second type II-A CRIPSR-Cas locus just downstream of the primary locus. Additionally, LA1184 and the reference strain NRRL B-30929 were found to encode CRISPR spacer sequences that target putative integrated prophage regions within their own genome and should be studied in further detail. By expanding the genomic sequences and characterization of this species we hope to set the stage for future studies and to provide actionable data for improved industrial bioprocessing.
Cultures were obtained from the USDA-ARS Food Science Research Unit Culture Collection (Raleigh, NC). Genomic DNA was isolated using a kit (Qiagen DNeasy Blood and Tissue Kit) following the pretreatment protocol for Gram-positive bacteria. Resulting samples were sent to the High-Throughput Sequencing and Genotyping Unit of the Roy J. Carver Biotechnology Center at the University of Illinois at Urbana-Champaign for library preparation and sequencing. Sequencing was carried out on a HiSeq2500 using a TruSeq Rapid SBS sequencing kit with read lengths of 160 nucleotides. PacBio sequencing for LA1184 was performed by RTL Genomics (Texas, US). DNA from L. buchneri isolates was extracted for PacBio sequencing using Qiagen’s MagAttract HMW DNA Kit. Modifications to the protocol are as follows: 1) After the addition of P1 buffer and lysozyme, samples were incubated using a thermomixer at 37 °C at 900 rpm for ~ 1.5 h. 2) 100 μl AE was used to elute samples. Quality checking was performed using a dsDNA Broad Range DNA kit on a Qubit Fluorometer 3.0, as well as Fragment Analyzer by Advanced Analytical Technologies using the High Sensitivity Large Fragment 50 KB Analysis kit. Samples were subsequently processed through SMRTbell Library preparation using the following protocol: Preparing SMRTbell Libraries using PacBio Barcoded Adapters for Multiplex SMRT sequencing. The protocol was modified with the following: 1) Use equimolar pooled samples. 2) ~ 500 ng additional DNA was added per sample. 3) Use overnight ligation. 4) Use 12 μl of EB for final elution. Library quality was checked using dsDNA High Sensitivity DNA kit Qubit Fluorometer 3.0 and Fragment Analyzer using High Sensitivity large Fragment 50 KB Analysis kit. Library preparation for sequencing was completed using PacBio’s run protocol for diffusion loading, with the addition of a pre-extension time of 120 min and a final loading of 6 pM.
Adapter trimming, quality trimming, and filtering of raw fastq reads was performed in Geneious . Reads were assembled using Geneious’ custom assembler with the following options: no trimming, save contigs > 1000 bp, don’t merge contigs when there is a variant with coverage over 6, allow gaps, ignore words repeated more than 200 times, use minimum overlap identity, use paired reads to improve assembly, and only use paired hits during assembly. Approximately 2.5% (~ 850,000 reads) of available short reads were extracted and used for assembly for all genomes except LA1175D (10% available reads used) and LA1184 (hybrid assembly). Using 2.5% of reads was determined using Assembly Likelihood Estimator , which indicated an undesired high number of short contigs as the percentage of reads used increased. The hybrid assembly of LA1184 used 2.5% of available HiSeq short reads, as well as PacBio long reads. Hybrid assembly with both short and long reads was performed using Unicycler  with the standard hybrid run options in normal mode, including SPAdes error correction . To create scaffolds, contigs were aligned with both closed reference genomes, NRRL B-30929 and CD034, using Contiguator , which generated a single scaffold by concatenating contigs that aligned with the reference genomes. Gaps were inserted at concatenation sites with a series of 100 N’s. Unaligned contigs were added as additional sequence entries to the fasta file of each genome. Genomes were annotated using Rapid Annotation using Subsystem Technology (RAST) .
All vs NRRL B-30929 genome alignments were visualized using the BLAST Ring Image Generator (BRIG) , including a ring for each genome, as well as a ring for GC Content and GC Skew. A BLAST type of BLASTn  was used with the following options: upper identity threshold of 90% and a lower identity threshold of 70%, with a ring size of 30. Genetic features with low BLAST identity were identified through visual genomic inspection. The core- and pan-genomes were determined by first generating annotations using Prokka  with standard options that were then fed into Roary , using the flags -env and the standard threshold of 95% BLASTp  identity. The unique vs new genes graph was generated using the create_pan_genome_plots. R script distributed in the Roary package at https://github.com/sanger-pathogens/Roary/blob/master/bin/create_pan_genome_plots.R. The phylogenetic tree and gene presence/absence coverage were produced using the roary_plots.py script, also available in the public Roary distribution https://github.com/sanger-pathogens/Roary/tree/master/contrib/roary_plots. Core- and pan- genes were assigned a functional COG using PSI-BLAST  with the following flags: -show_gis -outfmt 7 -num_descriptions 1000 -num_alignments 1000 -dbsize 100,000,000 -comp_based_stats T -seg yes.
CRISPR-Cas loci were identified, visualized, and aligned using CRISPRviz , and type was determined using the canonical definitions defined by Koonin et al.  after inspecting flanking cas genes and their corresponding annotations. Spacer identity was investigated using BLAST+  against the nt, env_nt, and gss remote databases with the following flags: -task blastn-short -dust no -outfmt 5 -evalue 1e-5. The spacer identity heatmap was generated in RStudio using the gplots package available here: https://cran.r-project.org/web/packages/gplots/index.html. The PAM sequence was bioinformatically predicted using the procedure and CRISPRutils software package previously described by Nethery et al.  https://github.com/CRISPRlab/CRISPRutils
The authors would like to acknowledge support from NC State University and the NC Ag Foundation, and thank Sarah O’Flaherty for her editing assistance and creative insights.
Initial strain isolation was conducted by SDJ with further purification and characterization by KVD. gDNA isolation was carried out by EDH and KVD. Assembly and scaffolding of sequencing reads was performed by EDH and MAN. The subsequent comparative genomic analyses and writing of the manuscript was conducted by MAN, with input from all authors. The study was conceived by RB and SDJ. All listed authors have read and approved the manuscript.
Authors acknowledge funding from the North Carolina Ag Foundation, and support from the US Department of Agriculture.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
- 14.Hammes WPaRFV. The genus Lactobacillus: springer; 1995.Google Scholar
- 49.Canez C, Selle K, Goh YJ, Barrangou R. Outcomes and characterization of chromosomal self-targeting by native CRISPR-Cas systems in Streptococcus thermophilus. FEMS Microbiol Lett. 2019;366(9).Google Scholar
- 67.Wickham H. ggplot2: Elegant Graphics for Data Analysis. Use R. 2009:1–212.Google Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.