Aerobic Hydrocarbon-Degrading Gammaproteobacteria: Oleiphilaceae and Relatives
Despite the ubiquity of marine hydrocarbon-degrading bacteria of the family Oleiphilaceae, only one strain from this family has so far had both a validly published name and a fully assembled genome: Oleiphilus messinensis strain ME102 (= DSM 13489). The availability of draft genomes of 27 other isolates gave us the opportunity to gain insight into genome evolution and speciation patterns within this group. Whole-genome alignments and genome-to-genome distance calculations demonstrated that Oleiphilaceae consists of four distinct genome clusters that correspond to the species level. Furthermore, we suggest that all known Oleiphilaceae genomes cluster into two genera: the first is Oleiphilus, which includes O. messinensis ME102, and the second is represented by bacteria isolated near Hawaii. The Oleiphilaceae pangenome comprises 1796 core gene clusters, which roughly corresponds to two-thirds of an Oleiphilaceae genome. All high-quality genomes had double copies of almA, coding for a flavin-binding family monooxygenase linked with degradation of long-chain alkanes. Alkane monooxygenases with pairwise identities between 43% and 86.5% were encoded by four genomes, two of which had double loci. Cytochromes P450 were present in all genomes and were assigned to two distinct clusters, which, together with the low redundancy of alkane monooxygenases, points to different microorganisms as the sources from which Oleiphilaceae acquired alkane-monooxygenation enzymes.
Hydrocarbonoclastic bacteria (HCB) represent a wide group of microorganisms involved in the degradation of oil and oil derivatives, which are common pollutants in marine environments. HCB occur among Proteobacteria, Actinobacteria, Firmicutes, the FCB group (Fibrobacteres, Chlorobi, and Bacteroidetes), and other lineages. In contrast, known obligate HCB (OHCB), for which hydrocarbons are the sole, or almost sole, source of energy and carbon, are represented only by taxa within Gammaproteobacteria, with the exception of Thalassospira (Alphaproteobacteria) (Berry and Gutierrez 2017). To date, despite the availability of a vast number of metagenome-derived genome sequences in public genome databases, only a small number of studies have involved comparative genomics and/or detailed genome-based analysis of phylogenetic diversity within different groups of obligate HCB. One such example is the family Oleiphilaceae, which includes Oleiphilus messinensis, one of the first discovered obligate alkane degraders, isolated and described in 1998 (Yakimov et al. 1998) and sequenced only recently, in 2017 (Toshchakov et al. 2017). Here we present a brief analysis of publicly available Oleiphilaceae genomes, including verification of taxonomic attribution based on 16S rRNA genes, average nucleotide identity, and pangenome analysis.
2 Type Strain, Global Distribution, and Genomic Properties of Oleiphilaceae
Subtidal sediments of the North Atlantic coast of Spain impacted by the Prestige tanker oil spill (JQ580103.1, Rodas Beach, 42° 13′ 56″ N; 8° 53′ 50″ W and JQ579692.1, Figueiras Beach, 42° 8′ 8″ N; 8° 32′ 6″ W) (Acosta-González et al. 2013)
Chronically contaminated coastal sediments of the Etang de Berre lagoon (FM242233.1, France, 43° 28′ 00″ N; 5° 10′ 00″ E) (Yakimov and Golyshin 2014)
Water sampled at 100 m depth during the Southern Ocean iron fertilization experiment (JX530194.1, Southern Ocean, 47° 30′ 05″ S, 15° 26′ 42″ W) (Singh et al. 2015)
Enrichment cultures derived from samples taken from the bottom of the euphotic zone (deep chlorophyll maximum, 95 m depth) and the upper mesopelagic zone (250 m depth) near the Hawaiian Islands (22° 46′ 41.5″ N, 158° 04′ 10.2″ W) (Sosa et al. 2017)
The complete genome of O. messinensis ME102T was sequenced and analyzed recently, revealing genes necessary for both short- and long-chain alkane catabolism, as well as coding potential for the utilization of alkane derivatives, e.g., a gene for haloalkane dehalogenase (Toshchakov et al. 2017). Notably, strain ME102T showed an unprecedented level of genome mobility for OHCB, which stresses the importance of wider genome analysis of Oleiphilaceae representatives to better understand their biology and diversity. In 2017, the family Oleiphilaceae gained 27 draft genomes, obtained during a study of bacterial degraders of phosphonates associated with high-molecular-weight dissolved organic matter (HMWDOM) produced by photosynthetic microorganisms of surface waters. Water samples for that study were taken near the Hawaiian Islands from the bottom of the euphotic zone (deep chlorophyll maximum, 95 m depth) and the upper mesopelagic zone (250 m depth). These samples were used to set up a series of enrichments amended with HMWDOM obtained by ultrafiltration of seawater collected at 20 m depth. High-throughput sequencing of the resulting cultures showed that the most abundant group was Oleiphilus-related Gammaproteobacteria, which was specifically enriched in mesopelagic samples. The abundance of Oleiphilus-related OTUs increased continuously with sampling depth, reaching a maximum at >200 m depth. The authors also reported that the growth rate of Oleiphilaceae isolates was significantly stimulated by the addition of HMWDOM, hydrocarbon compounds, and fatty acids, reflecting the broad substrate specificity of Oleiphilaceae representatives (Sosa et al. 2017).
3 Genome-Based Assessment of Taxonomic Relatedness in Oleiphilaceae
Table: Oleiphilaceae genome assemblies available by January 2019 in NCBI Assembly database (columns include strain heterogeneity, %; rows include O. messinensis ME102)
Table: Reassembled Oleiphilaceae draft genomes, representing distinct clusters, revealed with ANI analysis and clustering (columns include strain heterogeneity, %)
4 Four Distinct Genomic Clusters Within Oleiphilaceae
Table: 16S rRNA gene identity matrix; values at intersections equal gene identity in percent (rows and columns: O. messinensis ME102T, Oleiphilus sp. HI0009, Oleiphilus sp. HI0043, Oleiphilus sp. HI0118)
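As an illustration of how a pairwise 16S rRNA gene identity matrix of this kind can be filled in, the following minimal Python sketch computes percent identity between already-aligned sequences. It is not necessarily the procedure used for the matrix described here: the function names are hypothetical, and the convention of skipping alignment columns that contain a gap in either sequence is an assumption (other tools count gaps differently, which changes the resulting values).

```python
from typing import Dict, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: gap-containing columns are ignored entirely.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """All-against-all identity values keyed by (name_a, name_b)."""
    names = list(alignment)
    return {
        (a, b): percent_identity(alignment[a], alignment[b])
        for a in names
        for b in names
    }
```

The input is expected to be a multiple sequence alignment (equal-length strings with "-" for gaps), for example one exported from any standard aligner.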
5 New Uncultivated Taxa Within Oleiphilaceae
According to the 95% 16S rRNA gene identity threshold for genus delimitation (Stackebrandt and Goebel 1994), we propose that Oleiphilus sp. HI0043, Oleiphilus sp. HI0118, and Oleiphilus sp. HI0009 form a new genus of Oleiphilaceae. Analysis of ANI values and dDDH estimates suggests that these genomes represent three different species. For example, although the 16S rRNA gene sequences of Oleiphilus sp. HI0118 and Oleiphilus sp. HI0009 share 98.17% identity, which is close to the 98.65% 16S rRNA identity threshold for species delimitation (Kim et al. 2014), the ANI value between these genomes is significantly lower than the 95% threshold for ANI-based species delimitation proposed by Richter and Rossello-Mora (2009).
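The decision logic behind these thresholds can be sketched in a few lines of Python. This is a simplified illustration rather than the authors' workflow: the function name and return labels are hypothetical, the three cut-offs are the ones cited in the text, and letting ANI override the 16S-based species call when both are available is an assumption of this sketch.

```python
from typing import Optional

GENUS_16S_THRESHOLD = 95.0     # 16S identity, genus level (Stackebrandt and Goebel 1994)
SPECIES_16S_THRESHOLD = 98.65  # 16S identity, species level (Kim et al. 2014)
SPECIES_ANI_THRESHOLD = 95.0   # ANI, species level (Richter and Rossello-Mora 2009)


def relatedness(identity_16s: float, ani: Optional[float] = None) -> str:
    """Classify a genome pair from 16S identity (%) and, if known, ANI (%)."""
    if identity_16s < GENUS_16S_THRESHOLD:
        return "different genera"
    if ani is not None:
        # Assumption: the genome-wide ANI value takes precedence over 16S.
        if ani >= SPECIES_ANI_THRESHOLD:
            return "same species"
        return "same genus, different species"
    if identity_16s < SPECIES_16S_THRESHOLD:
        return "same genus, different species"
    return "same genus, species unresolved by 16S alone"


# 98.17% is the HI0118 vs HI0009 16S identity reported in the text.
print(relatedness(98.17))
```

Passing an ANI value as the second argument shows how a pair with high 16S identity can still be split into separate species once genome-wide similarity is taken into account.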
6 Pangenome, Hydrocarbon Utilization, and Osmoprotection
We analyzed the gene clusters responsible for hydrocarbon utilization in the genomes of Oleiphilaceae bacteria. Genes of enzymes involved in hydrocarbon utilization and ectoine biosynthesis were identified either by alignment against annotated sequences from the NCBI nr protein database using the BLASTp algorithm (Altschul et al. 1997) or by alignment against the respective Pfam HMM profile (Finn et al. 2015) using the hmmsearch tool from the HMMER package (Eddy 2011). Two clusters of cytochrome P450 monooxygenases were identified: one containing proteins present in all four genomes analyzed and a second present in only three genomes (HI0009, HI0118, and O. messinensis ME102T). Notably, the O. messinensis ME102T genome contains three CYP450 genes, one of which is disrupted by an active IS4 mobile element, which underscores the dynamic nature of Oleiphilaceae genomes (Toshchakov et al. 2017). The gene almA, coding for a flavin-binding family monooxygenase linked with long-chain alkane degradation (Shao and Wang 2013), was found in two copies in all genomes except that of strain HI0043. All seven sequences formed one orthologous cluster. Alkane monooxygenases were present as single copies in O. messinensis strain ME102T and strain HI0009, and as double copies in strains HI0043 and HI0118.
Ectoine is a compatible solute that protects bacteria against osmotic stress (Louis and Galinski 1997). To study possible differences in adaptation to lower temperatures, high osmolarity, or elevated hydrostatic pressure, we analyzed the genetic loci responsible for ectoine biosynthesis. The ectoine synthesis operon ectABCR includes genes coding for diaminobutyric acid (DABA) aminotransferase (EctB), DABA acetyltransferase (EctA), ectoine synthase (EctC), and a MarR-like transcriptional regulatory protein (EctR). This operon was found in all Oleiphilaceae genomes except HI0009, which was sampled from a depth of 95 m, as opposed to HI0043 and HI0118, which were sampled from 250 m (Sosa et al. 2017). An increased level of transcription of the ectoine biosynthesis operon at elevated hydrostatic pressure was reported for Alcanivorax borkumensis (Scoma et al. 2016). Although the role of ectoine in the response to stress induced by hydrostatic pressure remains unclear, the lack of an ectoine biosynthesis operon in the shallower-water strain HI0009 allows us to speculate that this solute might be used by Oleiphilaceae not only as an osmoprotectant but also as a piezoprotectant.
In this study, we analyzed all Oleiphilaceae genomes that were publicly available by the beginning of 2019, together with the closest environmental 16S rRNA gene sequences. Whole-genome alignments and digital DNA–DNA hybridization demonstrated that Oleiphilaceae includes four distinct genome clusters that correspond to the species level. Additional 16S rRNA gene alignment and phylogenetic reconstruction showed that Oleiphilaceae genomes can be divided into two genera: the first includes O. messinensis ME102 and the second is represented by bacteria isolated near the Hawaiian Islands. The low assembly quality and high level of contamination motivated us to reassemble the genomes of the “Hawaiian group” from raw reads available at NCBI SRA. The resulting assemblies had higher completeness, lower contamination, improved N50 values, and fewer contigs.
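For readers unfamiliar with the N50 metric mentioned above, a minimal Python sketch of the standard definition follows: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name is hypothetical and the contig lengths in the example are made up for illustration; they are not values from the assemblies discussed here.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches half the total")


# Illustrative (made-up) contig lengths in bp
print(n50([100, 80, 60, 40, 20]))
```

A higher N50 together with a lower contig count indicates that the same sequence content is spread over fewer, longer contigs, which is the sense in which the reassemblies are described as improved.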
Pangenome analysis of the complete and reassembled Oleiphilaceae genomes yielded 1796 core gene clusters, which roughly corresponds to two-thirds of an Oleiphilaceae genome. The number of unique genes in each genome is in good agreement with its genome size and its distance from the other genomes. All genomes of Oleiphilaceae bacteria carry genes responsible for alkane degradation, such as genes coding for alkane monooxygenase, flavin-binding family monooxygenase, and cytochrome P450 monooxygenase.
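The distinction between core, accessory, and unique gene clusters can be made concrete with a short Python sketch. It assumes that orthologous clustering has already been done by a pangenome tool and that each cluster is available as the set of genomes in which it occurs; the function name, cluster names, and toy data are hypothetical and only illustrate the bookkeeping.

```python
from typing import Dict, List, Set


def partition_pangenome(
    clusters: Dict[str, Set[str]], genomes: Set[str]
) -> Dict[str, List[str]]:
    """Split gene clusters into core, accessory, and unique.

    core      : present in every genome
    unique    : present in exactly one genome
    accessory : present in more than one but not all genomes
    """
    result: Dict[str, List[str]] = {"core": [], "accessory": [], "unique": []}
    for cluster_id, members in clusters.items():
        present = members & genomes
        if present == genomes:
            result["core"].append(cluster_id)
        elif len(present) == 1:
            result["unique"].append(cluster_id)
        elif present:
            result["accessory"].append(cluster_id)
    return result


# Toy example: cluster names and memberships are invented for illustration
genomes = {"ME102", "HI0009", "HI0043", "HI0118"}
toy_clusters = {
    "cluster_1": {"ME102", "HI0009", "HI0043", "HI0118"},
    "cluster_2": {"ME102", "HI0009", "HI0118"},
    "cluster_3": {"HI0043"},
}
print(partition_pangenome(toy_clusters, genomes))
```

Applied to real clustering output, the length of the "core" list would correspond to the core gene cluster count reported above.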
The work of ST was supported by the RSF project # 17-74-30025. MF acknowledges grants PCIN-2014-107 (within ERA NET IB2 grant nr. ERA-IB-14-030—MetaCat), PCIN-2017-078 (within the Marine Biotechnology ERA-NET (ERA-MBT) funded under the European Commission’s Seventh Framework Programme, 2013–2017, Grant agreement 604814), BIO2014-54494-R, and BIO2017-85522-R from the Ministerio de Ciencia, Innovación y Universidades, formerly Ministerio de Economía, Industria y Competitividad. MMY, TNC, OVG, MF, KEJ, and PNG received funding from the European Union’s Horizon 2020 research and innovation program Blue Growth: Unlocking the potential of Seas and Oceans under grant agreement no.  (project acronym INMARE). PNG acknowledges ERA NET IB2, grant no. ERA-IB-14-030, and UK Biotechnology and Biological Sciences Research Council (BBSRC), grant no. BB/M029085/1. TCH, OVG and PNG acknowledge the support of the Centre for Environmental Biotechnology Project funded by the European Regional Development Fund (ERDF) through the Welsh Government.