1 Introduction

Petroleum hydrocarbons, such as alkanes and aromatics, are widespread organic contaminants in the environment and of considerable concern to water quality, ecosystem status, and human health. While contaminations in terrestrial systems are mostly connected to anthropogenic activities (Lueders 2017), marine sediments also often feature natural hydrocarbon seeps (Scoma et al. 2017). Biodegradation by microbes is a key process reducing hydrocarbon loads in these systems, but the factors controlling the activity and efficiency of these populations are still poorly understood. Especially under anoxic conditions, which often prevail in polluted marine sediments or in the terrestrial subsurface, the detection and characterization of intrinsic degrader populations represent a lasting research challenge. To address this challenge, next to targeted cultivation strategies or non-targeted metagenomic approaches, the detection of catabolic marker genes involved in anaerobic degradation is an indispensable tool. The approach allows for a targeted detection, localization, quantification, identification, and often also respiratory classification of anaerobic hydrocarbon degraders. Such information can be vital for site monitoring or the conceiving of site-specific remediation strategies (Lueders 2017). In this chapter, we provide a synopsis of degradation pathways and catabolic gene markers which have actually been applied to trace anaerobic degrader populations in environmental systems. We also demonstrate the appeal of next-generation sequencing technologies as a high-throughput screening strategy for degrader diversity and community composition.

2 Anaerobic Degradation Pathways of Petroleum Hydrocarbons

The initial activation reaction is often the crucial step in the degradation of aliphatic or aromatic hydrocarbons by anaerobic degraders. Petroleum hydrocarbons are chemically rather stable and difficult to functionalize in the absence of oxygen as co-substrate for an oxidative attack (Rabus et al. 2016). For this reason, the capacity of microbes to degrade petroleum hydrocarbons under anaerobic conditions had been doubted until the mid-1980s (Grbić-Galić and Vogel 1987). However, a fair number of anaerobic degraders, degradation pathways, and activation mechanisms for petroleum hydrocarbons have been identified over the last decades (Rabus et al. 2016). Similar to aerobic hydrocarbon catabolism, the anaerobic degradation of petroleum hydrocarbons also proceeds via several initial activation and transformation mechanisms (peripheral degradation pathways). These funnel the compounds to central metabolites, which are then further degraded to assimilatory building blocks or fully oxidized to CO2 (Callaghan 2013b; Fuchs et al. 2011). Three major strategies for the anaerobic activation of petroleum hydrocarbons are currently known: (i) for alkanes or alkylated aromatics the addition of a methyl or methylene group to fumarate (the so-called fumarate addition; Heider et al. 2016a); (ii) oxygen-independent hydroxylation known to be involved in the degradation of ethylbenzene and related substituted benzenes (Heider et al. 2016b); and (iii) carboxylation as proposed for alkanes, benzene, and polycyclic aromatic compounds (Callaghan 2013a; Meckenstock et al. 2016).

Among these distinct strategies, fumarate addition is considered as an archetype of anaerobic hydrocarbon degradation mechanisms. It was first discovered for the degradation of toluene by the denitrifying Thauera aromatica (Biegert et al. 1996) and is catalyzed by a glycyl radical enzyme named benzylsuccinate synthase (Bss). The enzyme forms a benzyl radical from toluene, adds it to the fumarate double bond, and finally releases benzylsuccinate as the first metabolite of toluene degradation (Fig. 1). Meanwhile, a role of benzylsuccinate synthase and related fumarate-adding enzymes (FAEs) has been reported for a wide phylogenetic diversity and respiratory variety of anaerobic degraders and also for the degradation of other alkylated mono- and polyaromatics, linear and cyclic alkanes, and even linear alkylbenzene sulfonate detergents, as summarized elsewhere (von Netzer et al. 2016; Wilkes et al. 2016). Depending on the nature of the substrate (Fig. 1), the FAEs involved are also named naphthylmethylsuccinate synthases (Nms; Annweiler et al. 2000; Musat et al. 2009) or (1-methyl)alkylsuccinate synthases (Ass/Mas; Callaghan et al. 2008; Grundmann et al. 2008).

Fig. 1

Overview of important peripheral and central catabolic pathways in the anaerobic degradation of alkylated aromatic hydrocarbons and non-methane alkanes. Genes of key enzymes in use as catabolic marker genes for degraders in environmental systems are highlighted. BSS benzylsuccinate synthase, NMS naphthylmethylsuccinate synthase, ASS/MAS (1-methyl)alkylsuccinate synthase, BCR benzoyl-CoA reductase, NCR naphthoyl-CoA reductase, BamA ring-cleaving 6-oxocyclohex-1-ene-1-carbonyl-CoA hydrolase (Scheme is adapted from Lueders and von Netzer 2014; von Netzer et al. 2016)

Hydroxylation of alkyl chains by ethylbenzene dehydrogenase (EBDH) and related enzymes is a central mechanism in the activation of ethylbenzene and related substituted benzenes (Heider et al. 2016b). Although respective enzymes are currently only known from a few denitrifying strains, the mechanism seems to be involved in biodegradation also under a range of other redox conditions (Dorer et al. 2016). For non-substituted aromatics like benzene and naphthalene, the current understanding of anaerobic degradation pathways is still incomplete. Yet, a number of recent studies suggest that carboxylation may be a conserved mechanism for the activation of these substrates, catalyzed by the proposed anaerobic benzene carboxylase (ABC; Abu Laban et al. 2010; Luo et al. 2014) or by naphthalene carboxylase (Bergmann et al. 2011; Mouttaki et al. 2012). The resulting aromatic acids (benzoate, 2-naphthoate) can then be activated directly by CoA ligation and funneled into the central degradation pathways used also for alkylated aromatics (Meckenstock et al. 2016; von Netzer et al. 2016).

All of the above peripheral aromatic degradation pathways funnel their substrates to central metabolites, which are then subjected to further reduction, dearomatization and ring cleavage (Rabus et al. 2016). Benzoyl-CoA is the central metabolite of monoaromatic hydrocarbon degradation, and two enzyme systems are known to be involved in its dearomatization: either the ATP-dependent class-I benzoyl-CoA reductase (Bcr/Bzd) in facultative anaerobes like denitrifying Thauera or Azoarcus spp. or the ATP-independent class-II benzoyl-CoA reductase (BamB) in strict anaerobes like in iron-reducing Geobacter spp. or sulfate-reducing Desulfobacterium spp. (Boll et al. 2014; Porter and Young 2014). After ring reduction, a ring-cleaving hydrolase (BamA) transforms the former aromatic ring into linear CoA-fatty acids (Kuntze et al. 2008; Porter and Young 2013; Staats et al. 2011), which are then subject to a β-oxidation-like degradation to acetyl-CoA or complete oxidation to CO2 (Fig. 1). Such linear CoA-fatty acids are also products of the anaerobic degradation of n-alkanes (Callaghan 2013a; Wilkes et al. 2016). The further sections of this chapter focus on the application of the catabolic genes involved in anaerobic degradation of petroleum hydrocarbons to dissect degrader communities in complex natural systems.

3 Catabolic Marker Genes for Anaerobic Hydrocarbon Degraders in Natural Systems

A wide diversity of bacterial cultures and enrichments is known to make use of the above catabolic reactions while degrading petroleum hydrocarbons and respiring different anaerobic electron acceptors (Callaghan 2013a; Heider and Schühle 2013; Rabus et al. 2016; Weelink et al. 2010; Widdel et al. 2010). Generally speaking, typical anaerobic hydrocarbon degraders can be found within the Rhodocyclaceae (Betaproteobacteria), the Geobacteraceae, Desulfobacteraceae, Syntrophobacteraceae (Deltaproteobacteria), and the Peptococcaceae (Clostridia). Most depend on typical anaerobic electron acceptors such as nitrate, ferric iron, or sulfate, while some are also capable of fermentative hydrocarbon degradation. These different guilds of anaerobic hydrocarbon degraders are, at best, functionally defined (via substrate usage and respiratory capacity) but phylogenetically diverse and often widely distributed throughout bacterial lineages. Therefore, anaerobic hydrocarbon degraders cannot be selectively targeted in environmental samples using ribosomal marker genes, at least not above the strain level. This is why, soon after their initial discovery, researchers realized the potential of genes for conserved key enzymes in anaerobic hydrocarbon degradation (so-called functional or catabolic marker genes) for developing PCR assays that specifically trace anaerobic hydrocarbon degraders in environmental systems.

3.1 Genes for Fumarate-Adding Enzymes

The benzylsuccinate synthase alpha subunit (bssA) and related FAE genes have been widely used as catabolic markers. Originally introduced for the detection of denitrifying toluene degraders within the Betaproteobacteria (Beller et al. 2002), increasing genomic and environmental sequence availability and continuous primer development have led to optimized detection systems for anaerobic alkylbenzene degraders applicable in iron-reducing, sulfate-reducing, and methanogenic systems. A comprehensive overview of published primers and assays is beyond the scope of this chapter but available elsewhere (von Netzer et al. 2016). Related to the sequence motifs realized in bssA detection assays, PCR primers capable of specifically recovering assA (Callaghan et al. 2010) and nmsA genes (von Netzer et al. 2013) in the environment have also been developed.

In samples taken directly from the terrestrial subsurface, bssA genes were first reported from a number of tar-oil-contaminated aquifer sediments in Germany (Winderl et al. 2007). Several as-of-then unidentified catabolic gene lineages were found, especially at sites dominated by sulfate reduction. The results of this study emphasized the possibility that as-yet-unknown degrader populations are important for bioremediation in situ. BssA gene-based diversity screenings of anaerobic degraders have since been conducted at a number of contaminated terrestrial systems, both in the USA (Callaghan et al. 2010; Yagi et al. 2010) and in Europe (Benedek et al. 2016; Osman et al. 2014; Staats et al. 2011), revealing locally dominant and apparently site-specific populations of either beta- or deltaproteobacterial degraders at the respective sites. Ongoing optimization of FAE-targeted primer sets has also demonstrated that bssA genes of clostridial affiliation and nmsA genes can be recovered from terrestrial samples (Martirani-Von Abercron et al. 2016; von Netzer et al. 2013). While the detection of non-proteobacterial FAE genes had been a challenge with initial primer pairs, adequate tools are now at hand, and the importance of clostridial hydrocarbon degraders in terrestrial systems or complex microcosms derived thereof is undisputed (Abu Laban et al. 2015; Aitken et al. 2013; Fowler et al. 2012; Sun et al. 2014; Winderl et al. 2010). Nevertheless, the optimization of assays for a more comprehensive recovery of FAE genes with primers less selective for known proteobacterial FAE gene sequences is still an ongoing process.

The potential of bssA-targeted qPCR to identify hot-spots of biodegradation by localizing and quantifying degrader populations across vertical plume transects was first demonstrated for a tar-oil-contaminated site in Germany (Winderl et al. 2008). Similarly, bssA-targeted qPCR was also used to compare distinct degrader abundances across longitudinal transects of a plume (Oka et al. 2011), revealing degrader enrichment in zones of highest contamination. A qPCR assay designed to detect bssA genes of specific sulfate-reducing hydrocarbon degraders was applied to monitor degrader abundance in groundwater samples from comparative bioremediation galleries at a US Air Force base (Beller et al. 2008), revealing that alternative electron donor amendment (ethanol) could actually stimulate degrader abundance. Moreover, in water samples taken from an artificial toluene plume introduced to an indoor model aquifer, comparative bssA gene-to-transcript ratios were quantified to discriminate between actual active gene expression (mRNA) and inactive degrader populations transported downstream of biodegradation zones by groundwater flow (Brow et al. 2013). This elaborate example illustrates nicely how multiple levels of molecular data and spatial analyses can be combined to better elucidate processes in contaminated systems.

In marine systems, FAE gene-based degrader detection was first introduced for assA genes, revealing a remarkable diversity of yet-unknown catabolic gene lineages to be extant in sediments from several hydrocarbon-contaminated waterways in the USA (Callaghan et al. 2010) and also in sediments of the Chesapeake Bay (Johnson et al. 2015). Several novel clusters of bssA and assA genes were also reported for different marine sediments in Spain, contaminated either by accidental oil spills or by experimental hydrocarbon exposure (Acosta-González et al. 2013). Deltaproteobacterial FAE genes were especially frequent in these libraries, as expected for marine sediments that are typically rich in sulfate-reducing bacteria. Degrader diversity seemed to depend on both nature and severity of the contamination. Similarly, Kimes and colleagues used a simultaneous screening of assA and bssA genes to successfully demonstrate anaerobic catabolic potentials in sediments impacted by the Deepwater Horizon oil spill (Kimes et al. 2013). Very recently, bssA-targeted qPCR successfully revealed that anaerobic toluene degraders were quantitatively stimulated in polluted marine sediments subjected to anodic electroremediation (Daghio et al. 2016). These studies clearly demonstrate the ample potential of FAE gene-targeted detection assays in the monitoring and population-informed remediation of contaminated sites.

FAE gene pools have also been recovered from marine systems exposed to natural oil seeps and mud volcanoes (Gittel et al. 2015; Stagars et al. 2016; von Netzer et al. 2013). While the study of Gittel and colleagues revealed marked distinctions between masD gene pools recovered from a variety of either pristine or seepage-impacted sediment samples off the Danish coast, Stagars et al. were the first to apply next-generation sequencing (NGS) to FAE gene amplicons. They thereby revealed a remarkable diversity of no less than 420 different MasD species-level OTUs (at 96% amino acid similarity) from a global selection of marine methane, gas, and hydrocarbon seeps (Stagars et al. 2016). This study highlighted the still largely untapped potential of NGS approaches in functional marker gene screenings of anaerobic degraders of petroleum hydrocarbons.
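To illustrate what grouping sequences into OTUs at a fixed similarity threshold means in practice, the following minimal Python sketch performs a simple greedy clustering of pre-aligned amino acid sequences at 96% identity. It is not the pipeline used by Stagars et al. (2016); the function names `identity` and `greedy_otus`, the identity definition (matches divided by compared alignment columns), and the toy sequences are assumptions made purely for illustration.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned sequences.

    Assumption: both sequences come from the same alignment (equal length);
    columns that are gaps ('-') in both sequences are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def greedy_otus(sequences, threshold=0.96):
    """Assign each sequence to the first OTU whose seed it matches.

    A sequence that matches no existing seed at >= threshold identity
    becomes the seed of a new OTU. Results depend on input order.
    """
    seeds = []
    clusters = []
    for seq in sequences:
        for i, seed in enumerate(seeds):
            if identity(seq, seed) >= threshold:
                clusters[i].append(seq)
                break
        else:
            seeds.append(seq)
            clusters.append([seq])
    return clusters


# Toy example with hypothetical 50-residue sequences
toy = ["A" * 50, "A" * 49 + "C", "A" * 45 + "C" * 5]
otus = greedy_otus(toy)
print(len(otus), [len(c) for c in otus])
```

Greedy seed-based clustering is order dependent, so dedicated tools usually sort sequences (for example by abundance) before clustering; the sketch only conveys the role of the similarity threshold.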

3.2 Genes for Central Catabolic Markers

Marker genes from central catabolic pathways in anaerobic hydrocarbon degradation have also been used in environmental studies. Especially for degraders of non-substituted aromatic compounds, where peripheral activation mechanisms often remain unknown, central pathways offer indispensable handles for degrader detection. The concept was introduced in 2005, when two independent groups reported notable diversities and partially unknown lineages of class-I benzoyl-CoA reductase genes (bcrA and bzdN) in contaminated groundwater and in estuarine sediment (Hosoda et al. 2005; Song and Ward 2005). As-of-then unidentified bcrA genes of deltaproteobacterial affiliation were also reported from crude oil-contaminated soils (Higashioka et al. 2009). Later, the quantification of class-I BCR genes has been used to localize and quantify anaerobic aromatics degraders in different wells of a crude oil-impacted US aquifer (Fahrenfeld et al. 2014) and in successive stages of a wastewater treatment system for dyeing effluents (Li et al. 2015). The applicability of the dearomatizing 2-naphthoyl-CoA reductase (NCR) for detecting degraders of polycyclic aromatic hydrocarbons has, in principle, also been demonstrated for a number of enrichment cultures (Morris et al. 2014), albeit not directly for environmental samples.

Class-II BCRs (bamB) and also the downstream ring-cleaving hydrolases (bamA) have also been employed to detect and characterize anaerobic degraders in contaminated systems. Most comprehensively, Kuntze et al. (2011) performed a comparative qualitative assessment of bcrC, bamB, and bamA gene pools in samples from two benzene-contaminated aquifers in Germany, revealing that mostly beta- or deltaproteobacterial degraders could be consistently recovered from both via the different markers. BamA gene pools were considerably more diverse than that of bssA in a landfill leachate plume in the Netherlands but surprisingly less abundant in highly contaminated zones than outside the plume (Staats et al. 2011). Crude oil-degrading enrichment cultures from marine sediments incubated at psychrophilic vs. mesophilic temperatures were shown to host diverse but clearly distinct bamA gene pools affiliated to sulfate-reducing degraders (Higashioka et al. 2011). Similarly, Sun and colleagues found that the phylogeny of bamA gene pools recovered from toluene-degrading enrichment cultures reflected the actual redox conditions across a whole range of different inocula, including contaminated soils and activated sludge (Sun et al. 2014). Most recently, the structure of anaerobic degrader communities as recovered via bamA genes from different Antarctic soils was shown to be distinct in highly contaminated samples vs. soils with intermediate or no contamination (Sampaio et al. 2017). It must be stated, however, that the detection of the central catabolic genes mentioned above need not always be strictly linked to the presence of anaerobic degraders of petroleum hydrocarbons. Also the anaerobic degradation of humic acids, lignins, and aromatic amino acids involves the respective catabolic routes and metabolites, as summarized recently by Porter and Young (2013, 2014). 
Therefore, an approach to detecting anaerobic degraders that involves multiple lines of evidence, including peripheral and central markers, is always highly recommended.

4 NGS Approaches for Functional Marker Genes

The fact that only one of the studies summarized above has relied on an NGS-based marker gene query (Stagars et al. 2016) clearly illustrates the untapped potential of this approach for analyzing anaerobic degrader communities. The use of NGS-based amplicon sequencing for functional marker genes is well-established, e.g., for genes of the particulate methane monooxygenase, pmoA (Lüke and Frenzel 2011); the ammonia monooxygenase, amoA (Pester et al. 2012); the nitrous oxide reductase, nosZ (Philippot et al. 2013); or the alkane monooxygenase, alkB (Wallisch et al. 2014), to name only a few. In contrast to ribosomal amplicon sequencing, where prominent pipelines and databases are generally available (Caporaso et al. 2010; Schloss et al. 2009), functional gene amplicon sequencing still often relies on specific and manually curated sequence databases, e.g., as found in the FunGene repository (Fish et al. 2013), and on in-house data handling pipelines. The AnHyDeg repository, a specific and highly curated database for catabolic genes involved in anaerobic hydrocarbon degradation, has only recently been released (Callaghan and Wawrik 2016).

Technically speaking, read length is clearly an issue in all marker gene sequencing approaches. In most applications of NGS-based amplicon sequencing, depending on the platform, sequencing has been limited to relatively short amplicons between ~300 and 500 bp in length (Luo et al. 2012). For certain platforms, additional problems with high numbers of frameshifts have been reported for functional marker genes (Zhang et al. 2015). For amplicons longer than the respective read lengths, it was either necessary to reduce the primer window, thus producing shorter amplicons, or to use a paired-end approach. However, recent technical developments in sequencing platforms have resulted in significant increases in read length, to several dozen kb for the PacBio and MinION platforms (Benítez-Páez et al. 2016; Wagner et al. 2016). Together with other methodological innovations like the primer-free sequencing of full-length marker genes (Karst et al. 2016), or “epicPCR”, which links ribosomal and functional marker gene sequencing at the single-cell level (Spencer et al. 2016), a significant potential remains to be realized for catabolic marker gene sequencing.

Nevertheless, a few years before the discontinuation of the 454 GS FLX platform in 2016, Roche/454 had released a new long-read sequencing chemistry (~800–1000 bp) for its FLX+ sequencer. Due to the relatively short life span of the technology before becoming outdated, it has rarely been used for amplicon sequencing (D’Amore et al. 2016). The long-read sequencing chemistry perfectly matched the typical ~800 bp length of FAE gene amplicons necessary for an adequate diversity coverage of anaerobic degraders (von Netzer et al. 2016), which motivated us to test this approach in a proof-of-principle demonstration of long-read FAE gene amplicon sequencing. This was appealing, since FAE gene primers are typically highly degenerate, yielding notorious PCR by-products and making classical cloning-and-sequencing approaches very laborious (von Netzer et al. 2013). In the following section of this chapter, the procedure and results of a long-read NGS approach are presented for the example of FAE gene amplicons generated from a tar-oil-contaminated aquifer in Germany. The main objective was to demonstrate that long (~800 bp) functional marker gene reads can actually be recovered from complex degrader communities and to query how NGS-based screenings of the degraders would go beyond previous Sanger sequencing-based characterization of the same degraders (von Netzer et al. 2013; Winderl et al. 2007). Another goal was to test whether bidirectional amplicon sequencing would be necessary to recover full-length amplicon sequence information or whether unidirectional sequencing would suffice.

5 Long-Read Sequencing of FAE Gene Amplicons

5.1 Methodology

Sediment from the lower fringe (6.85 m below surface) and just below (7.15 m) a toluene plume in Flingern (Düsseldorf, Germany) was sampled in 2009, as previously described (Pilloni et al. 2011). The sediment was stored at −20 °C until nucleic acid extraction in 2012. DNA was extracted in three different replicates per depth using a phenol-chloroform extraction with bead beating as described (Pilloni et al. 2011). Two technically replicated sequencing libraries were generated per triplicate biological DNA extract from 6.85 m depth, while triplicate extracts were sequenced without further technical replication from 7.15 m depth.

The primer sets 7772f (Winderl et al. 2007)/8543r (von Netzer et al. 2013) for bssA libraries and 7768f / 8543r (von Netzer et al. 2013) for FAE-B libraries were used. The FAE-B primers were designed to recover a diversity of more deeply branching bssA homologues and also nmsA genes (von Netzer et al. 2013). Direct amplification of template DNA was not optimally effective with fully adapter- and identifier-tagged primers; thus pre-amplified FAE gene amplicons were first generated with untagged primers (~20–25 PCR cycles), as described by von Netzer et al. (2013). Afterward, these primary amplicons were re-amplified with a second round of PCR and fully tagged primers (five to ten PCR cycles). Two distinct sequencing approaches were used. For unidirectional reads, sequencing occurred from the forward primer alone. For the bidirectional approach, both forward and reverse primers were barcoded, and thus, sequencing was done from both ends of the amplicons. Tagging was done with either Lib-A and Lib-B (bidirectional) or Lib-L (unidirectional) adapters and multiplex identifiers (MID) attached to the primers as previously described (Pilloni et al. 2012; Zhang and Lueders 2017). Emulsion PCR, emulsion breaking, and sequencing on a 454 GS FLX+ sequencer were done with appropriate chemistry as recommended by the manufacturer (Roche Diagnostics, Penzberg, Germany).
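As a minimal sketch of how MID-tagged reads can later be assigned back to their libraries, the following Python function bins reads by an exact match of their leading identifier sequence and strips that identifier. The barcode sequences, sample names, and the function name `demultiplex` are hypothetical placeholders; they are not the identifiers or the software used in this study, and real workflows typically also tolerate sequencing errors in the barcode.

```python
def demultiplex(reads, mid_to_sample):
    """Sort reads into per-sample bins by their leading MID sequence.

    Assumptions: exact barcode matches only, and no barcode is a prefix
    of another barcode. Reads without a known barcode are returned
    separately as unassigned.
    """
    bins = {sample: [] for sample in mid_to_sample.values()}
    unassigned = []
    for read in reads:
        for mid, sample in mid_to_sample.items():
            if read.startswith(mid):
                # Remove the identifier so only amplicon sequence remains
                bins[sample].append(read[len(mid):])
                break
        else:
            unassigned.append(read)
    return bins, unassigned


# Hypothetical barcodes and reads, for illustration only
mids = {"AAAACCCC": "depth_6.85m", "GGGGTTTT": "depth_7.15m"}
reads = ["AAAACCCCTTGA", "GGGGTTTTACGT", "CCCCCCCCACGT"]
binned, unassigned = demultiplex(reads, mids)
print(binned)
print(unassigned)
```

For bidirectional libraries, the same logic would be applied with separate identifier sets for the forward and reverse primers, so that reads from both ends of an amplicon can be traced to the same sample.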

Reads were de-multiplexed and quality-trimmed as previously described (Pilloni et al. 2012) but using the Greengenes trimming algorithm as implemented in Prinseq (version 0.20.4, http://prinseq.sourceforge.net). Reads below 250 bp length were excluded from further processing (Pilloni et al. 2012). FAE gene amplicons were classified using mothur version 1.33.3 (Schloss et al. 2009) with an in-house FAE gene alignment of 795 bssA, nmsA, and homologous genes generated in ARB (Ludwig et al. 2004). Our alignment excluded assA genes, as these were not to be expected at the Flingern site (von Netzer et al. 2013). All sequencing raw data have been deposited with the NCBI sequence read archive under the SRA accession number SRP131608. Diversities of detected FAE gene lineages were calculated in R using the package “vegan” (Oksanen et al. 2017). For the comparison of read length and sequencing depth resulting from the different sequencing approaches, boxplot diagrams were generated in R with the boxplot function at its standard setting, which identifies outliers as measurements lying more than 1.5 times the interquartile range beyond the box.
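The length cutoff and the outlier rule described above can be sketched in a few lines of Python. This is an illustrative stand-in, not the processing used in the study, which relied on Prinseq and R. The function names are hypothetical, and the quartiles are computed with Python's `statistics.quantiles` (inclusive method), which can differ slightly from the hinges used by R's boxplot for small samples.

```python
from statistics import quantiles

MIN_READ_LENGTH = 250  # bp; reads shorter than this were excluded


def filter_reads_by_length(reads, min_length=MIN_READ_LENGTH):
    """Keep only reads that are at least min_length bases long."""
    return [read for read in reads if len(read) >= min_length]


def tukey_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3.

    Assumption: quartiles via the 'inclusive' interpolation method,
    which is close to, but not identical with, R's boxplot hinges.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical read lengths (bp), for illustration only
lengths = [700, 710, 720, 730, 740, 750, 300]
print(tukey_outliers(lengths))
```

With the default factor of 1.5, a single short read among otherwise uniform long reads is flagged, which corresponds to the dots drawn beyond the whiskers of a standard boxplot.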

5.2 FAE Gene Libraries from Flingern Sediments

Anaerobic hydrocarbon degraders at the Flingern site have been previously investigated via “classical” FAE gene sequencing, making it an ideal test site for the establishment of long-read FAE amplicon sequencing. A low diversity of FAE genes, affiliated to the deltaproteobacterial “F1-cluster” bssA and also to deltaproteobacterial nmsA genes, has been previously reported to dominate clone libraries from the site, depending on which primers were utilized (von Netzer et al. 2013; Winderl et al. 2007).

Reasonable average yields of ~4000 reads were obtained across all bssA libraries (Table 1). However, average read numbers were much lower for FAE-B amplicons, with only ~1400 reads and ~650 reads on average for the bidirectional and unidirectional libraries, respectively (Table 1). This finding was interpreted as a clear (and not unexpected) case of primer bias, as the 7768f primer used for FAE-B amplicons is highly degenerate, which is necessary to also recover more deeply branching bssA homologues. The resulting amplicons were more difficult to adequately purify from shorter and non-specific PCR by-products, which was reflected in lower numbers of total “good” reads and also a higher frequency of unidentified (non-FAE gene) sequences in these libraries. Still, a fair recovery of targeted FAE gene pools seemed to be possible with both primer sets. Overall median read lengths were between 640 and 760 bp for all libraries, but clearly highest on average (~750 bp) for unidirectional libraries (Fig. 2). However, unidirectional read libraries also contained a notable number of shorter reads. There was no significant decrease in median read length after trimming, which was taken as a general sign of high sequencing quality. There was also no significant difference between forward and reverse primer read lengths observed for separated bidirectional sequencing data sets (Fig. 2).

Table 1 Average sequence read yield for different sequencing libraries generated with either bi- or unidirectional FLX+ sequencing in this study. Averages are given ± standard deviation of nine amplicon libraries per column
Fig. 2

Distribution of read length in FAE gene amplicon libraries before and after quality trimming. Three biological replicates were sequenced per sample in 6.85 m (technical replicate I), 6.85 m (technical replicate II), and 7.15 m. The dots represent outliers identified in R

The composition of FAE gene pools as recovered from the two sediment depths and via the different sequencing approaches is illustrated in Fig. 3. Phylogenetic placement of the detected lineages is shown in Fig. 4. Libraries of bssA amplicons were always dominated by reads affiliated to the desulfobulbal F1-cluster bssA (50–58% of all reads), irrespective of sequencing strategy or sediment depth. The second most abundant lineage, the clostridial F2-cluster bssA, was slightly more abundant at 7.15 m than at 6.85 m sediment depth (~42% vs. 25%). Further FAE gene lineages consistently recovered in bssA libraries from both depths were affiliated to the Betaproteobacteria, other Deltaproteobacteria, as well as the more deeply branching and as-yet unidentified T-cluster bssA homologues. These lineages were previously not detected in clone libraries of FAE genes generated for the Flingern site. T-cluster FAE genes and betaproteobacterial bssA were also recovered in sequencing libraries of FAE-B amplicons, where the T-cluster was even dominant (50–70%) at 6.85 m (Fig. 3b). However, deltaproteobacterial nmsA genes were also abundant in these libraries, especially at 7.15 m (74–77%). The Shannon diversity of recovered FAE-B lineages was slightly lower than that of bssA libraries (0.8 ± 0.18 vs. 1.06 ± 0.12, respectively).
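For readers unfamiliar with the Shannon diversity values quoted above, the index is H' = −Σ p_i ln p_i, where p_i is the relative read abundance of lineage i. The following Python sketch computes it from read counts using the natural logarithm, which is also the default base of the `diversity` function in vegan. The function name and the example counts are hypothetical and do not reproduce the data of this study.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' from per-lineage read counts (natural log).

    Lineages with zero reads contribute nothing to the sum.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one lineage must have reads")
    return sum(-(c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts per FAE gene lineage in one library
example_counts = [540, 300, 90, 50, 20]
print(round(shannon_index(example_counts), 2))
```

A library dominated by a single lineage yields a value near zero, while two equally abundant lineages yield ln 2 (about 0.69), which helps to put values around 1 into perspective.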

Fig. 3

Community structures of FAE gene pools as recovered from two different depths of a tar-oil-contaminated aquifer with different FAE primers. (a) Illustrates the technical reproducibility (I, II) of biologically replicated sequencing libraries (all 6.85 m) generated with either uni- or bidirectional long-read 454 FLX+ amplicon sequencing. (b) Compares degrader community structure recovered with two different primer sets (bssA, FAE-B), from two different sediment depths and with either uni- or bidirectional amplicon sequencing. Read abundances are averaged over results from triplicate libraries per sample; error bars represent standard deviations and are shown as negative only. Color coding and naming of FAE gene lineages correspond to those given in Fig. 4

Fig. 4

Overview of the phylogeny of known pure culture and environmental FAE gene sequences as mentioned in the text. Several lineages are collapsed with only a few representatives named. Outgroup: related pyruvate formate lyase (PFL) genes. The scale bar represents 10%. Color coding and naming of FAE gene lineages correspond to Fig. 3 (The tree has been adapted from Lueders and von Netzer 2014; von Netzer et al. 2016)

Specific distinctions were detected in lineage abundances between uni- and bidirectional sequencing libraries, such as a consistently lower abundance of T-cluster bssA homologs in all bidirectional libraries. Still, overall degrader community patterns were consistently recovered for both depths and amplicons, irrespective of whether uni- or bidirectional sequencing was employed. The technical and biological reproducibility between sequencing libraries was very high (Fig. 3a), as reported previously for 16S rRNA gene libraries from the same site (Pilloni et al. 2012). This means that DNA extraction, amplification, and the NGS sequencing procedure were robust and that factors like primer selection, tagging, and sequencing strategy had a much greater influence on the recovered community structure.

In summary, this study demonstrates that long-read NGS of FAE gene amplicons is feasible and that it is capable of delivering reproducible screening results on anaerobic hydrocarbon degrader communities and degrader diversity. The much greater sequence yield compared to classical cloning-and-sequencing approaches (von Netzer et al. 2013; Winderl et al. 2007) delivers a greater diversity of detected lineages and more robust lineage abundances. In the present study, FAE gene OTUs were not resolved to the species level, as has been done for masD OTUs by Stagars et al. (2016) with shorter FLX titanium reads (~450 bp). Still, the much greater total number of NGS sequence reads and also the longer FAE gene information recovered in this study should allow for a meaningful dissection of degrader microdiversity via catabolic gene OTUs in successive studies.

6 Summary and Research Needs

This chapter summarizes the most recent state of the art in catabolic gene surveys for anaerobic degraders of petroleum hydrocarbons in environmental systems. Primer systems for a wide diversity of both peripheral and central catabolic marker genes in anaerobic degradation are now at hand. They have been widely employed in studies on terrestrial and marine systems either naturally or anthropogenically contaminated with alkanes and aromatic hydrocarbons. Thus, crucial insights into previously hidden lineages and diversities of anaerobic hydrocarbon degraders have been generated for many sites. Some of these catabolic gene lineages have now been securely associated with common degrader lineages within the Betaproteobacteria, the Deltaproteobacteria, and also the Clostridia, the latter being more frequently detected in the contaminated terrestrial subsurface than in marine systems. However, several prominent marker gene lineages, such as the T-cluster FAE gene homologues (Fig. 4), remain to be better integrated into current degrader taxonomy. Moreover, increasing experimental cues from enrichment cultures or (meta)genomic information on key enzymes and marker genes involved in activation mechanisms other than fumarate addition are currently becoming available. This will foster the development and application of assays capable of detecting degraders utilizing, e.g., oxygen-independent hydroxylation or carboxylation mechanisms for hydrocarbon activation.

The original research presented in the second part of this chapter shows that long-read NGS analysis of FAE gene amplicons is a powerful and reproducible tool to comprehensively screen degrader diversities in environmental systems. Recovery of FAE gene lineages was clearly higher via NGS, facilitating a unique “deep” access to degrader microbiota in contaminated systems. Although the utilized 454 GS FLX+ long-read sequencing platform is now already outdated, more modern long-read sequencing platforms such as the PacBio SMRT sequencing technology have recently become available for marker gene sequencing (Schloss et al. 2016; Wagner et al. 2016) and will surely also rapidly find their way into anaerobic degrader screening. Possibly, other primer-independent NGS approaches can also offer even more unbiased handles on degrader diversity, such as a recently introduced strategy involving sequence capture by hybridization and next-generation sequencing of FAE genes (Ranchou-Peyruse et al. 2016). In perspective, it remains to be emphasized that the vital information on intrinsic degrader assemblages accessible via marker gene sequencing must continue to find its way into enhanced, population-based monitoring, management, and remediation strategies for contaminated sites.