A total of forty taxa (33 crassiclitellates, one moniligastrid and six outgroup taxa) were sampled for this study (Table 1). James and Davidson  used representatives of several clitellate taxa as outgroups for their analysis of crassiclitellate phylogeny based on 18S data, but only used an enchytraeid for most other analyses (including combined analyses of multiple loci). Their 18S Bayesian phylogeny (Fig. 1 in ) suggested that Haplotaxidae s. str. (represented by Haplotaxis gordioides) was sister to Metagynophora (Crassiclitellata + Moniligastridae), and our preliminary analyses of a broader sample of clitellate transcriptomes also suggested that members of Haplotaxidae are the closest extant relatives of Metagynophora (not shown). Haplotaxidae, with its currently recognized eight genera, is no longer considered to be monophyletic and has long been regarded as a “dustbin” for slender, primitive-looking clitellates [10, 14,15,16,17]. We chose representatives of four haplotaxid species, Lumbriculidae (Lumbriculus variegatus) and Propappidae (Propappus volki) as outgroups; P. volki was used to root the phylogeny. No leeches or branchiobdellidans were used in this study, for two reasons. First, previous work [13, 18] and preliminary analyses including several leech and branchiobdellidan transcriptomes supported a clade comprising Lumbriculidae, Branchiobdellida and Hirudinea. Second, all available leech and branchiobdellidan transcriptomes showed appreciably longer branch lengths on preliminary ML trees than did all other clitellates. Sampling only the relatively short-branch Lumbriculus variegatus allows this outgroup clade to be represented while avoiding potential confounding factors due to branch-length heterogeneity.
The crassiclitellate samples represented all extant crassiclitellate families but one (Biwadrilidae) and at least 28 genera. Transcriptomes for thirty-one crassiclitellate taxa and all six outgroup taxa were generated as part of this study, and two additional crassiclitellate transcriptomes were assembled as described below from data in the Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra/) for Hormogaster elisae (PRJNA196484) and Eisenia andrei (DRX021555). A transcriptome was also generated for a representative of Moniligastridae (Drawida sp.). Voucher specimens are deposited at the North Carolina Museum of Natural Sciences (NCSM), the Swedish Museum of Natural History (SMNH) and the Western Australian Museum (WAM) (Table 1).
Total RNA was extracted from RNAlater®-preserved samples using the Ambion RNAqueous®-Micro Total RNA Isolation kit. First-strand cDNA was constructed using the SMART® cDNA Library Construction Kit (Clontech Laboratories, Inc.), replacing the included 3′ primer with the Cap-TRSA-CV oligo . We amplified double-stranded cDNA using the Advantage® 2 PCR Kit (Clontech Laboratories, Inc.). To minimize the risk of contamination, extractions and cDNA construction were performed in small batches of four tissue samples or fewer, and the workstation and tools were cleaned with bleach between each set of extractions. Where possible, we avoided sampling the external body surface and the gut to limit the potential for contamination from epibionts and gut contents (e.g., prey items and microorganisms).
Non-normalized cDNA libraries were sent to the HudsonAlpha Institute for Biotechnology (Huntsville, Alabama, USA) for library preparation and 2 × 100-bp paired-end sequencing on an Illumina HiSeq 2000. Approximately one-sixth of a lane was used for each taxon.
Sequence assembly and processing
Raw paired-end Illumina reads were digitally normalized using khmer (normalize-by-median.py -C 30 -k 20 -N 4 -x 2.5e9)  and assembled using the October 5, 2012 release of Trinity . We used TransDecoder (http://transdecoder.github.io) to find open reading frames and translate nucleotide sequences into amino acid sequences that were at least 100 amino acids in length.
Translated data for all 40 taxa were searched against the Lophotrochozoa pHMMs in HaMStR v.13.2.3  using Helobdella robusta as reference species. We set HaMStR to output all sequences that fulfilled the reciprocity requirement and then used a custom script to generate FASTA-formatted files for each orthogroup that included all sequences and deleted duplicated contigs. Each orthogroup was then aligned with MAFFT (L-INS-i) .
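The deduplication and FASTA-writing step can be pictured roughly as follows. This is only a minimal Python sketch, not the custom script used in this study (which is deposited in Dryad). The function names `dedupe_records` and `to_fasta` are hypothetical, and treating records with an identical header and sequence as duplicates is an assumption.

```python
def dedupe_records(records):
    """Drop repeated (header, sequence) pairs, keeping first occurrences in order.

    Assumption: a "duplicated contig" is a record whose header and sequence
    both match an earlier record in the same orthogroup.
    """
    seen = set()
    unique = []
    for header, seq in records:
        key = (header, seq)
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


def to_fasta(records):
    """Format (header, sequence) pairs as FASTA text, one record per two lines."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```

Each orthogroup would be handled independently, so the same contig may legitimately appear in more than one orthogroup file.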
One of the major difficulties in phylogenomic analysis, particularly when dealing with transcriptome data, is orthology assessment. Most animals harbor paralogous copies of many genes, but standard molecular phylogenetic analyses assume that data sampled from each taxon for each locus are orthologs. Failure to distinguish orthologs from paralogs can cause major problems in phylogenetic inference . Given this, we used a tree-based approach to remove likely paralogs from our alignments. We inferred a maximum-likelihood (ML) tree for each aligned orthogroup with FastTreeMP  (under the -slow and -gamma settings), and used PhyloTreePruner  to screen each of the resulting trees. In PhyloTreePruner, nodes on each ML tree with SH-like local support values <0.7 were collapsed into polytomies, and the largest subtree in which each taxon was represented by at most one sequence was retained; multiple sequences from a single taxon were tolerated only if they all formed part of a clade or part of the same polytomy (in which case, all were retained). Sequences falling outside this maximally inclusive subtree were assumed to be paralogs and were deleted from the data set. If multiple in-paralogs were initially retained, all but the longest sequence were subsequently deleted by PhyloTreePruner. This returned an alignment for each orthogroup that included (at most) a single, putatively orthologous sequence for each taxon. PhyloTreePruner was used to retain only orthogroups found in at least 25% (10 taxa), 50% (20 taxa), 75% (30 taxa) and 100% (40 taxa) of transcriptomes. All loci were subsequently realigned with MAFFT (L-INS-i). FASconCAT v1.0.pl  was then used to concatenate orthogroups. The ProteinModelSelection.pl script (https://github.com/stamatak/standard-RAxML/blob/master/usefulScripts/ProteinModelSelection.pl) was used to find the best-fitting amino-acid substitution model for each orthogroup (for downstream analyses using TreSpEx; see below) and for each concatenated data matrix.
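The occupancy cutoffs quoted above (10, 20, 30 and 40 of 40 taxa) are consistent with rounding the required fraction of taxa up to the next whole number. The following Python sketch illustrates that arithmetic. The function names and the dictionary layout are hypothetical, and the rounding rule is inferred from the reported numbers rather than taken from PhyloTreePruner itself.

```python
def min_taxa(n_taxa, percent):
    """Smallest whole number of taxa covering `percent` of `n_taxa`.

    Integer arithmetic (ceiling division) avoids floating-point rounding.
    """
    return (n_taxa * percent + 99) // 100


def filter_by_occupancy(orthogroups, n_taxa, percent):
    """Keep orthogroups whose taxon count meets the occupancy cutoff.

    `orthogroups` maps an orthogroup ID to the set of taxa it contains
    (a hypothetical layout chosen for illustration).
    """
    cutoff = min_taxa(n_taxa, percent)
    return {og: taxa for og, taxa in orthogroups.items() if len(taxa) >= cutoff}
```

For example, `[min_taxa(40, p) for p in (25, 50, 75, 100)]` gives `[10, 20, 30, 40]`.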
We chose not to use any automated alignment filtering methods (e.g., GBlocks ), due to concerns about their efficacy in improving phylogenetic inference .
Distantly related outgroups may be problematic for phylogenomic inference . We used two approaches to explore the effect of outgroup sampling on estimates of ingroup relationships. First, we deleted Lumbriculus variegatus and Propappus volki (the two most distant outgroups in terms of summed branch length to the base of Crassiclitellata across analyses) and “?Haplotaxidae sp.” (a conspicuously long outgroup branch) from the set of transcriptomes prior to processing with the approach outlined above, leaving a total of 37 taxa. Following the approach outlined above, we used PhyloTreePruner to only retain orthogroups found in at least 25, 50 and 75% of the taxa (in this case, 10, 19 and 28 taxa, respectively). Second, we deleted only “?Haplotaxidae sp.” from the original set of transcriptomes, leaving a total of 39 taxa. For this data set, we processed the transcriptomes as described above, but used PhyloTreePruner to only retain orthogroups found in ≥75% of the taxa (i.e., 30 taxa). To assess the influence of sites with high percentages of gaps/missing data on our inferences, we produced two concatenated “no ?Haplotaxidae sp.” 75% data matrices. For one, we did no additional filtering. For the other, we used TrimAl v1.2  to remove all sites comprising >50% gaps from each individual orthogroup alignment prior to concatenation and model testing. Amounts of missing data per taxon were calculated using TREE-PUZZLE 4.3  for all matrices.
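The site-filtering criterion (removing alignment columns comprising more than 50% gaps) can be illustrated with a short Python sketch. This is not TrimAl code. The function name, the dictionary-based alignment layout and the choice to count both `-` and `?` as gaps are assumptions made for illustration.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5, gap_chars="-?"):
    """Remove columns whose gap fraction exceeds `max_gap_fraction`.

    `alignment` maps taxon name -> aligned sequence; all sequences must
    have the same length. Columns with exactly 50% gaps are kept, since
    only columns with MORE than 50% gaps are removed.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i in range(length)
        if sum(s[i] in gap_chars for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}
```

Because filtering is applied to each orthogroup alignment before concatenation, the gap fraction of a column is computed only over the taxa present in that orthogroup.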
All data matrices, ML tree files, custom scripts and supplementary figures are available via the Dryad Digital Repository (http://datadryad.org/resource/doi:10.5061/dryad.n7n71).
Long-branch effects and compositional heterogeneity
Differences in substitution rates and nucleotide/amino acid composition among lineages constitute two well-known confounding factors in phylogenetic analysis [33,34,35,36]. To assess potential impact of these factors on our inferences, TreSpEx.v1.1  was used to calculate three measures of branch-length heterogeneity—the average patristic distance (PD), the standard deviation of the tip-to-root distance and the LB score (the mean pairwise PD of a taxon to all other taxa in the tree relative to the average pairwise PD over all taxa )—for each locus. Any single-gene alignment that had a value equal to or greater than 1.5 times the interquartile range above the median for any of these three indices was eliminated. Remaining loci were evaluated with BaCoCa v. 1.104r . Data partitions (loci) with a p-value of less than 0.05 for a chi-square test of homogeneity were eliminated, as were all loci that were 1.5 times the interquartile range above the median RCFV value. RCFV measures the absolute deviation from the mean for each amino acid and taxon, in this case summed across taxa for each partition (locus); higher RCFVs indicate a higher degree of compositional heterogeneity in that partition . TreSpEx and BaCoCa filtering was not applied to the 100% data set, which was already quite small in terms of number of loci (Table 2).
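The outlier rule described above (a value at or above the median plus 1.5 times the interquartile range) can be sketched in Python as follows. This is an illustration only, not TreSpEx or BaCoCa code. The function names and the nested-dictionary layout are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since the text does not specify one.

```python
import statistics


def outlier_cutoff(values):
    """Return median + 1.5 * IQR for a list of per-locus index values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values) + 1.5 * (q3 - q1)


def flag_loci(index_values):
    """Return the set of loci at or above the cutoff for ANY index.

    `index_values` maps an index name (e.g. "LB", "PD") to a dict of
    locus -> value for that index.
    """
    flagged = set()
    for per_locus in index_values.values():
        cutoff = outlier_cutoff(list(per_locus.values()))
        flagged.update(locus for locus, v in per_locus.items() if v >= cutoff)
    return flagged
```

A locus is removed if it is flagged for any one index, so the number of loci removed can exceed the number of outliers for each index considered separately.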
Maximum Likelihood (ML) analyses
Partitioned maximum-likelihood (ML) analyses were conducted with RAxML versions 8.1.24 and 8.2.3  on CIPRES  with 1000 rapid bootstrap replicates, using the following options: -f a -x <random number seed for rapid bootstrapping; unique for each analysis> -p <random number seed for initial parsimony inferences; unique for each analysis> -# 1000 -m PROTGAMMA<amino acid model> -s <inputfile> -n <outputfile> (Table 2). Best-fitting amino acid substitution models were inferred for each locus and applied to each locus in RAxML by adding “-q <partitionfile>” to the command listed above. Identical random number seeds for rapid bootstrapping and parsimony inferences were used for the two “no ?Haplotaxidae sp.” 75% matrices (one that was not cleaned with TrimAl and one from which sites with >50% gaps were removed, both filtered with TreSpEx and BaCoCa) to allow a direct comparison of tree topologies for these two matrices.
We used SuperQ v.1.1  to visualize topological conflict among loci for the 25, 50 and 75% unfiltered data sets. SuperQ rescales the partial, unrooted ML gene trees for each data set to produce comparable branch lengths, decomposes the trees into weighted quartet trees and employs the QNet algorithm to produce a split network from the quartet trees. We used the Gurobi optimizer to calculate initial split weights and optimize the weights under the “balanced” objective function. We used SplitsTree v.4.14.4  to visualize the resulting networks.
Bayesian Inference (BI) analyses
Site-heterogeneous Bayesian Inference (BI) analyses of the 25, 50, and 75% data sets and for the two filtered “no ?Haplotaxidae sp.” 75% matrices (one that was not cleaned with TrimAl and one from which sites with >50% gaps were removed) were conducted with PhyloBayes-MPI v1.5a  under the CAT-GTR model with two independent chains and gamma-distributed rates on CIPRES. Analyses were allowed to run for up to 168 h (the CIPRES limit), constant sites were removed, and four categories were used for the discrete gamma distribution. Convergence checks were conducted automatically every 1800 s and analyses were terminated early if after a burn-in of 500 cycles, the minimum effective size exceeded 50, and the “maxdiff” value between chains was less than 0.1. For runs that terminated due to reaching the time limit, convergence of parameter estimates and topologies across chains was assessed by evaluating the basecomp and tracecomp files produced by PhyloBayes and via visual inspection of trace files in Tracer v1.6 .
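The early-termination rule described above combines three conditions, which the following Python sketch states explicitly. It is not PhyloBayes or CIPRES code. The function and parameter names are hypothetical, and treating the burn-in condition as "more than 500 cycles completed" is an assumption.

```python
def should_stop(cycles, min_effsize, maxdiff,
                burnin=500, effsize_threshold=50, maxdiff_threshold=0.1):
    """Return True when all three early-termination conditions hold.

    cycles       -- number of cycles completed by the shorter chain
    min_effsize  -- minimum effective size across monitored parameters
    maxdiff      -- largest bipartition frequency difference between chains
    """
    past_burnin = cycles > burnin
    well_sampled = min_effsize > effsize_threshold
    chains_agree = maxdiff < maxdiff_threshold
    return past_burnin and well_sampled and chains_agree
```

A run that never satisfies all three conditions simply continues until the 168 h limit, which is why such runs were assessed separately.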
Tree topologies recovered in our analyses contradicted previous hypotheses regarding the monophyly of Dichogaster (see below). The Shimodaira-Hasegawa and approximately unbiased tests [46, 47] are often used to evaluate particular topological hypotheses (including at least one hypothesis chosen a posteriori), but these tests are actually designed to evaluate whether all topologies in a plausible set of topologies are equally good explanations of the data, rather than to compare specific alternative topologies . Fortunately, the parametric bootstrapping (SOWH) test [48, 49] and Bayesian topology tests  are both appropriate in this context.
We used SOWHAT  to perform SOWH tests to test Dichogaster monophyly. SOWH tests require two ML analyses—an unconstrained analysis and an analysis in which the topology is constrained to match a particular alternative hypothesis. The difference in likelihoods between the trees resulting from each analysis (δ) constitutes the test statistic for the SOWH test. The ML topology and branch lengths from the constrained analysis are then used to simulate a large number of data sets using the model parameter estimates for the constrained ML topology and original data. We provided SOWHAT with a Dichogaster monophyly constraint (forcing monophyly of the three Dichogaster transcriptomes) in Newick format and a reduced data set in which three distant/long-branch outgroup taxa (Propappus volki, Lumbriculus variegatus and ?Haplotaxidae sp.) were removed, retaining only orthogroups found in at least 28 of the transcriptomes, emulating the 75% data set described above. SOWHAT called Seq-Gen 1.3.2  to simulate 100 data sets and RAxML 8.2.8  to infer topologies for each simulated data set in an unconstrained and constrained ML analysis. SOWHAT calculates confidence intervals around a SOWH test p-value after addition of each replicate to determine if the sample size of the test was adequate.
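The logic of the SOWH test can be summarized in a few lines of Python. This sketch is not SOWHAT code: the function names are hypothetical, and the p-value is computed here in the simplest conventional way, as the fraction of simulated δ values at least as large as the observed δ, which may differ in detail from what SOWHAT reports.

```python
def delta(lnl_unconstrained, lnl_constrained):
    """Test statistic: log-likelihood difference between the two ML searches."""
    return lnl_unconstrained - lnl_constrained


def sowh_p_value(observed_delta, simulated_deltas):
    """Fraction of null (simulated) deltas >= the observed delta."""
    if not simulated_deltas:
        raise ValueError("at least one simulated replicate is required")
    n_extreme = sum(d >= observed_delta for d in simulated_deltas)
    return n_extreme / len(simulated_deltas)
```

With 100 simulated data sets, the smallest nonzero p-value obtainable from this calculation is 0.01, which is one reason to check whether the number of replicates is adequate.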
For Bayesian topology tests, we used the posterior sample of trees generated in the PhyloBayes CAT-GTR analysis of the 75% data set to estimate posterior model odds for alternative topological hypotheses, following suggestions by Bergsten et al. . We calculated posterior model odds by dividing the frequency of trees in the post burn-in sample of trees that support one hypothesis (e.g., Dichogaster is not monophyletic) by the frequency of trees that support the alternative hypothesis (e.g., Dichogaster is monophyletic; all three Dichogaster transcriptomes form a clade).
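The posterior model odds calculation amounts to counting trees in the post burn-in sample. The Python sketch below assumes each sampled tree has already been reduced to the set of clades it contains, with each clade stored as a frozenset of taxon names. That representation, the function names and the placeholder taxon labels are hypothetical.

```python
def supports_monophyly(tree_clades, taxa):
    """True if the tree contains a clade made up of exactly `taxa`."""
    return frozenset(taxa) in tree_clades


def posterior_odds_against(trees, taxa):
    """Odds of non-monophyly versus monophyly of `taxa` in a tree sample.

    Returns float('inf') when no sampled tree supports monophyly.
    """
    n_mono = sum(supports_monophyly(t, taxa) for t in trees)
    n_not = len(trees) - n_mono
    if n_mono == 0:
        return float("inf")
    return n_not / n_mono
```

Because both frequencies share the same denominator (the number of sampled trees), the odds reduce to a ratio of tree counts.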
Divergence time estimation
Unfortunately, the dearth of fossils that can be attributed to earthworms [53, 54] presents a challenge for estimating divergence times, but there are some relevant fossils as well as some previous dating studies on earthworms. Putative earthworm trace fossils (burrows or casts) have been recovered from the Triassic , with possible body fossils in the Paleocene . Possible clitellate body fossils have been recovered from Permian deposits , and fossil leech cocoons are known from the late Triassic . Finally, a molecular study of hormogastrid earthworms (calibrated using the separation of the Corso-Sardinian microplate from continental Europe) suggests that they radiated in the Late Cretaceous ; if this is correct, the common ancestor of all crassiclitellates must have arisen much earlier.
These fossils and inferences give us a set of calibration points that we can use to estimate dates for key divergences within our phylogenies. We performed dating analyses for three data matrices: the unfiltered 75% data set (including ?Haplotaxidae sp.) and two versions of the 75% data matrix that did not include ?Haplotaxidae sp. (one with all sites and the other with sites containing >50% gaps removed, both filtered with TreSpEx and BaCoCa as described above) in PhyloBayes 3.3f . In each case, we used the CAT-GTR PhyloBayes majority rule consensus tree for each data matrix as a fixed topology. We ran four independent chains for each data set, sampling every ten cycles, under the CAT-GTR substitution model with gamma-distributed rates, a lognormal autocorrelated relaxed clock model and a uniform prior on divergence times.
We used three calibration points/ranges in our analyses: the oldest known leech cocoon fossil (201 Mya) , the divergence of Hormogastridae (67–97 Mya)  and a minimum age estimate for crown-group Annelida of 520 Mya (based on the earliest known, probably stem-group, polychaetes from the Sirius Passet deposit of North Greenland; [61,62,63,64]). Though we did not include leeches in our analyses, previous studies have supported a sister-group relationship between leeches and their allies (branchiobdellidans and Acanthobdella) and Lumbriculidae [13, 18], providing a minimum age for divergence of the Lumbriculidae + Hirudinea clade and Crassiclitellata based on the earliest fossil cocoons attributable to leeches. We used 67 Mya as a minimum age and 97 Mya as a maximum age for the deepest divergence within Hormogastridae as represented in our data matrices  (the node subtending Hemigastrodrilus monicae and Vignysa popi/Hormogaster elisae; a recent phylogenomic study of Hormogastridae  corroborates this pattern of relationships). Finally, we argue that a minimum age of crown-group Annelida (520 Mya) is suitable as a maximum age constraint for the root of our phylogeny, because no evidence of clitellates is known prior to the Permian, and the root of our phylogeny is deeply nested within Clitellata, which is itself deeply nested within the annelid crown group.
The calibration for the divergence between Lumbriculidae and Hirudinea (201 Mya) was treated as a hard upper bound, with the lower bound modeled as a truncated Cauchy distribution (p = 0.1 and c = 1). We placed uniform priors of 67–97 Mya and 201–520 Mya on the Hormogastridae divergence and the root node, respectively. Convergence was assessed with estimated sample sizes and visual inspection of parameter traces in Tracer v1.6. To assess whether the priors conditional on our calibrations match our intended prior distributions, we ran PhyloBayes under the prior and our calibrations using the F81 model without rate variation across sites (these model parameters do not factor into the prior over divergence times) and visually inspected the results.
We focused on divergence times for two nodes in our phylogeny that separated Northern and Southern Hemisphere subclades: (1) a node separating Kynotus pittarelli (Madagascar) and a clade comprising Sparganophilus sp. and Komarekiona eatoni (both found in eastern North America), and (2) a node separating a Northern Hemisphere clade comprising Lutodrilus (North America) and Lumbricoidea (Criodrilidae, Hormogastridae, Lumbricidae) (Europe and Asia) and a primarily Southern Hemisphere clade comprising representatives of Almidae, Acanthodrilidae, Eudrilidae, Glossoscolecidae, Megascolecidae, Microchaetidae and Ocnerodrilidae (Africa, Australia, New Zealand and South America). We hypothesized that these divergences may be due to vicariance during the breakup of Pangaea starting in the late Triassic to early Jurassic (~200–185 Mya) [66, 67]; divergence time estimation using molecular data allows a test of this hypothesis.
Ideally, we would also infer dates using a Bayesian method such as BEAST , but preliminary analyses suggested that the computational demands of inferring divergence times for our data in this manner would be prohibitive.