Phylogenetic reconstruction and calibration
To infer phylogeny for Eleotridae and Apogonidae, I assembled previously published DNA sequence data and used Bayesian methods for both phylogenetic analysis and time-calibration with fossil-based legacy dates. For Eleotridae I used the matrix of [9], including Rhyacichthyidae, Odontobutidae, Milyeringidae, Butidae, and Eleotridae. The dataset includes sequences from the mitochondrial genes cytb, COI, ND1, and ND2, totaling 4397 base pairs for 78 taxa. That phylogeny [9] inferred three clades of New World taxa within Eleotridae, of which two are sisters, consistent with two separate invasions of New World waters. Resolution among the major clades within the family was weakly supported. For this study I have included a wide range of gobiiform taxa, both to provide phylogenetic context for the New World taxa and to permit assignment of fossil calibrations to the tree. I also provide a Bayesian analysis of the [9] dataset, which was originally analyzed with parsimony. Eleotrid genera containing or entirely consisting of New World species are Dormitator, Eleotris, Gobiomorus, Guavina, Hemieleotris, Leptophilypnus, and Microphilypnus. I use the genus name Eleotris for E. armiger and E. smaragdus, species often placed in the genus Erotelis, in line with the phylogenetic evidence indicating that Erotelis is nested within Eleotris [9, 12, 13]. Of the New World eleotrid genera, all except Microphilypnus (known from Venezuela and Brazil) have representatives on both sides of the Isthmus of Panama. For this study, all genera are present in the phylogeny, but representatives of transisthmian geminate pairs were only available for Dormitator, Eleotris, Gobiomorus, and Leptophilypnus. Sequences used for phylogenetic analysis of Eleotridae are given in Additional file 1: Table S1.
For Apogonidae, I analyzed a subset of the matrix of [10], including both mitochondrial (COI) and nuclear (ENC1, RAG1) genes, for a total of 3600 aligned base pairs for 26 taxa. Previous phylogenies of Apogonidae have varied in the depth of their sampling, but all agree in grouping New World and Mediterranean Apogon with Astrapogon, Paroncheilus, Phaeoptyx, and Zapogon, as part of a fairly deep split in the apogonid tree [10, 11, 14]. For this analysis, I included all the Apogon, Astrapogon, Paroncheilus, Phaeoptyx, and Zapogon sequenced by [10], as well as Gymnapogon vanderbilti and Pseudamia gelatinosa, used as outgroups and to provide deep enough nodes for calibrating the hypothesis. Sequences used for phylogenetic analysis of Apogonidae are listed in Additional file 1: Table S2.
I assembled the matrices using Geneious version 6.1.8 and Mesquite version 3.0.4, and aligned the protein-coding genes based on the translated amino acid sequences. I then inferred the Bayesian phylogeny with MrBayes version 2.0.9, as implemented in Geneious. For that analysis, I specified models independently for each gene partition. For Eleotridae, I applied an HKY + G + I substitution model to the COI partition and a GTR + G + I model, independently, to each of the cytb, ND1, and ND2 partitions. Similarly, for Apogonidae I used the GTR + G + I model for each of the COI and RAG1 partitions and SYM + G + I for the ENC1 partition (as chosen by the R package phangorn [15]). I ran the analysis for 10.0 × 10⁷ generations, with four simultaneous chains, sampling every 1000 generations and discarding the first 10% of trees as burn-in. I constructed a 50% majority-rule consensus phylogeny of the remaining trees, then calibrated that phylogeny with BEAST 1.7.5 [16], run with an uncorrelated lognormal relaxed clock model and a birth-death speciation prior.
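The sampling scheme above implies a fixed number of retained trees, which can be checked with a little arithmetic. The following is a minimal sketch only: the function name `retained_samples` is hypothetical, and it assumes one sample per sampling interval, ignoring the starting state that some programs also record at generation 0.

```python
def retained_samples(generations: int, sample_every: int, burnin_fraction: float):
    """Return (total_samples, retained_after_burnin) for one MCMC run.

    Assumes one sample every `sample_every` generations and ignores any
    extra sample written for the starting state (an assumption of this sketch).
    """
    total = generations // sample_every
    discarded = int(round(total * burnin_fraction))
    return total, total - discarded


# Values from the text: 10.0 x 10^7 generations, sampled every 1000, 10% burn-in.
total, kept = retained_samples(generations=100_000_000, sample_every=1000,
                               burnin_fraction=0.10)
print(total, kept)
```

Under these assumptions the run yields 100,000 sampled trees, of which 90,000 remain after burn-in.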
To assign calibrations to the hypotheses, I first used dates inferred from fossil calibrations in previous, more deeply sampled phylogenies. The use of such secondary calibration points has been questioned, particularly when they are assigned without error [17, 18], and one study showed that assignment of secondary calibrations results in small (1–2 Ma) but significant differences among subsamples of simulated phylogenies [19]. For these data, another possible method of calibration is to assign the node subtending the least divergent transisthmian lineages a biogeographic calibration, based on the date of the closure of the Isthmus of Panama. The use of biogeographic calibrations has also been criticized [20, 21, 22], in particular the assumptions that the date of a biogeographic event is reliably known and that no extinction has occurred since then that could result in the calibration being applied erroneously. Both calibration methods have their drawbacks, but because it is instructive to compare them, I performed a second calibration analysis for each family, with the shallowest (or only) transisthmian lineage in each group assigned a minimum divergence of 3.1 Ma.
For the legacy calibrated analyses, I applied calibrations for three nodes in Eleotridae: Butidae (47 Ma), Eleotridae (46 Ma), and the clade containing Odontobutidae, Milyeringidae, Butidae, and Eleotridae (65 Ma). These calibrations were derived from the larger-scale analysis of Gobiiformes in [23], and I applied them as normal priors with standard deviations of 10 Ma, more than encompassing the 95% highest posterior densities of the calibration estimates in the original analysis. For Apogonidae, I dated the root of the family at 51 Ma, based on the calibrated gobiiform phylogeny of [23], also applied as a normal prior with a conservative 10 Ma standard deviation. The Bayesian search ran for 10.0 × 10⁸ generations (Eleotridae) or 10.0 × 10⁹ generations (Apogonidae), with trees sampled every 1000 or 10,000 generations, respectively, resulting in a pool of 100,000 trees. At the end of the analyses, estimated effective sample sizes (ESS) exceeded 200 for all parameters for Eleotridae, and for all parameters except the overall posterior and prior for Apogonidae (ESS of 171 and 169, respectively); the lower values are likely due to weakly supported resolution along part of the phylogenetic backbone. I confirmed with Tracer 1.5 that 10% was the appropriate burn-in fraction, then constructed a maximum clade credibility consensus of the post burn-in trees using TreeAnnotator 1.7.5 [16], and visualized this tree using FigTree 1.3.1 [24].
For the isthmus calibrated analyses, I assigned a lognormal prior to the youngest transisthmian split in Eleotridae, between Dormitator species, and to the origin of the Eastern Pacific clade in Apogonidae. For each calibration, I specified an offset of 3.1 (representing the hard minimum of 3.1 Ma for transisthmian divergences) and a standard deviation of 1.0. I ran and sampled these analyses as for the legacy calibrated hypotheses, and in both cases ESS values well exceeded 200. I also constructed and visualized the consensus hypothesis as described above for the legacy calibrated hypotheses.
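To give a sense of how wide the legacy normal priors are, the sketch below computes the central 95% interval of each one from the node ages and the 10 Ma standard deviation given in the text. It is an illustration only and not part of the BEAST configuration; the dictionary labels and the function name `central_interval` are chosen here for convenience.

```python
from statistics import NormalDist

# Node ages (Ma) and the 10 Ma standard deviation come from the text;
# the dictionary keys are illustrative labels, not BEAST taxon-set names.
LEGACY_PRIORS = {
    "Butidae": 47.0,
    "Eleotridae": 46.0,
    "Odontobutidae + Milyeringidae + Butidae + Eleotridae": 65.0,
    "Apogonidae root": 51.0,
}
PRIOR_SD = 10.0


def central_interval(mean, sd, mass=0.95):
    """Return the (lower, upper) bounds holding `mass` of a normal prior."""
    dist = NormalDist(mu=mean, sigma=sd)
    tail = (1.0 - mass) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


for node, mean in LEGACY_PRIORS.items():
    low, high = central_interval(mean, PRIOR_SD)
    print(f"{node}: {low:.1f} to {high:.1f} Ma")
```

Each interval spans roughly 19.6 Ma on either side of the prior mean, which shows how permissive a 10 Ma standard deviation is relative to the node ages themselves.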
Morphometric data and analyses
To quantify phenotypic variation between geminate species, I examined preserved specimens of Eleotridae in each of the six species comprising the three geminate pairs in Eleotris, Dormitator, and Gobiomorus. Leptophilypnus, the other geminate pair present in the phylogenetic analysis, was not used for shape comparisons because adults of Leptophilypnus are too small (maximum 64 mm SL, most much smaller [25]) to be reliably photographed and landmarked. For Apogonidae, I examined representatives of the entire New World clade, including Caribbean species, the Mediterranean singleton Apogon imberbis, and the clade of Eastern Pacific taxa. Species and specimens examined are listed in Additional file 1: Table S3; individuals were initially fixed in formalin and preserved in 70% ethanol. I examined 75 individuals of Eleotridae (range of six to 16 individuals for each of the six species) and 157 individuals of Apogonidae (range of two to 11 individuals for each of 25 species, including several not present in the molecular phylogeny). In every case, I selected unbent, intact adults. I photographed each specimen using a Panasonic Lumix DMC-ZS3 digital camera mounted on a copystand. I then digitized a suite of 17 landmarks from the left side of each individual (Fig. 1), using ImageJ version 1.49t [26]. These landmarks primarily describe the shape of the body, including the placements of the median fins, but also include the dimensions of the mouth and eyes, the anterior extent of the opercular opening, and the positions of the pelvic and pectoral fins. All landmarks were assigned in all specimens.
To determine whether Caribbean and Eastern Pacific taxa among Apogonidae and Eleotridae are separated in shape space, I used MorphoJ version 1.05d [27]. I imported landmark coordinates into MorphoJ, where I performed a Procrustes fit, generated a covariance matrix, and used that matrix as data for principal components analysis (PCA) and canonical variates analysis (CVA). For the CVA, I grouped the individuals by geographic range. I also used MorphoJ to generate plots of the PCA for both sleepers and cardinalfishes. To evaluate whether the shape differences among transisthmian taxa are significant, I performed Procrustes PCA and MANOVA on the morphometric landmark data with range as a factor, using the R (version 3.2.1) package geomorph (version 2.1.5 [28]).
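The Procrustes fit mentioned above removes differences in position, size, and orientation so that only shape remains. The sketch below illustrates that idea for two-dimensional landmarks stored as complex numbers; it is a simplified stand-in and does not reproduce MorphoJ's or geomorph's implementation (for example, it omits tangent-space projection). All function names and the four-landmark example are hypothetical.

```python
import cmath
import math


def center_and_scale(config):
    """Translate landmarks to their centroid and scale to unit centroid size."""
    centroid = sum(config) / len(config)
    centered = [p - centroid for p in config]
    size = math.sqrt(sum(abs(p) ** 2 for p in centered))
    return [p / size for p in centered]


def rotate_onto(reference, config):
    """Rotate `config` to minimise summed squared distance to `reference`."""
    s = sum(r * c.conjugate() for r, c in zip(reference, config))
    rotation = s / abs(s)  # unit complex number encoding the optimal angle
    return [rotation * c for c in config]


def procrustes_fit(configs, iterations=10):
    """Iteratively align all configurations to their running mean shape."""
    aligned = [center_and_scale(c) for c in configs]
    reference = aligned[0]
    for _ in range(iterations):
        aligned = [rotate_onto(reference, c) for c in aligned]
        mean = [sum(points) / len(aligned) for points in zip(*aligned)]
        reference = center_and_scale(mean)
    return aligned


def procrustes_distance(a, b):
    """Square root of the summed squared landmark differences."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


# Hypothetical specimen and a rotated, enlarged, shifted copy of it.
base = [0 + 0j, 4 + 0j, 4 + 1j, 1 + 2j]
copy = [2.5 * cmath.exp(1j * math.radians(40)) * p + (3 - 1j) for p in base]
aligned = procrustes_fit([base, copy])
print(procrustes_distance(aligned[0], aligned[1]))
```

Because the second configuration differs from the first only in position, size, and rotation, the two coincide after the fit and their distance is zero up to floating-point error. Real specimens retain nonzero distances, and those residual coordinates are what feed the covariance matrix for PCA and CVA.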
Phylogenetic comparative analyses
The transisthmian divergences among Eleotridae are all phylogenetically independent. In contrast, among New World Apogonidae the Eastern Pacific radiation is nested inside a Caribbean clade, along with a single Mediterranean species. With this pattern, comparisons among species inhabiting different ranges are not independent, and so I used phylogenetic PCA and MANOVA to correct for shared evolutionary history. For these analyses, I used a reduced subset of the cardinalfish morphometric data that only included species present in the phylogenetic tree (17 species overlapped between the morphometric and phylogenetic analyses). Using that phylogeny, I first tested for phylogenetic signal in the cardinalfish morphometric data with geomorph. I then performed a phylogenetic MANOVA two ways. First, using the uncorrected PC scores for the reduced taxon set, I executed a phylogenetic MANOVA of shape change by range with the geiger package (version 2.0.6 [29]), and assessed significance of range as a factor using the Wilks' lambda test. Then, I generated a phylogenetic PCA of the shape coordinates for the reduced taxon set with the phytools package (version 0.5-20 [30]). Finally, I performed a phylogenetic MANOVA of the phylogenetically corrected PC scores, again using geiger.
To gauge whether the rate or mode of phenotypic change in Apogonidae shifted across the phylogeny when lineages were sundered by closure of the Isthmus of Panama, I used the R package mvMORPH [31]. I reconstructed the evolution of geographic range (Caribbean, Eastern Pacific, or Mediterranean) on the calibrated phylogeny, as well as scores for the first three PC axes, and then fitted multivariate models of trait evolution to the reconstructions. I first calculated the fit of single and multiple rate Brownian Motion (BM; trait variance increases over time, without constraint) and Ornstein-Uhlenbeck (OU; trait variance is constrained about a mean, emulating selection) models across the phylogeny. Then, I imposed models incorporating mode shifts at the closure of the Isthmus of Panama (separation of the Eastern Pacific clade from the Caribbean radiation). I applied six mode shift models, the first four incorporating shifts between BM and OU modes: ecological release (shift from OU -> BM process with same rate); release and radiate (shift from OU -> BM process with variable rates); ecological constraint (shift from BM -> OU process with same rate); and radiation and ecological constraint (shift from BM -> OU process with variable rates [32]). These models emulate scenarios in which character evolution becomes more (ecological constraint) or less (ecological release) constrained following the splitting of populations by the closure of the Isthmus of Panama and associated environmental changes.
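The contrast between BM and OU, and the effect of a shift between them, can be illustrated with the expected trait variance along a single lineage. The sketch below uses the standard univariate results (BM variance grows as σ²t; OU variance approaches σ²/(2α)) and is not mvMORPH code, which fits multivariate versions of these models on the whole tree. The function names and all parameter values are hypothetical.

```python
import math


def bm_variance(sigma2, t):
    """Expected variance after time t under Brownian motion."""
    return sigma2 * t


def ou_variance(sigma2, alpha, t, start_variance=0.0):
    """Expected variance after time t under an Ornstein-Uhlenbeck process."""
    decay = math.exp(-2.0 * alpha * t)
    return start_variance * decay + sigma2 / (2.0 * alpha) * (1.0 - decay)


def release_variance(sigma2, alpha, t_before, t_after):
    """OU before the shift, then BM after it (same rate sigma2)."""
    return ou_variance(sigma2, alpha, t_before) + bm_variance(sigma2, t_after)


def constraint_variance(sigma2, alpha, t_before, t_after):
    """BM before the shift, then OU after it (same rate sigma2)."""
    return ou_variance(sigma2, alpha, t_after,
                       start_variance=bm_variance(sigma2, t_before))


# Hypothetical parameters: rate 1.0, constraint strength 0.5,
# a long pre-shift history and 3.1 time units after the shift.
print(release_variance(1.0, 0.5, 100.0, 3.1))
print(constraint_variance(1.0, 0.5, 100.0, 3.1))
```

With these illustrative values, variance under release climbs linearly again after the shift, whereas under constraint the variance accumulated before the shift decays toward the OU stationary level.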
I also tested two models involving mode switches between BM and EB (early burst, a modification of the accelerating/decelerating rates [ACDC] model of character change in which the rate of evolution slows exponentially, emulating an adaptive radiation [33, 34]). These models are denoted Brownian motion/early burst (BMEB) and early burst/Brownian motion (EBBM) in [31], and in this context would describe adaptive radiations taking place before or after the populations were split by the Isthmus of Panama. Finally, I evaluated AIC and AICc scores for each of the ten evolutionary models to assess the fits, and compared support among models using Akaike weights computed from the AICc scores [35]. I performed each of these analyses on both the legacy calibrated and isthmus calibrated phylogenies, to compare the effects of different calibration regimes.
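The model comparison step rests on two standard formulas: the small-sample correction AICc = AIC + 2k(k + 1)/(n − k − 1), and Akaike weights proportional to exp(−Δ/2), where Δ is each model's AICc minus the best AICc. A minimal sketch follows; the function names and the example scores are hypothetical and are not results from this study, and how the sample size n is counted for multivariate data is left as an assumption of the caller.

```python
import math


def aicc(log_likelihood, n_params, n_obs):
    """Small-sample corrected AIC from a log-likelihood and parameter count."""
    aic = 2.0 * n_params - 2.0 * log_likelihood
    return aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def akaike_weights(scores):
    """Convert a {model: AICc} mapping into {model: Akaike weight}."""
    best = min(scores.values())
    relative = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(relative.values())
    return {m: v / total for m, v in relative.items()}


# Hypothetical AICc scores for three of the candidate models.
example_scores = {"BM1": 100.0, "OU1": 102.0, "BMEB": 110.0}
weights = akaike_weights(example_scores)
for model, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {weight:.3f}")
```

The weights sum to one, so each can be read as the relative support for a model within the candidate set; a difference of two AICc units corresponds to a weight ratio of e, about 2.7.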