Introduction

Retrotransposons (RT) are a type of transposable element (TE) that moves through the genome via an RNA intermediate in a process that resembles “copy and paste” (Wicker et al. 2007). Retrotransposons can be separated into two major subclasses, long terminal repeat retrotransposons (LTR-RTs) and non-LTR retrotransposons, based on their structure and transposition mechanism (Todorovska 2007). A typical LTR-RT contains two highly similar long terminal repeats (LTRs), a primer-binding site (PBS), a polypurine tract (PPT), and two genes necessary for their retrotransposition, gag and pol (Du et al. 2010). The majority of LTR-RTs can be divided further into Copia and Gypsy superfamilies according to the order of proteinase (PR), integrase (IN), reverse transcriptase (RT), and RNase H (RH) domains in Pol (Domingues et al. 2012). The domains of Gypsy elements are arranged as LTR-GAG-PR-RT-RH-IN-LTR, whereas the Copia elements are organized as LTR-GAG-PR-IN-RT-RH-LTR (Wicker et al. 2007). Gypsy and Copia superfamilies can be further classified into lineages and families with phylogenetic analysis of protein domain sequences that are usually supported by differences in structure (such as size of LTRs and elements) (Wicker et al. 2007). In land plants, four Copia (Retrofit, Tork, Sire, and Oryco) and six Gypsy (Athila, Tat, Renia, CRM, Galadriel, and Del) lineages were reported (Llorens et al. 2011).

LTR-RTs can be used as a molecular marker system because of their high copy number, widespread distribution, and high heterogeneity (Kumar and Hirochika 2001). Several types of LTR-RT molecular markers have been developed, such as inter-retrotransposon amplified polymorphism (IRAP), sequence-specific amplified polymorphism (SSAP), retrotransposon-microsatellite amplified polymorphism (REMAP), insertion site-based polymorphism (ISBP), and retrotransposon-based insertion polymorphism (RBIP) (Flavell et al. 1998; Kalendar et al. 1999; Paux et al. 2010; Waugh et al. 1997). Most notably, IRAP markers amplify the intervening region between two retrotransposons to show polymorphisms (Kalendar et al. 2011; Kalendar et al. 1999). IRAP is used frequently because of the easy development of one outward-facing LTR-derived primer and generation of marker bands without digestion and ligation. SSAP exploits LTR-RT polymorphisms by amplifying the region between a retrotransposon and adjacent restriction site in the genome creating additional polymorphisms that can be used to differentiate closely related accessions (Syed et al. 2005). SSAP is especially useful for clone identification (Bretó et al. 2001; Venturi et al. 2006; Zhao et al. 2010). LTR-RT markers have been used widely for pedigree analysis, population structure, fingerprinting, linkage, and genetic mapping in several plant species (Branco et al. 2007; Farouji et al. 2015; Huo et al. 2009; Jia et al. 2009; Kalendar et al. 2011; Mandoulakani et al. 2015; Queen et al. 2004; Smykal 2006; Sun et al. 2015). The development of next-generation sequencing technologies has allowed for huge numbers of retrotransposon sequences to be generated, providing new opportunities for molecular marker development (Barghini et al. 2014; Cossu et al. 2012; Du et al. 2010; Xu and Du 2013; Zhang et al. 2012).

Many types of molecular markers have been used to characterize the phylogenetic relationships of Citrus accessions and relatives. Previously used markers include random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), simple sequence repeat (SSR), and sequence-related amplified polymorphism (Barkley et al. 2006; Federici et al. 1998; Uzun et al. 2009; Yamamoto et al. 1993). These marker types could not fully reveal the origin and taxonomy of citrus. Few studies have used LTR-RT based markers in citrus (Asins et al. 1999; Bernet and Asins 2003; Biswas et al. 2010a; Biswas et al. 2010b; De Felice et al. 2009; Rico-Cabanas and Martinez-Izquierdo 2007), and only small numbers of LTR-RT based primers were developed in these studies. In this study, we identified and characterized full-length LTR-RTs in the C. x clementina genome for use as molecular markers. The LTR-RT markers were tested for transferability among and differentiation between Rutaceae species. The fluorescence-labeled SSAP system used in this study yields a high-throughput and highly efficient mutant identification system for Citrus. To our knowledge, this is the first detailed report on high-throughput fluorescent SSAP in Citrus.

Material and methods

Genomic sequences and plant materials

The genomic sequences of C. x clementina (v1.0) and C. x sinensis were obtained from Phytozome v10 (http://phytozome.jgi.doe.gov) (Wu et al. 2014) and the orange genome annotation project (Xu et al. 2013), respectively. Twenty-five accessions from the genus Citrus (Table 1) and related genera in the Rutaceae family (Table 2) were used for IRAP analysis. Young leaves were collected at the Florida Citrus Arboretum (Winter Haven, Florida) and stored at −80 °C until used. For SSAP analysis, 27 sweet orange accessions (C. x sinensis Osb.) were used, including ‘Valencia’, ‘Pineapple’, and one irradiated bud mutant of ‘Valencia’ or ‘Pineapple’ (denoted as OR); four bud-derived clones generated from irradiated OR (denoted as ‘B-’ followed by a number that indicates a different bud mutant); and 20 tissue culture-derived somaclones of OR (denoted as ‘OLL-’ followed by a number that indicates a unique somaclone). Each OLL somaclone was an independent regeneration event from an embryogenic callus line of the OR sweet orange.

Table 1 Citrus species and accessions used for IRAP analysis
Table 2 Rutaceae Species (excluding Citrus) used in this study

LTR-RT identification

Considering that the primary objective of this study was to isolate LTR-RTs for molecular marker development, full-length LTR-RTs were sufficient for this study. Solo LTR-RTs and LTR-RT remnants containing only one complete or fragmented LTR were not included, because they were considered to be derived from the recombination of full-length LTR-RTs (Ma and Bennetzen 2004). Full-length LTR-RTs were limited to those that contain a pair of relatively identical LTRs at both ends and at least one typical LTR-RT feature (PPT, PBS, and target site duplications (TSD)). LTR-RTs containing two LTRs and all three features were defined as intact LTR-RTs (Cossu et al. 2012).

LTR_FINDER and RepeatMasker v-4.0.3 were used to identify the full-length LTR-RTs from C. x clementina (Tempel 2012; Xu and Wang 2007). LTR sequence lengths from 100 to 5000 bp and a maximum distance between LTRs of 25,000 bp were the parameters used for LTR_FINDER. RepeatMasker v-4.0.3 was used to find loci with high similarity to TEs in the Repbase Viridiplantae repeat library (Jurka et al. 2005). Crossmatch was used as the search engine with a Smith-Waterman cutoff of 225. A Perl tool, “one code to find them all”, was used to annotate the RepeatMasker output (Bailly-Bechet et al. 2014). Sequences whose alignment with the query was longer than 80 bp, covered more than 80% of their respective length, and showed more than 80% identity were used for downstream analysis. Redundant sequences and those containing N symbols were discarded. The boundaries and structures of all LTR-RTs were manually confirmed. To check the accuracy of the predicted LTR-RTs, 46,339 BAC end sequences were downloaded from NCBI (Terol et al. 2008) and aligned to the identified LTR-RTs with BLASTN using default parameters. Sequences that aligned with the query over more than 99% of their respective length, with more than 99% identity, were considered a match to the identified LTR-RTs.
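
The filtering rule described above can be illustrated with a short Python sketch. It is not the pipeline used in this study: the `Hit` record and its field names are hypothetical, and the strict "greater than" reading of the three thresholds is an assumption taken from the wording of the text.

```python
from collections import namedtuple

# Hypothetical record for one alignment; the field names are illustrative only.
Hit = namedtuple("Hit", ["aligned_bp", "subject_bp", "identity", "sequence"])

def passes_filter(hit, min_bp=80, min_cov=0.80, min_identity=80.0):
    """Return True if the alignment exceeds all three thresholds and has no N."""
    coverage = hit.aligned_bp / hit.subject_bp  # fraction of the sequence aligned
    return (hit.aligned_bp > min_bp
            and coverage > min_cov
            and hit.identity > min_identity
            and "N" not in hit.sequence.upper())

def filter_hits(hits):
    """Keep hits that pass the filter, dropping redundant (identical) sequences."""
    seen = set()
    kept = []
    for hit in hits:
        key = hit.sequence.upper()
        if passes_filter(hit) and key not in seen:
            seen.add(key)
            kept.append(hit)
    return kept
```

The same function expresses the stricter BAC end criterion by changing the arguments, for example `passes_filter(hit, min_bp=0, min_cov=0.99, min_identity=99.0)`.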

LTR-RT classification

The identified LTR-RTs were classified according to the methods previously described by Wicker et al. (2007). All putative LTRs were clustered into families using the USEARCH program with a similarity cutoff of 80% (Edgar 2010). The sequences between two putative LTRs were analyzed with BLASTN and BLASTX against Repbase (Camacho et al. 2009). Coding domains were located using hmmsearch with a sequence E-value threshold (-E) of 10 and a domain E-value threshold (--domE) of 10 (Bateman et al. 2004). The following Pfam profiles were used: PF03732 and PF00098 for GAG; PF07727, PF05380 and PF00078 for RT; PF03078 for envelope protein (ENV); PF02022, PF00665 and PF00552 for IN; PF00077 and PF00026 for PR; PF00075 for RH (Wang and Liu 2008).
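
As an illustration of how domain order separates the two superfamilies (IN before RT in Copia, IN after RT in Gypsy), the following Python sketch maps the Pfam accessions listed above to domain labels and compares the positions of IN and RT. The input format, a list of `(accession, start)` pairs on the sense strand of the element, is an assumption made for this example and is not the output format of hmmsearch.

```python
# Pfam accession -> domain label, as listed in the text above.
PFAM_TO_DOMAIN = {
    "PF03732": "GAG", "PF00098": "GAG",
    "PF07727": "RT", "PF05380": "RT", "PF00078": "RT",
    "PF03078": "ENV",
    "PF02022": "IN", "PF00665": "IN", "PF00552": "IN",
    "PF00077": "PR", "PF00026": "PR",
    "PF00075": "RH",
}

def classify_superfamily(hits):
    """Classify an element from (pfam_accession, start) pairs.

    Returns 'Copia' when IN precedes RT, 'Gypsy' when IN follows RT,
    and 'unknown' when either domain is missing.
    """
    first_start = {}
    for accession, start in hits:
        domain = PFAM_TO_DOMAIN.get(accession)
        if domain is not None:
            # Keep the most upstream occurrence of each domain.
            first_start[domain] = min(start, first_start.get(domain, start))
    if "IN" not in first_start or "RT" not in first_start:
        return "unknown"
    return "Copia" if first_start["IN"] < first_start["RT"] else "Gypsy"
```

For example, an element with hits ordered PR, IN, RT, RH is reported as Copia, whereas PR, RT, RH, IN is reported as Gypsy.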

Phylogenetic analysis

The RT domains of 49 reference elements representing known Viridiplantae LTR-RT lineages were downloaded from the Gypsy Database (GyDB) (Llorens et al. 2011). Representative RT protein sequences of each citrus LTR-RT family (for 350 Copia and 178 Gypsy families) were randomly selected and aligned with the RT domains mentioned above using default parameters in Clustal Omega (Sievers et al. 2011). Two neighbor-joining phylogenetic trees were constructed separately for the Gypsy and Copia superfamilies using MEGA 5.05 with a p distance model and 100 bootstrap iterations (Tamura et al. 2007).
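
For readers unfamiliar with the p-distance, it is simply the proportion of aligned sites at which two sequences differ. The Python sketch below computes it for two rows of an alignment; skipping gapped positions pair by pair is an assumption of this example and may not match the gap treatment chosen in MEGA.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap are skipped
    (pairwise deletion, assumed here for illustration).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared
```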

Estimation of insertion time

The two LTRs of an LTR-RT are considered to be identical at the time of insertion. Therefore, the insertion time of intact LTR-RTs can be estimated from the sequence divergence of the 5′ and 3′ LTRs (Ma and Jackson 2006). The two LTRs were first aligned using Clustal Omega, and the Jukes-Cantor distance (k) was calculated using the PHYLIP program dnadist (Retief 2000). The insertion time (T) of an intact LTR-RT was calculated using the formula T = k/(2r), where r is the substitution rate. A substitution rate of 2.4 × 10⁻⁹ mutations per site per year, 2-fold higher than that determined for the genes in poplar, was used in this study (Ma and Jackson 2006; Tuskan et al. 2006).
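
The calculation can be sketched in Python as follows. This is a minimal illustration, not the PHYLIP implementation: it applies the standard Jukes-Cantor correction, k = −(3/4) ln(1 − 4p/3), to the proportion p of mismatched sites, ignores positions with gaps or ambiguous bases (an assumption of this example), and then applies T = k/(2r) with the rate given above.

```python
import math

def jukes_cantor_distance(ltr5, ltr3):
    """Jukes-Cantor distance between two aligned LTR sequences."""
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a in "ACGT" and b in "ACGT":  # skip gaps and ambiguous bases
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = mismatches / compared
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def insertion_time_years(k, rate=2.4e-9):
    """Insertion time T = k / (2r), with r in substitutions per site per year."""
    return k / (2.0 * rate)
```

With this rate, a distance of k = 0.012 corresponds to an insertion about 2.5 million years ago, and two identical LTRs give T = 0.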

IRAP primers design

To validate the in silico analysis and to test the transferability of Clementine LTR-RT based markers, 17 IRAP primers were designed following the protocols published by Kalendar et al. (1999) and Kalendar and Schulman (2006). Twenty-five LTR-RT families including at least two members were randomly selected, and their homologs in sweet orange were identified using BLASTN (sequence identity > 90% and length > 90% of query LTRs). The LTRs within one family, along with their homologs from sweet orange, were aligned using Clustal Omega. At least one conserved region was identified for 17 LTR-RT families and used for primer design with Primer Premier 5.0 (Singh et al. 1998). Primer sequences are listed in Online Resource 1.
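
The idea of locating a conserved region in aligned LTRs can be shown with a small Python sketch. It is only an illustration of the principle and not the procedure used by Primer Premier; the function name and the default minimum block length of 18 columns are hypothetical choices for this example.

```python
def conserved_blocks(alignment, min_len=18):
    """Find runs of gap-free columns that are identical in every aligned LTR.

    alignment: list of equal-length aligned sequences.
    Returns a list of (start, end, sequence) with 0-based, end-exclusive columns.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows must have the same aligned length")
    blocks = []
    start = None
    for col in range(width + 1):  # one extra step closes a run at the end
        conserved = False
        if col < width:
            column = {row[col].upper() for row in alignment}
            conserved = len(column) == 1 and "-" not in column
        if conserved:
            if start is None:
                start = col
        else:
            if start is not None and col - start >= min_len:
                blocks.append((start, col, alignment[0][start:col].upper()))
            start = None
    return blocks
```

Blocks returned by such a scan would only be candidate regions; primer properties such as melting temperature and orientation still have to be checked in dedicated software.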

DNA extraction and IRAP analysis

Total genomic DNA was extracted from leaves using the CTAB method (Doyle 1991). DNA concentration was measured with a Nanodrop spectrophotometer (Biotek Instruments, Winooski, VT). IRAP analysis was done as described by Kalendar et al. (1999). Amplification reactions were done in a 20 μl solution containing 0.25 mM dNTPs, 0.4 μM primer, 100 ng genomic DNA, 2.0 mM MgCl2, 1 × PCR buffer, and 1 U Taq DNA polymerase (Promega, USA). The amplifications were carried out in a Bio-Rad thermal cycler using the following amplification profile: 1 cycle at 95 °C for 5 min; 30 cycles of 95 °C for 1 min, 55 °C for 1 min, a ramp of +0.5 °C s⁻¹ to 72 °C, and 72 °C for 2 min + 5 s per cycle; and 1 cycle at 72 °C for 8 min. PCR products were separated on a 2% agarose gel at 8 V/cm using 0.5X Tris-borate-EDTA buffer, stained with ethidium bromide, and photographed using a SYNGENE Automated Gel Documentation System (Cambridge, USA). At least two PCR amplifications were conducted for each sample, and only reproducible bands between 200 and 2000 bp were used for downstream analysis.

SSAP analysis

Three SSAP forward primers were designed as mentioned above. Restriction, ligation, and pre-amplification reactions were done as described by Waugh et al. (1997). Selective amplification was conducted with a retrotransposon primer in combination with either Mse I + 3 or EcoR I + 3 (Online Resource 1). A total of 48 primer combinations were used (3 SSAP primers × 2 enzymes × 8 selective bases). SSAP analysis was repeated two times for each primer pair. Following selective amplification, 0.5 μl of PCR amplicons was added to a mixture of 9.25 μl Hi-Di formamide and 0.2 μl GenScan 500 LIZ molecular weight markers (Applied Biosystems, Foster City, CA), denatured for 3 min at 94 °C, and immediately placed on ice for 6 min. The fluorescently labeled samples were separated on an ABI PRISM 3130xl Genetic Analyzer (Applied Biosystems, Foster City, CA) under the following conditions: POP-7™ polymer at 63 °C, a sample injection voltage of 1.6 kV with a 12 s injection time, and a run voltage of 10 kV for 7200 s.

Data analysis

Raw fluorescent SSAP data were analyzed and visualized using Genemarker v.4.0 software (SoftGenetics LLC®, State College, PA). A minimum fluorescence threshold value of 250 was chosen. Peaks between 60 and 500 bp were included in the analysis. The bin table output of peak areas called was transferred to an MS® Excel spreadsheet. The peak areas were converted to 0 and 1 scores indicating peak/marker absence and presence. For IRAP analysis, binary matrices (presence/absence) were prepared from electrophoretic patterns. The simple matching (SM) similarity coefficient was calculated with the SIMQUAL module. Dendrograms were built by cluster analysis using the unweighted pair-group method with arithmetic averages (UPGMA) and the SAHN clustering program. The FIND module was used to identify all trees that resulted from different choices of tied similarity or dissimilarity values. The clustering goodness-of-fit to the data matrix was calculated by the programs COPH and MXCOMP. Figures were generated with the PROJECTIONS module. All analyses were performed with the software NTSYS-pc 2.10e (Rohlf 1992).
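
The scoring and clustering steps described above can be sketched in plain Python. This is an illustrative re-implementation, not the NTSYS-pc code: it converts peak areas to presence/absence scores, computes the simple matching coefficient (shared presences plus shared absences, divided by the number of markers), and performs UPGMA by repeatedly joining the two clusters with the highest average pairwise similarity. Treating any non-zero peak area as "present" and breaking ties by input order are assumptions of this example.

```python
from itertools import combinations

def score_peaks(peak_areas, threshold=0.0):
    """Convert peak areas to 1 (present) or 0 (absent)."""
    return [1 if area > threshold else 0 for area in peak_areas]

def simple_matching(a, b):
    """Simple matching coefficient between two binary profiles."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def upgma(profiles):
    """UPGMA on SM similarity.

    profiles: dict mapping accession name -> binary profile.
    Returns the merges as (cluster_a, cluster_b, similarity), in join order;
    clusters are tuples of accession names.
    """
    def average_similarity(c1, c2):
        # Unweighted average over all member pairs between the two clusters.
        values = [simple_matching(profiles[m], profiles[n]) for m in c1 for n in c2]
        return sum(values) / len(values)

    active = [(name,) for name in profiles]
    merges = []
    while len(active) > 1:
        c1, c2 = max(combinations(active, 2),
                     key=lambda pair: average_similarity(*pair))
        merges.append((c1, c2, average_similarity(c1, c2)))
        active = [c for c in active if c not in (c1, c2)] + [c1 + c2]
    return merges
```

Because ties are resolved arbitrarily here, a data set with tied similarity values can yield more than one valid tree, which is the situation the FIND module is used to detect.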

Results

Identification of full-length LTR-RTs

A total of 3959 full-length LTR-RTs were identified in the Clementine genome (Online Resources 2 and 3). LTR_FINDER and RepeatMasker identified 3791 and 1099 full-length LTR-RTs, respectively, with 931 in common. Intact LTR-RTs containing two LTRs and all three features (PPT, PBS, and TSD) comprised 40.7% (1612) of the total. Most of the LTR-RTs in this study (3836) had at least two of the three features. Most full-length LTR-RTs (3593) were terminated by the highly conserved TG-CA boxes at both the 5′ and 3′ ends of the LTRs. The mean length of full-length LTR-RTs was 8.08 kb, with a standard deviation of 4.50 kb. For LTRs, the mean length was 781.69 bp, and the standard deviation was 572.57 bp. The accuracy of the identified LTR-RTs was confirmed by analysis of 28.6 Mb of BAC end sequences, which corresponds to 8% of the Clementine genome (Terol et al. 2008). Nearly half of the full-length LTR-RTs (1738, 43.9%) matched at least one BAC end sequence. The total length of the matched regions was 3.5 Mb, corresponding to 11% of the identified LTR-RTs.

Classification of full-length LTR-RTs

The 3959 full-length LTR-RTs were first assigned to the Gypsy or Copia superfamily, or left as unknown, according to their protein domain organization. As shown in Table 3, 1285 and 1727 LTR-RTs were included in the Copia and Gypsy superfamilies, respectively. Then, the full-length LTR-RTs were clustered into 1074 families based on LTR sequence similarity, including 386 Copia and 214 Gypsy families. The mean number of full-length LTR-RTs per family was 3.69, and the largest family had 282 LTR-RTs. The average size of the Gypsy families (8.07) was more than twice that of the Copia families (3.33).

Table 3 General features of C. x clementina LTR-RT lineages

To understand the evolutionary relationships among LTR-RT families, two phylogenetic trees were constructed separately for the Copia and Gypsy superfamilies (Figs. 1 and 2). Three hundred fifty Copia families were grouped into four previously defined lineages: Retrofit, Tork, Sire, and Oryco. The Sire and Oryco lineages were first clustered together, then grouped with the Tork and Retrofit lineages. One hundred seventy-eight Gypsy families fell into six lineages: Athila, Tat, Renia, CRM, Galadriel, and Del. The four lineages of Branch 1 (Renia, CRM, Galadriel, and Del), also called chromoviruses, were first clustered together, then grouped with Athila and Tat, the two lineages of Branch 2. The Retrofit and Tork lineages were represented by almost equal numbers of families within the Copia superfamily (Table 3). Athila was the largest lineage, containing 814 LTR-RTs, and was the most highly represented lineage in the Gypsy superfamily (Table 3).

Fig. 1 Neighbor-joining phylogeny of Copia families based on reverse transcriptase. One representative element that contains a complete RT domain was chosen for each of the 350 Copia families. Reference sequences from GyDB are denoted with a plus symbol and shown in red. Bootstrap values below 60% are not shown

Fig. 2 Neighbor-joining phylogeny of Gypsy families based on reverse transcriptase. One representative element that contains a complete RT domain was chosen for each of the 178 Gypsy families. Reference sequences from GyDB are denoted with a plus symbol and shown in red. Bootstrap values below 60% are not shown

The size distributions of LTR-RTs and LTR length varied among lineages (Table 3). The average LTR-RT length of the Sire lineage within the Copia superfamily was greater than that of the other lineages. The average LTR length in the Sire lineage was over three times larger than that in the Oryco and Retrofit lineages. The LTR-RT sizes of Athila, Del, and Tat in the Gypsy superfamily were larger than those of the other three lineages. The Del lineage was found to have the largest LTR length variation.

Distribution of full-length LTR-RTs on citrus chromosomes

Of the 3959 full-length LTR-RTs identified in this study, 3218 (81.3%) were anchored to nine Clementine linkage groups (Fig. 3). These LTR-RTs occupied 28.1 Mb of genome sequence, accounting for 9.74% of the nine currently assembled chromosomes (288.6 Mb) (Online Resource 4). LTR-RTs were distributed throughout the genome and there was little variation in full-length LTR-RT density between the nine chromosomes. LTR-RTs, especially in the Gypsy superfamily, were more abundant in putative centromeric regions.

Fig. 3 Distribution of full-length Gypsy, Copia, and unknown LTR-RTs in the 9 linkage groups of C. x clementina. The putative positions of centromeres are based on the results of Wu et al. (2014) and are shown by black boxes

Putative insertion time of intact LTR-RTs

The insertion time distribution of LTR-RTs fits an exponential decay curve (R² = 0.96, p < 0.01) (Fig. 4a). This pattern was expected, because intact LTR-RTs are rapidly converted to solo LTRs or truncated elements, or are eliminated from the genome entirely. The half-life of intact LTR-RTs in citrus was estimated to be 3.47 Myr. Most citrus LTR-RTs (73.6%) were amplified in the last 10 Myr, and 519 LTR-RTs were inserted within the last 2.5 Myr. Significant “peaks” representing the insertion time of different LTR-RT families showed that these families were active over a short period of time, especially for recently amplified families (Fig. 4b). Almost all LTR-RTs in family 271 were inserted within the last 2.5 Myr. Four of the most recently inserted members of family 271 (12.5%) were found to have two identical LTRs, which suggests that there was little time to accumulate mutations (Online Resource 3).
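
The link between an exponential decay fit and a half-life can be made concrete with a short Python sketch. Under the model N(t) = N0·exp(−λt), the half-life is ln 2/λ, so a half-life of 3.47 Myr corresponds to a decay constant of about 0.2 per Myr. The fitting method below (least squares on log-transformed counts per age bin) is an assumption chosen for illustration; the text does not state how the curve was fitted.

```python
import math

def fit_half_life(times, counts):
    """Fit ln(count) = ln(N0) - lam * t by least squares.

    times:  bin midpoints (e.g. in Myr)
    counts: number of elements per bin; empty bins are skipped
    Returns (lam, half_life) with half_life = ln(2) / lam.
    """
    points = [(t, math.log(c)) for t, c in zip(times, counts) if c > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two non-empty bins")
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    lam = -sxy / sxx  # the slope of the log-linear fit is -lam
    return lam, math.log(2) / lam
```

Applied to counts that follow an exact exponential with λ = 0.2 per Myr, the function returns a half-life of ln 2/0.2, roughly 3.47 Myr.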

Fig. 4 Insertion time of intact LTR-RTs in C. x clementina. a Insertion time of intact LTR-RTs (mya, million years ago). b Insertion time of intact LTR-RTs within four selected families

Polymorphisms of 25 Rutaceae accessions

All IRAP primers achieved successful amplification across the 25 Rutaceae accessions. In total, 209 reproducible and unambiguous bands were produced, ranging from 250 to 2000 bp in size. Most of these bands (205, 98.09%) were polymorphic. Six to 18 fragments were amplified from a single primer (Table 4). Representative patterns of four LTR primers are shown in Online Resource 5. The genetic similarity coefficients between the 25 accessions were calculated. The average similarity coefficients for the 25 accessions ranged from 0.493 to 0.997 with a mean of 0.650.

Table 4 Summary of PCR amplification corresponding to individual primer in IRAP and SSAP analysis

Using 0.659 as a threshold, the phylogenetic tree could be split into two major clusters (Fig. 5). Cluster 1 included all accessions in Citrus except C. hystrix, indicating that C. hystrix is genetically more distant from the other Citrus species analyzed. All the other genera were included in Cluster 2. The two clusters could be further divided into four and two groups, respectively. Within Cluster 1, accessions from the same species were grouped together. C. ichangensis and C. maxima each formed a single-species cluster: Cluster 1–1 and Cluster 1–2, respectively. C. reticulata, C. sinensis, C. paradisi, and C. aurantium were grouped in Cluster 1–3. C. medica, C. limon, and C. aurantifolia shared Cluster 1–4. In Cluster 2, four of the six Citrus-related genera examined (Eremocitrus, Microcitrus, Afraegle, and Zanthoxylum) were clustered together and formed an independent group (Cluster 2–2), which is in accordance with their remarkable phenotypic differences from Citrus. The other two genera, Poncirus and Fortunella, were grouped into Cluster 2–1. Cluster 2–1 displayed a much closer relationship with the Citrus genus than Cluster 2–2 did. Therefore, if 0.587 were used as the cut-off value for defining the clusters, Cluster 2–1 and Cluster 1 would be classified in the same group, clearly separated from Cluster 2–2. The correlation between the similarity coefficient matrix and the cophenetic matrix derived from the UPGMA tree was 0.94, corresponding to a good fit.

Fig. 5 Dendrogram of 25 accessions by the UPGMA cluster analysis based on IRAP analysis

Polymorphisms of 27 sweet orange accessions

For SSAP analysis, 24 out of 48 primer combinations generated easily readable patterns and were selected for the downstream analyses (Table 4). A single primer combination produced 17 to 198 bands. A total of 2156 amplification products were generated, of which 1518 (70%) were polymorphic. The primer combinations exhibited different levels of polymorphism, ranging from 23.86 to 98%. Genetic similarity coefficients among the 27 accessions were calculated using the SSAP analysis data. The distributions of genetic similarity coefficients between ‘Pineapple’, ‘Valencia’, and the 24 clones (four ‘B-’ accessions and 20 ‘OLL-’ accessions) are compared in Fig. 6. The 24 clones had a closer relationship to ‘Valencia’ than to ‘Pineapple’ (p < 0.01, Student’s t test). The average genetic similarity coefficient among the 24 clones was 0.796, implying substantial genetic differentiation within these clones.

Fig. 6 The distribution of genetic similarity coefficients between ‘Valencia’ or ‘Pineapple’ and 24 clones based on SSAP analysis

Discussion

Different lineage classification systems were used for LTR-RTs in different studies. We chose to use the classification system from GyDB, a research project focused on the phylogenetic classification of transposable elements (Llorens et al. 2011). Four and six lineages were reported in the Copia and Gypsy superfamilies of land plants, respectively. All these lineages were found in the C. x clementina genome, but the full-length LTR-RT numbers within each lineage varied greatly. The two smallest lineages, Del and Oryco, contained only 41 and 57 full-length LTR-RTs, respectively, whereas the largest lineage, Athila, included 814 full-length LTR-RTs and accounted for 47.13% of the Gypsy elements identified. Different species have distinct LTR-RT compositions. A pattern similar to that of citrus was found in Arabidopsis, where 59.3% of Gypsy elements belonged to the Athila lineage (Du et al. 2010; Marco and Marin 2008). However, only 0.1% of the Gypsy elements identified in rice were classified into the Athila lineage. The largest lineage in rice was Tat, accounting for 55.8% of its Gypsy elements (Du et al. 2010). The ratio of Gypsy to Copia elements in Clementine is 1.34:1, which is much lower than in rice (4.9:1) (Tian et al. 2009) and sorghum (3.7:1) (Paterson et al. 2009), but similar to that in soybean (1.4:1) (Du et al. 2010) and maize (1.6:1) (Baucom et al. 2009).

Previous studies show that LTR-RT based primers can be used for closely related genera (Kalendar et al. 2011; Kalendar et al. 1999). In this study, all designed IRAP primers were transferable across the 25 Rutaceae accessions. The transferability of pear IRAP to other Rosaceae species ranged from 87.5 to 100% (Sun et al. 2015). The transferability of IRAP markers is usually higher than that of other markers. For SSR markers, approximately half of the primers developed in sweet orange can be used in pummelo or lemon (Biswas et al. 2014). Similar results were reported in the Poaceae and Rosaceae species (Mamaghani et al. 2015; Sun et al. 2015). One reason for the high transferability of IRAP markers in this study may be that the primers were designed in conserved regions within LTR-RT families, not only in conserved regions of orthologous LTR-RTs. We selected LTR-RT families of different sizes for primer design, in contrast to an IRAP study in pear, where the largest LTR-RT families were selected for primer design (Sun et al. 2015). The largest full-length LTR-RT families are not representative because they do not include homologous or clustered LTR-RTs in the genome.

According to the classification of Swingle and Reece (Swingle and Webber 1943), the “true citrus fruit trees” group (Citrinae) is divided into six genera: Citrus, Fortunella, Poncirus, Clymenia, Eremocitrus, and Microcitrus. Citrus is the most economically important, and is sexually compatible with all other genera. The UPGMA cluster analysis showed that Fortunella and Poncirus were closer to Citrus than were Eremocitrus and Microcitrus. This result is consistent with the findings of Pang et al. (2007) based on AFLP markers, but differs from the results obtained by Garcia-Lor et al. (2013) and Wu et al. (2018). The Citrus genus was divided into two subgenera, Citrus and Papeda, according to the classification system of Swingle and Reece (Swingle and Webber 1943). In the UPGMA phylogenetic tree, C. hystrix formed one cluster separated from the other Citrus species, which suggests that some Papeda species may be the most primitive Citrus (Nicolosi et al. 2000). C. ichangensis was first clustered with pummelo and mandarin, and then with citron, which supports the hypothesis that C. ichangensis could be an ancestor of mandarin and agrees with the results of Xie et al. (2008) and Pang et al. (2007). According to our results, the citron is the most distantly related species among them, which is confirmed by the sequencing of their chloroplast genomes (Carbonell-Caballero et al. 2015).

The SSAP markers showed that the 24 sweet orange clones had patterns that were closer to ‘Valencia’ than to ‘Pineapple’, and this was supported by phenotypic traits (data not shown). The SSAP markers also showed that the 24 clones had abundant genetic variation, which would be helpful for future variety registration and protection of intellectual property. Many widely grown citrus cultivars, such as sweet orange, grapefruit, lemon, and various clonal selections of Satsuma and Clementine mandarins, originated as either bud sport or apomictic seedling mutations (Rao et al. 2009). The use of traditional markers to distinguish such mutant clones is difficult because of the lower genetic variability associated with these marker systems. The LTR-RT based SSAP markers were especially useful for distinguishing accessions with similar genetic backgrounds, like bud sport mutations in citrus (Venturi et al. 2006; Zhao et al. 2010). In our previous study, 18 SSR primer sets were used to distinguish these 27 accessions. However, all accessions showed similar PCR amplification patterns and were indistinguishable (data not shown). In C. x clementina, SSAP was successfully used to distinguish 24 accessions generated from bud mutations; other markers (ISSR, RAPD, AFLP, and SSR) tested could not (Bretó et al. 2001). Compared to other routinely used molecular markers such as SSR, LTR-RT based SSAP markers are better suited for genetic relationship analysis and phylogenetic analysis (Biswas et al. 2011; Schulman et al. 2012). An advantage of SSAP is that polymorphisms at multiple loci are detected in a single assay, while SSR usually detects polymorphisms at one locus (Powell et al. 1996). Retrotransposons are important sources of variation in Citrus, especially in the species mentioned above (Bretó et al. 2001; Wang et al. 2017). Several Sicilian blood oranges arose by insertion of a Copia-like retrotransposon (Tcs-1) adjacent to a MYB transcriptional activator (named Ruby) for anthocyanin production. The LTR of Tcs-1 provides a novel promoter that activates Ruby expression in the flesh of the fruit, resulting in distinctive red coloration in response to cold conditions. A Chinese blood orange (‘Jingxian’) was also found to contain an independent insertion of a similar retrotransposon that confers tissue-specific red coloration, also in response to cold conditions (Butelli et al. 2012).

Conclusion

Full-length LTR-RTs were mined from the Clementine genome and classified based on structural details. Randomly selected IRAP and SSAP markers were tested and showed that they could differentiate citrus accessions and mutant clones. Our findings indicate that LTR-RTs are an excellent molecular marker resource because they are easy to develop, polymorphic, widely distributed, and transferable within the Rutaceae family.