Analysis of transcripts and splice isoforms in Medicago sativa L. by single-molecule long-read sequencing
The full-length transcriptome of alfalfa was analyzed with PacBio single-molecule long-read sequencing technology. The transcriptome data provided full-length sequences and gene isoforms of transcripts in alfalfa, which will improve genome annotation and enhance our understanding of the gene structure of alfalfa.
As an important forage crop, alfalfa (Medicago sativa L.) is planted worldwide. Because of the complexity of its genome and the lack of a finished whole-genome sequence, the sequences and complete structures of mRNA transcripts in alfalfa remain unclear. In this study, single-molecule long-read sequencing on the Pacific Biosciences platform was applied to investigate the alfalfa transcriptome, and a total of 113,321 transcripts were obtained from young, mature and senescent leaves. We identified 72,606 open reading frames (ORFs), including 46,616 full-length ORFs; 1670 transcription factors (TFs) from 54 TF families; and 44,040 simple sequence repeats from 30,797 sequences. A total of 7568 alternative splicing events were identified, the majority of which were intron retention events. In addition, we identified 17,740 long non-coding RNAs. Our results show the feasibility of deep sequencing of full-length RNA from the alfalfa transcriptome at the single-molecule level.
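To illustrate what counting simple sequence repeats in a set of transcripts involves, the following is a minimal sketch that scans a nucleotide sequence for perfect tandem repeats with a regular expression. It is not the method used in this study: the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen only for illustration.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumed, not from the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-base motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA" or "ATAT").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    example = "GG" + "AT" * 7 + "CC" + "GAA" * 5 + "T"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

Start positions are 0-based. Dedicated SSR tools also handle mononucleotide and compound repeats, which this sketch deliberately leaves out.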
Keywords: Medicago sativa L.; Transcripts and splice isoforms; Single-molecule long-read sequencing; lncRNA
ORF: Open reading frame
lncRNA: Long non-coding RNA
SSR: Simple sequence repeat
NGS: Next-generation high-throughput sequencing technology
SMRT: Single-molecule long-read sequencing technology
This work was supported by the National Natural Science Foundation of China (Grant Nos. 31601989 and 31672477). We thank Jingjing Sui, Huaigen Xin and Dandan Chen from Biomarker Corporation (Beijing, China) for providing the facilities and expertise of the PacBio platform for library construction and sequencing.
YC and LH conceived and designed the research. YC, JY and TG conducted experiments. ZM and LX analyzed data. YC and LH wrote the manuscript. All authors read and approved the manuscript.
Compliance with ethical standards
Conflict of interest
The authors declare that there is no conflict of interest.