An Easy-to-Follow Pipeline for Long Noncoding RNA Identification: A Case Study in Diploid Strawberry Fragaria vesca
Long noncoding RNAs (lncRNAs), defined as transcripts longer than 200 nucleotides without coding potential, are a new class of regulatory molecules with roles in diverse biological processes. New lncRNAs can readily be identified by mining RNA-seq data from a wide range of plant species. However, it remains challenging to distinguish functional lncRNAs from mRNAs that code for small peptides and from nonfunctional products of pseudogenes. This chapter provides stepwise instructions, using RNA-seq datasets from developing wild strawberry fruit to illustrate each step. The workflow is divided into three parts. Part I covers standard RNA-seq data processing and analysis; part II describes lncRNA identification; part III describes several approaches aimed at shedding light on lncRNA function. The description is intended for beginners, with easy-to-follow steps. Text boxes provide code and explanations. While it is relatively easy to identify lncRNAs, it is difficult to infer their function in the absence of coding information. Multiple RNA-seq libraries across tissues and stages are useful resources for deducing the possible function of lncRNAs based on their expression and co-regulation.
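The two ideas named above, the length cutoff in the lncRNA definition and the use of co-regulation across libraries, can be sketched in a few lines of Python. This is a minimal illustration and not the chapter's actual code: the function names, the toy expression values, and the correlation threshold of 0.9 are all assumptions made for the example, and the coding-potential test that a real pipeline needs is not shown.

```python
from math import sqrt

# Length cutoff taken from the definition in the text (longer than 200 nt).
MIN_LNCRNA_LENGTH = 200


def is_long_enough(sequence):
    """Return True if a transcript passes the length part of the lncRNA definition."""
    return len(sequence) > MIN_LNCRNA_LENGTH


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-regulation signal
    return cov / sqrt(var_x * var_y)


def coexpressed_partners(lnc_profile, mrna_profiles, threshold=0.9):
    """List mRNAs whose profile correlates (positively or negatively) with the lncRNA.

    The threshold is a hypothetical choice for this sketch, not a value from the chapter.
    """
    return sorted(
        gene
        for gene, profile in mrna_profiles.items()
        if abs(pearson(lnc_profile, profile)) >= threshold
    )


if __name__ == "__main__":
    # Toy expression values across three hypothetical libraries.
    lnc = [1.0, 2.0, 3.0]
    mrnas = {"geneA": [2.0, 4.0, 6.0], "geneB": [1.0, 5.0, 2.0]}
    print(coexpressed_partners(lnc, mrnas))
```

In practice the expression profiles would come from normalized values across many tissues and stages, and candidate transcripts would first be screened for coding potential before any correlation analysis.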
Key words: lncRNA, RNA-seq, Strawberry, Identification, Correlation analysis
This work was supported by the National Natural Science Foundation of China (31572098 and 31772274) to C.K., US National Science Foundation Grant (IOS1444987) to Z.L., and the Scientific and Technological Self-innovation Foundation of Huazhong Agricultural University (2014RC005 to Z.L. and 2014RC017 to C.K.).