Abstract
The human genome contains more than 20,000 protein-coding genes, but the complexity of the RNA population in any given human sample is at least one order of magnitude higher, because alternative splicing generates multiple isoforms per gene. To this, one has to add an increasing number of non-coding RNAs and various forms of RNA editing. This high complexity raises important technical and computational questions, such as:
- How ‘deep’ should the planned sequencing be (i.e. how many clusters should be sequenced from the cDNA libraries) to obtain a good representation of the transcript diversity?
- Is the processing of the dataset (i.e. the identification of the gene of origin for each sequence) feasible in terms of computation time?
- Can the complexity be reduced?
In this chapter, the problems of complexity and of mapping the RNA-seq reads to the reference genome will be addressed from a probabilistic and informational point of view. The issue of reducing the complexity will be dealt with in Chapters 5 and 6.
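To give a feel for the sequencing-depth question, here is a minimal sketch under a deliberately simple assumption that is not taken from the chapter: reads are drawn independently, and a given transcript makes up a fixed fraction of the library. In that model, the chance of seeing the transcript at least once in N reads is 1 - (1 - f)^N. The function names and the example values (a transcript at one part per million, a 95% detection target) are hypothetical, chosen only for illustration.

```python
import math


def detection_probability(fraction, n_reads):
    """Probability that a transcript making up `fraction` of the library
    is sampled at least once in `n_reads` independent reads."""
    # log1p/expm1 keep precision when `fraction` is very small.
    return -math.expm1(n_reads * math.log1p(-fraction))


def reads_needed(fraction, target_probability):
    """Smallest number of reads giving at least `target_probability`
    of observing one or more reads from the transcript."""
    if not (0.0 < fraction < 1.0 and 0.0 < target_probability < 1.0):
        raise ValueError("fraction and target_probability must lie in (0, 1)")
    # Solve 1 - (1 - f)^N >= P for N.
    return math.ceil(math.log1p(-target_probability) / math.log1p(-fraction))


if __name__ == "__main__":
    # Assumed example: a rare transcript at one part per million.
    rare = 1e-6
    n = reads_needed(rare, 0.95)
    print(f"reads needed for 95% detection: {n}")
    print(f"check: P(detect) = {detection_probability(rare, n):.4f}")
```

Under these assumptions the required depth grows roughly in inverse proportion to the abundance of the rarest transcript one wants to detect. Real libraries depart from the model (amplification bias, transcript-length effects, multi-mapping reads), so the result is best read as an order-of-magnitude guide.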
Copyright information
© 2018 Scuola Normale Superiore Pisa
About this chapter
Cite this chapter
Cellerino, A., Sanguanini, M. (2018). RNA-seq raw data processing. In: Transcriptome Analysis. CRM Series, vol 17. Edizioni della Normale, Pisa. https://doi.org/10.1007/978-88-7642-642-1_3
DOI: https://doi.org/10.1007/978-88-7642-642-1_3
Publisher Name: Edizioni della Normale, Pisa
Print ISBN: 978-88-7642-641-4
Online ISBN: 978-88-7642-642-1
eBook Packages: Mathematics and Statistics (R0)