Statistical Significance for NGS Reads Similarities
In this work we present a significance curve that separates random alignments from true matches in identity-based sequence comparison, and that is especially suitable for sequencing data produced by NGS technologies. The experimental approach reproduces the distribution of random local ungapped similarities by score and length, from which it is possible to assess the statistical significance of any particular ungapped similarity. This work includes a study of how the distribution behaves as a function of the experimental technology used to produce the raw sequences, as well as of the scoring system used in the comparison. Our approach reproduces the expected behaviour and completes the proposal of Rost and Sander for homology-based sequence comparisons. Results can be exploited by computational applications to reduce computational cost and memory usage.
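The idea of an empirical significance curve can be illustrated with a minimal sketch. It is not the procedure used in this work: it assumes uniformly random DNA, uses percent identity as the only score, and takes the 0.99 quantile of the random distribution as the cutoff for each segment length. All function names and parameters (`random_read`, `percent_identity`, `significance_threshold`, `significance_curve`, `trials`, `quantile`, `seed`) are hypothetical.

```python
import random

ALPHABET = "ACGT"  # assumption: uniform base composition, no sequencing error model


def random_read(length, rng):
    """Return a random DNA string of the given length."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def percent_identity(a, b):
    """Percent of matching positions in two equal-length ungapped segments."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def significance_threshold(length, trials=2000, quantile=0.99, seed=0):
    """Empirical identity cutoff for random ungapped segments of one length.

    Samples `trials` pairs of unrelated random segments and returns the
    identity value at the requested quantile of the sampled distribution.
    """
    rng = random.Random(seed)
    scores = sorted(
        percent_identity(random_read(length, rng), random_read(length, rng))
        for _ in range(trials)
    )
    index = min(trials - 1, int(quantile * trials))
    return scores[index]


def significance_curve(lengths, **kwargs):
    """Map each segment length to its empirical identity cutoff."""
    return {length: significance_threshold(length, **kwargs) for length in lengths}


if __name__ == "__main__":
    for length, cutoff in significance_curve([10, 20, 50, 100]).items():
        print(length, round(cutoff, 1))
```

Under these assumptions, an ungapped similarity whose identity exceeds the cutoff for its length would be unlikely to arise by chance. Short segments need a much higher identity than long ones, which is the qualitative shape expected of a length-dependent significance curve.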
Keywords: assembly reads similarity NGS