TrueSight: Self-training Algorithm for Splice Junction Detection Using RNA-seq
RNA-seq has proven to be a powerful technique for transcriptome profiling based on next-generation sequencing (NGS) technologies. However, because NGS reads are short, it is challenging to map RNA-seq reads accurately across splice junctions, a step that is critical for the analysis of alternative splicing and for isoform construction. Several tools have been developed to detect splice junctions de novo from RNA-seq data, without the aid of gene annotations [1-3], but their sensitivity and specificity leave room for improvement. In this paper, we describe a novel method, called TrueSight, that combines (i) RNA-seq read mapping quality and (ii) coding potential derived from the reference genome sequence in a unified model, and uses semi-supervised learning to identify splice junctions precisely.
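To make the self-training idea concrete, the following is a minimal sketch of a generic semi-supervised loop, not the TrueSight implementation. It assumes a logistic regression classifier, two hypothetical per-junction features scaled to [0, 1] (a mapping-quality score and a coding-potential score), and hypothetical confidence thresholds; none of these choices is specified in the text above. A small set of confidently labeled junctions seeds the model, which then labels the candidates it is most certain about and retrains on the enlarged set.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    # Probability that candidate x is a true junction; the bias is stored last in w.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def fit_logistic(X, y, epochs=2000, lr=1.0):
    # Plain batch gradient descent on the logistic loss (an assumed model choice).
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def self_train(labeled, unlabeled, rounds=5, hi=0.9, lo=0.1):
    """Generic self-training loop.

    labeled:   list of (features, label) seed examples, label in {0, 1}
    unlabeled: list of feature tuples for candidate junctions
    hi, lo:    hypothetical confidence thresholds for accepting pseudo-labels
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    w = fit_logistic([x for x, _ in labeled], [y for _, y in labeled])
    for _ in range(rounds):
        remaining, added = [], 0
        for x in pool:
            p = predict(w, x)
            if p >= hi:
                labeled.append((x, 1))  # confident positive pseudo-label
                added += 1
            elif p <= lo:
                labeled.append((x, 0))  # confident negative pseudo-label
                added += 1
            else:
                remaining.append(x)     # still ambiguous; revisit next round
        pool = remaining
        if added == 0:
            break                       # nothing new to learn from
        w = fit_logistic([x for x, _ in labeled], [y for _, y in labeled])
    return w


if __name__ == "__main__":
    # Toy features: (mapping-quality score, coding-potential score); values are made up.
    seeds = [((0.9, 0.8), 1), ((0.95, 0.9), 1), ((0.1, 0.2), 0), ((0.05, 0.1), 0)]
    candidates = [(0.85, 0.9), (0.15, 0.1), (0.5, 0.5)]
    weights = self_train(seeds, candidates)
    for c in candidates:
        print(c, round(predict(weights, c), 3))
```

The design point the sketch illustrates is that only high-confidence predictions are fed back as training labels, so ambiguous candidates do not contaminate the model in early rounds.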