Abstract
In the preprocessing stage of current sequencing and EST projects, bioinformatics tools are required to produce reliable, high-quality data free of unwanted sequences. In this paper we describe SeqTrim, an algorithm designed to extract the insert sequence from any sequence read, free of any foreign, contaminant or unwanted sequence, whatever the experimental process was. SeqTrim is easy to install and identifies the sequence insert by removing low-quality sequences, cloning vector, poly-A or poly-T tails, adaptors, and sequences that can be considered contaminants. It is easy to use and can be run as a stand-alone application or through a web page. The default parameters of the algorithm suit most cases, but a configuration file can be provided along with the input sequences. SeqTrim accepts several input and output formats (with and without quality values), which enables its inclusion in existing or newly defined sequence-processing workflows. SeqTrim is under continuous refinement thanks to collaboration between biologists and computer scientists, which has succeeded in dealing correctly with most sequence cases and opens the possibility of adding new capabilities to manage new kinds of bad sequences.
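As a rough illustration of two of the trimming steps named above (low-quality ends and poly-A or poly-T tails), the following Python sketch may help. It is not SeqTrim's implementation: the function names, the quality threshold `min_q` and the minimum run length `min_run` are hypothetical choices made for this example, and vector, adaptor and contaminant removal (which would need a similarity search against reference sequences) are left out.

```python
import re


def trim_low_quality(seq, quals, min_q=20):
    """Return (start, end) of the span left after dropping low-quality ends.

    min_q is an assumed threshold, not a SeqTrim default.
    """
    start, end = 0, len(seq)
    # Advance past low-quality bases at the 5' end.
    while start < end and quals[start] < min_q:
        start += 1
    # Retreat past low-quality bases at the 3' end.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return start, end


def trim_poly_tails(seq, min_run=10):
    """Strip a leading poly-T run and a trailing poly-A run.

    Only runs of at least min_run bases are removed (assumed value).
    """
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)
    return seq


def extract_insert(seq, quals, min_q=20, min_run=10):
    """Apply quality trimming first, then tail trimming, and return the insert."""
    start, end = trim_low_quality(seq, quals, min_q)
    return trim_poly_tails(seq[start:end].upper(), min_run)


if __name__ == "__main__":
    read = "NN" + "ACGTACGT" + "A" * 12 + "NN"
    qualities = [5, 5] + [30] * 20 + [5, 5]
    print(extract_insert(read, qualities))
```

The order of the steps (quality first, tails second) is also an assumption of this sketch; a real pipeline would choose the order and thresholds from its configuration.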
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Falgueras, J., Lara, A.J., Cantón, F.R., Pérez-Trabado, G., Gonzalo Claros, M. (2007). SeqTrim — A Validation and Trimming Tool for All Purpose Sequence Reads. In: Corchado, E., Corchado, J.M., Abraham, A. (eds) Innovations in Hybrid Intelligent Systems. Advances in Soft Computing, vol 44. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74972-1_46
DOI: https://doi.org/10.1007/978-3-540-74972-1_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74971-4
Online ISBN: 978-3-540-74972-1
eBook Packages: Engineering, Engineering (R0)