SeqTrim — A Validation and Trimming Tool for All Purpose Sequence Reads

Falgueras, Juan; Lara, Antonio J.; Cantón, Francisco R.; Pérez-Trabado, Guillermo; Gonzalo Claros, M.

doi:10.1007/978-3-540-74972-1_46


Part of the book series: Advances in Soft Computing (AINSC, volume 44)

Abstract

Bioinformatics tools are required to produce reliable, high-quality data devoid of unwanted sequences in the preprocessing stage of current sequencing and EST projects. In this paper we describe SeqTrim, an algorithm designed to extract the insert sequence from any sequence read, free of any foreign, contaminant or unwanted sequence, whatever the experimental process. SeqTrim is easy to install and identifies the sequence insert by removing low-quality sequences, cloning vector, poly-A or poly-T tails, adaptors, and sequences that can be considered contaminants. It is easy to use and can be run as a stand-alone application or through a web page. The default parameters of the algorithm suit most cases, but a configuration file can be provided along with the input sequences. SeqTrim accepts several input and output formats (with and without quality values), which allows it to be included in existing or newly defined sequence-processing workflows. SeqTrim is under continuous refinement through collaboration between biologists and computer scientists, which has succeeded in dealing correctly with most sequence cases and opens the possibility of adding capabilities to manage new kinds of bad sequences.
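To make two of the trimming steps named in the abstract concrete, the following is a minimal Python sketch of low-quality trimming followed by poly-A/poly-T tail removal. It is not SeqTrim's implementation: the function names (`best_quality_span`, `strip_poly_tails`, `trim_read`), the quality threshold of 20, and the minimum run length of 10 are all assumptions chosen for illustration. Vector, adaptor, and contaminant detection, which usually rely on alignment against reference sequences, are left out.

```python
import re
from typing import Sequence, Tuple


def best_quality_span(quals: Sequence[int], min_q: int = 20) -> Tuple[int, int]:
    """Return (start, end) of the longest run of bases with quality >= min_q.

    The interval is half-open; (0, 0) means no base passed the threshold.
    """
    best = (0, 0)
    start = None
    # A sentinel below any real quality value closes a run that reaches the end.
    for i, q in enumerate(list(quals) + [float("-inf")]):
        if q >= min_q:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def strip_poly_tails(seq: str, min_run: int = 10) -> Tuple[int, int]:
    """Return (start, end) after dropping a leading poly-T and a trailing poly-A run."""
    start, end = 0, len(seq)
    lead = re.match(r"T{%d,}" % min_run, seq)
    if lead:
        start = lead.end()
    tail = re.search(r"A{%d,}$" % min_run, seq)
    if tail and tail.start() >= start:
        end = tail.start()
    return start, end


def trim_read(seq: str, quals: Sequence[int], min_q: int = 20, min_run: int = 10) -> str:
    """Hypothetical two-step trim: quality span first, then poly-A/T tails."""
    q_start, q_end = best_quality_span(quals, min_q)
    sub = seq[q_start:q_end].upper()
    t_start, t_end = strip_poly_tails(sub, min_run)
    return sub[t_start:t_end]


if __name__ == "__main__":
    insert = "ACGTACGTACGGCTAGCTAGG"
    read = "NNNNN" + insert + "A" * 12 + "NNNN"
    quals = [5] * 5 + [40] * (len(insert) + 12) + [5] * 4
    print(trim_read(read, quals))
```

In this sketch the quality step runs first so that noisy bases at the read ends cannot hide a poly-A tail from the anchored regular expression. A real tool would also need to tolerate sequencing errors inside the tail, which an exact-match pattern does not.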

Author information

Authors and Affiliations

Lenguajes y Ciencias de la Computación, ETSI Informática, Campus Universitario de Teatinos, s/n., E-29071, Málaga, Spain
Juan Falgueras
Biología Molecular y Bioquímica, Universidad de Málaga, Campus Universitario de Teatinos, s/n., E-29071, Málaga, Spain
Antonio J. Lara & Francisco R. Cantón
Arquitectura de Computadores, ETSI Informática, Campus de Teatinos, E-29071, Málaga, Spain
Guillermo Pérez-Trabado
Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias Universidad de Málaga, 29071, Málaga, Spain
M. Gonzalo Claros

Editor information

Editors and Affiliations

Escuela Politécnica Superior Campus Vena, Edificio C, Universidad de Burgos, C/Francisco de Vitoria s/n, 09006, Burgos, Spain
Emilio Corchado
Departamento de Informática y Automática Facultad de Ciencias, Universidad de Salamanca, Plaza de la Merced S/N, 37008, Salamanca, Spain
Juan M. Corchado
Centre for Quantifiable Quality of Service in Communication Systems (Q2S) Centre of Excellence, Norwegian University of Science and Technology, O.S. Bragstads plass 2E, 7491, Trondheim, Norway
Ajith Abraham

About this chapter

Cite this chapter

Falgueras, J., Lara, A.J., Cantón, F.R., Pérez-Trabado, G., Gonzalo Claros, M. (2007). SeqTrim — A Validation and Trimming Tool for All Purpose Sequence Reads. In: Corchado, E., Corchado, J.M., Abraham, A. (eds) Innovations in Hybrid Intelligent Systems. Advances in Soft Computing, vol 44. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74972-1_46

DOI: https://doi.org/10.1007/978-3-540-74972-1_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74971-4
Online ISBN: 978-3-540-74972-1
eBook Packages: Engineering, Engineering (R0)
