Abstract
Expressed sequence tags (ESTs) present a special set of problems for bioinformatic analysis: they are partial and error-prone, and large datasets can contain significant internal redundancy. To facilitate analysis of small EST datasets from in-house projects, we present an integrated “pipeline” of tools that takes EST data from sequence trace to database submission. The tools can also cluster ESTs into putative genes and annotate those genes using preliminary sequence similarity searches. The systems are written for the freely available Linux environment and rely on other openly available analytical tools.
Copyright information
© 2004 Humana Press Inc., Totowa, NJ
About this protocol
Cite this protocol
Parkinson, J., Blaxter, M. (2004). Expressed Sequence Tags. In: Melville, S.E. (eds) Parasite Genomics Protocols. Methods in Molecular Biology™, vol 270. Humana Press. https://doi.org/10.1385/1-59259-793-9:093
DOI: https://doi.org/10.1385/1-59259-793-9:093
Publisher Name: Humana Press
Print ISBN: 978-1-58829-062-5
Online ISBN: 978-1-59259-793-2
eBook Packages: Springer Protocols