
Multiple Sequence Alignment System for Pyrosequencing Reads

  • Conference paper
Bioinformatics and Computational Biology (BICoB 2009)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5462)


Abstract

Pyrosequencing is among the emerging sequencing techniques and can generate up to 100,000 overlapping reads in a single run. It is much faster and cheaper than existing state-of-the-art sequencing techniques such as Sanger sequencing. However, the reads generated by pyrosequencing are short and contain numerous errors. Before these reads can be used in any subsequent analysis, they must be aligned. Existing multiple sequence alignment methods cannot be used, because they do not take into account the specific positions of the sequences with respect to the genome and are highly inefficient for large numbers of sequences. Therefore, the common practice has been either to use simple pairwise alignment, despite its poor accuracy for error-prone pyroreads, or to use computationally expensive techniques based on sequential gap propagation. In this paper, we develop a computationally efficient method based on domain decomposition, referred to as pyro-align, to align such large numbers of reads. The proposed algorithm accurately aligns the erroneous reads in a short period of time and is orders of magnitude faster than any existing method. The accuracy of the alignment is confirmed by the consensus obtained from the multiple alignments.
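The abstract does not specify how pyro-align partitions the reads or builds its consensus. As a rough illustration only, the Python sketch below shows two generic building blocks under stated assumptions. First, each read is assumed to already carry an approximate start position on the genome (for example, from mapping against a reference), and reads are grouped into fixed-width genome windows so that each group could be aligned independently. Second, a column-wise majority-vote consensus is computed from an alignment that is assumed to exist already. The names `partition_reads` and `consensus`, the window width, and the `-` (gap) and `.` (column not covered by the read) symbols are hypothetical choices for this example, not part of the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A read is modeled here (hypothetically) as (start position on genome, sequence).
Read = Tuple[int, str]


def partition_reads(reads: List[Read], window: int) -> Dict[int, List[Read]]:
    """Hypothetical domain decomposition: group reads by the genome window
    containing their start position, so each group can be processed on its own."""
    domains: Dict[int, List[Read]] = {}
    for start, seq in reads:
        domains.setdefault(start // window, []).append((start, seq))
    return domains


def consensus(aligned_rows: List[str], gap: str = "-", pad: str = ".") -> str:
    """Column-wise majority vote over an existing alignment.

    `pad` marks columns a read does not cover and is ignored when voting;
    `gap` is an alignment gap and takes part in the vote. Columns won by the
    gap symbol are dropped from the consensus. Ties go to the symbol seen
    first in the column (Counter.most_common keeps first-encountered order)."""
    width = max(len(row) for row in aligned_rows)
    out = []
    for col in range(width):
        votes = Counter(
            row[col] for row in aligned_rows if col < len(row) and row[col] != pad
        )
        if not votes:
            continue  # no read covers this column
        symbol, _ = votes.most_common(1)[0]
        if symbol != gap:
            out.append(symbol)
    return "".join(out)


if __name__ == "__main__":
    reads = [(0, "ACGT"), (5, "GTTA"), (12, "AGG")]
    print(partition_reads(reads, window=10))
    # Three overlapping reads; the second has a substitution (A) and a deletion (-).
    rows = ["ACGTTA..", ".CAT-AGG", "..GTTAGG"]
    print(consensus(rows))  # ACGTTAGG
```

Assigning reads by start position alone ignores reads that straddle a window boundary; a practical decomposition would need overlapping windows or a merge step, which this sketch does not attempt. Majority voting is likewise only one simple way to derive a consensus from a multiple alignment.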




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Saeed, F., Khokhar, A., Zagordi, O., Beerenwinkel, N. (2009). Multiple Sequence Alignment System for Pyrosequencing Reads. In: Rajasekaran, S. (ed.) Bioinformatics and Computational Biology. BICoB 2009. Lecture Notes in Computer Science, vol 5462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00727-9_34


  • DOI: https://doi.org/10.1007/978-3-642-00727-9_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00726-2

  • Online ISBN: 978-3-642-00727-9

  • eBook Packages: Computer Science, Computer Science (R0)
