How Do MSA Programs Work?

Sperlea, Theodor

doi:10.1007/978-3-662-58811-6_2

Abstract

How well an MSA program is suited to aligning a particular collection of sequences depends heavily on how it works. This chapter gives a rough overview of the methods of multiple sequence alignment. In Sect. 2.2, we first look at how pairs of sequences are aligned to each other. In Sect. 2.3, we then examine how this approach can be extended to data sets with more than two sequences, which problems arise in doing so, and how they are circumvented.
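As a rough illustration of what aligning a pair of sequences can look like, here is a minimal sketch of global pairwise alignment by dynamic programming in the style of the classic Needleman-Wunsch scheme. It is not taken from the chapter: the function name `needleman_wunsch`, the scoring values (`match=1`, `mismatch=-1`, `gap=-2`), and the linear gap penalty are assumptions chosen for brevity, and real programs typically use substitution matrices and more elaborate gap models.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not values from the chapter.
    """
    rows, cols = len(a) + 1, len(b) + 1

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only

    def pair_score(x, y):
        return match if x == y else mismatch

    # Fill the table: each cell extends the best of three neighbouring cells.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])
            up = score[i - 1][j] + gap    # gap in b
            left = score[i][j - 1] + gap  # gap in a
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Called as `needleman_wunsch("ACGT", "AGT")`, the sketch returns a score together with two gapped strings of equal length. The table has one cell per pair of prefix lengths, so time and memory grow with the product of the two sequence lengths. Applying the same exact scheme to many sequences at once would need one table dimension per sequence, which hints at why extending it beyond pairs is hard.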

Author information

Authors and Affiliations

Marburg, Germany
Theodor Sperlea

About this chapter

Cite this chapter

Sperlea, T. (2019). Wie funktionieren MSA-Programme?. In: Multiple Sequenzalignments. Springer Spektrum, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58811-6_2

DOI: https://doi.org/10.1007/978-3-662-58811-6_2
Published: 04 May 2019
Publisher Name: Springer Spektrum, Berlin, Heidelberg
Print ISBN: 978-3-662-58810-9
Online ISBN: 978-3-662-58811-6
eBook Packages: Life Science and Basic Disciplines (German Language)

Publish with us

Policies and ethics