Abstract
SATé is a method for estimating multiple sequence alignments and trees that has been shown to produce highly accurate results for datasets with large numbers of sequences. Running SATé using its default settings is very simple, but improved accuracy can be obtained by modifying its algorithmic parameters. We provide a detailed introduction to the algorithmic approach used by SATé, and instructions for running a SATé analysis using the GUI under default settings. We also provide a discussion of how to modify these settings to obtain improved results, and how to use SATé in a phylogenetic analysis pipeline.
Acknowledgments
This work was supported by a training fellowship to KL from the Keck Center of the Gulf Coast Consortia, on the NLM Training Program in Biomedical Informatics, National Library of Medicine (NLM) T15LM007093. This work was also partially supported by NSF grant DEB 0733029 to TW.
Copyright information
© 2014 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Liu, K., Warnow, T. (2014). Large-Scale Multiple Sequence Alignment and Tree Estimation Using SATé. In: Russell, D. (ed) Multiple Sequence Alignment Methods. Methods in Molecular Biology, vol 1079. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-646-7_15
DOI: https://doi.org/10.1007/978-1-62703-646-7_15
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-645-0
Online ISBN: 978-1-62703-646-7
eBook Packages: Springer Protocols