# Statistical Inconsistency of Maximum Parsimony for *k*-Tuple-Site Data

## Abstract

One of the main aims of phylogenetics is to reconstruct the “Tree of Life.” To this end, various methods and criteria are used to analyze and compare the DNA sequences of different species in order to infer their evolutionary relationships. Maximum parsimony is one such criterion for tree reconstruction, and it is the one we study in this paper. However, it is well known that tree reconstruction methods can produce wrong estimates of these relationships. A typical problem of maximum parsimony is long branch attraction, which can lead to statistical inconsistency. In this work, we consider a blockwise approach to alignment analysis, the so-called *k*-tuple analyses. For four taxa, it has already been shown that *k*-tuple-based analyses are statistically inconsistent if and only if the standard character-based (site-based) analyses are statistically inconsistent. So, in the four-taxon case, going from individual sites to *k*-tuples does not lead to any improvement. However, real biological analyses often involve more than four taxa. Therefore, we analyze the case of five taxa for 2- and 3-tuple-site data and consider alphabets with two and four elements. We show that the equivalence of single-site data and *k*-tuple-site data then no longer holds. Even so, we show that maximum parsimony is statistically inconsistent for *k*-tuple-site data and five taxa.

## Keywords

Maximum parsimony, Statistical inconsistency, Codons, Long branch attraction, Felsenstein zone

## Notes

### Acknowledgements

The first and second authors thank the University of Greifswald for the Bogislaw studentship and the Landesgraduiertenförderung studentship, respectively, under which this work was conducted. Moreover, we wish to thank two anonymous reviewers for very helpful suggestions on an earlier version of this manuscript.
