Comp-D: a program for comprehensive computation of D-statistics and population summaries of reticulated evolution

  • Steven M. Mussmann (email author)
  • Marlis R. Douglas
  • Max R. Bangs
  • Michael E. Douglas
Methods and Resources Article

Abstract

Patterson’s D-statistic and its five-taxon derivatives are important phylogenetic methods for quantifying reticulated evolution, yet their application is limited by the lack of a single, comprehensive program that efficiently performs all necessary calculations from the file formats of common phylogenetic and population genetic programs. To increase accessibility for a broad range of researchers, we present a user-friendly program (COMP-D) that provides flexibility for incorporating heterozygous sites, implements multiple statistical methods, and aggregates results from multiple tests. Program augmentations also facilitate the detection of population-level introgression. COMP-D provides a threefold increase in speed relative to comparable software. It is implemented in C++ and released under the GNU General Public License v3.0. Source code for Linux/Mac OS X is available from https://github.com/stevemussmann/Comp-D_MPI.
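
For readers unfamiliar with the statistic itself, the following Python sketch shows the standard four-taxon form, D = (ABBA − BABA) / (ABBA + BABA), computed from site-pattern counts. It is an illustration only and is not taken from COMP-D's C++ source: the function names, the toy sequences, and the simple one-degree-of-freedom chi-square comparison of ABBA and BABA counts are all assumptions made for this example.

```python
import math

VALID = set("ACGT")  # assumption: any other character is treated as missing data


def site_pattern_counts(p1, p2, p3, outgroup):
    """Count ABBA and BABA sites in four aligned haploid sequences."""
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        alleles = {a, b, c, o}
        if not alleles <= VALID or len(alleles) != 2:
            continue  # skip missing data and sites that are not biallelic
        if c == o:
            continue  # P3 must carry the derived allele (B) relative to the outgroup (A)
        if a == o and b == c:
            abba += 1  # pattern A B B A
        elif b == o and a == c:
            baba += 1  # pattern B A B A
    return abba, baba


def patterson_d(abba, baba):
    """Patterson's D from ABBA and BABA counts."""
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / total


def chi_square_test(abba, baba):
    """Chi-square statistic (1 d.f.) for ABBA == BABA, with its p-value."""
    chi2 = (abba - baba) ** 2 / (abba + baba)
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical toy alignment (P1, P2, P3, outgroup).
P1 = "AAGTCG"
P2 = "ATGACA"
P3 = "ATCACG"
OUT = "AAGTCA"

abba, baba = site_pattern_counts(P1, P2, P3, OUT)
d_value = patterson_d(abba, baba)
chi2, p_value = chi_square_test(abba, baba)
print(abba, baba, d_value, chi2, p_value)
```

A positive D indicates an excess of ABBA sites (allele sharing between P2 and P3), and a negative D an excess of BABA sites (sharing between P1 and P3). A real analysis would also need a resampling-based standard error for Z-scores and a correction for multiple tests, neither of which is shown here.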

Keywords

RADseq; Introgression; SNP analysis; Next-generation sequencing

Notes

Acknowledgements

The Arkansas High Performance Computing Center (AHPCC) provided technical assistance and computational resources. Tyler K. Chafin and Bradley T. Martin promoted software development by testing an early version of the program. This research was conducted in partial fulfillment of the Ph.D. degree in Biological Sciences at the University of Arkansas (SMM). It was supported by generous University of Arkansas endowments: the Bruker Professorship in Life Sciences (MRD), the Twenty-First Century Chair in Global Change Biology (MED), and a Doctoral Academy Fellowship (SMM). Three anonymous reviewers provided comments that greatly improved the manuscript.

Compliance with ethical standards

Conflict of interest

The authors have nothing to disclose.

Supplementary material

12686_2019_1087_MOESM1_ESM.xlsx (12 kb)
Supplementary Table 1. Results of four-taxon D-statistic tests comparing methods in COMP-D for handling heterozygous loci versus those from pyRAD. Each column shows the number of statistically significant tests (α=0.001) in each treatment. COMP-D offers two methods of assessing statistical significance (Z-scores and Chi-square tests) whereas pyRAD offers only Z-scores. Two treatments (HetRand and HetFreq) considered all heterozygous loci in D-statistic calculations, but differed in either randomly picking an allele to represent an individual (HetRand) or using SNP frequency calculations (HetFreq). The HetIgnore method removed all heterozygous loci from calculations. All tests employed SNP data for catostomid fishes of western North America. Taxonomic abbreviations (P1, P2, P3, and O columns) are as follows: BBS = Bonneville Bluehead Sucker, BLS = Bridgelip Sucker, FMS = Flannelmouth Sucker, LNS = Longnose Sucker, MTS = Mountain Sucker, RBS = Razorback Sucker, SOS = Sonora Sucker, THS = Tahoe Sucker, WTS = White Sucker. Abbreviations in parentheses next to species abbreviations represent different populations. BB = Bonneville Basin, CB = Columbia River Basin, GC = Grand Canyon of the Colorado River, LB = Lahontan Basin, LC = Little Colorado River, UC = Upper Colorado River Basin, VR = Virgin River, wen = Wenima Wildlife Area of the Little Colorado River. (XLSX 11 KB)
12686_2019_1087_MOESM2_ESM.xlsx (12 kb)
Supplementary Table 2. The number of biallelic loci recovered using heterozygous loci (Het. Included) versus only fixed loci (Het. Excluded). Mean number of loci (Avg. Loci) and standard deviation (StDev) are presented for each. The % decrease indicates those loci lost by considering only fixed differences among taxa. All tests employed data for catostomid fishes of western North America. Taxonomic abbreviations (P1, P2, P3, and O columns) are: BBS = Bonneville Bluehead Sucker, BLS = Bridgelip Sucker, FMS = Flannelmouth Sucker, LNS = Longnose Sucker, MTS = Mountain Sucker, RBS = Razorback Sucker, SOS = Sonora Sucker, THS = Tahoe Sucker, WTS = White Sucker. Abbreviations in parentheses next to species abbreviations represent different populations. BB = Bonneville Basin, CB = Columbia River Basin, GC = Grand Canyon of the Colorado River, LB = Lahontan Basin, LC = Little Colorado River, UC = Upper Colorado River Basin, VR = Virgin River, wen = Wenima Wildlife Area of the Little Colorado River. (XLSX 11 KB)
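
The three heterozygote treatments named in Supplementary Table 1 (HetIgnore, HetRand, HetFreq) can be illustrated with a short Python sketch. This is a hedged reading of the table caption, not COMP-D's actual implementation. It assumes one diploid individual per taxon, genotypes written as two-character strings, a homozygous outgroup that defines the ancestral allele, and the commonly used frequency-weighted ABBA/BABA form in which each site contributes (1 − f1)·f2·f3 and f1·(1 − f2)·f3, where f is the derived-allele frequency. All function and mode names are hypothetical.

```python
import random


def derived_freq(genotype, ancestral):
    """Fraction of alleles in a genotype that differ from the ancestral allele."""
    return sum(allele != ancestral for allele in genotype) / len(genotype)


def site_weights(genotypes, mode, rng=None):
    """Return the (ABBA, BABA) contribution of one site.

    genotypes: diploid genotypes ordered (P1, P2, P3, outgroup), e.g. ("AA", "AG", "GG", "AA").
    mode: "ignore" (drop sites with any heterozygote), "rand" (pick one random
          allele per individual), or "freq" (use derived-allele frequencies).
    """
    if mode not in ("ignore", "rand", "freq"):
        raise ValueError("unknown mode: " + mode)
    p1, p2, p3, out = genotypes
    # Assumption: keep biallelic sites with a homozygous outgroup only.
    if len(set("".join(genotypes))) != 2 or out[0] != out[1]:
        return 0.0, 0.0
    if mode == "ignore" and any(g[0] != g[1] for g in genotypes):
        return 0.0, 0.0
    if mode == "rand":
        rng = rng or random.Random()
        # One randomly chosen allele stands in for each individual.
        p1, p2, p3 = (rng.choice(g) * 2 for g in (p1, p2, p3))
    ancestral = out[0]
    f1, f2, f3 = (derived_freq(g, ancestral) for g in (p1, p2, p3))
    return (1 - f1) * f2 * f3, f1 * (1 - f2) * f3


def d_statistic(sites, mode, rng=None):
    """Sum per-site contributions and return D, or NaN if nothing is informative."""
    abba = baba = 0.0
    for site in sites:
        a, b = site_weights(site, mode, rng)
        abba += a
        baba += b
    total = abba + baba
    return (abba - baba) / total if total else float("nan")


# Hypothetical toy data: three sites, genotypes ordered (P1, P2, P3, outgroup).
sites = [
    ("AA", "AG", "GG", "AA"),  # P2 heterozygous
    ("CT", "CC", "TT", "CC"),  # P1 heterozygous
    ("GG", "TT", "TT", "GG"),  # all homozygous, a plain ABBA site
]

d_freq = d_statistic(sites, "freq")
d_ignore = d_statistic(sites, "ignore")
d_rand = d_statistic(sites, "rand", random.Random(1))
print(d_freq, d_ignore, d_rand)
```

In this toy data set the "ignore" treatment retains only one of the three sites, which mirrors the loss of loci that Supplementary Table 2 quantifies when heterozygous loci are excluded.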

Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  • Steven M. Mussmann (1, email author)
  • Marlis R. Douglas (1)
  • Max R. Bangs (2)
  • Michael E. Douglas (1)

  1. Department of Biological Sciences, University of Arkansas, Fayetteville, USA
  2. Department of Biological Science, Florida State University, Tallahassee, USA
