SNP Discovery from Single and Multiplex Genome Assemblies of Non-model Organisms

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 1712)

Abstract

Population genetic studies of non-model organisms often rely on initial ascertainment of genetic markers from a single individual or a small pool of individuals. This initial screening has been a significant barrier to beginning population studies on non-model organisms (Aitken et al., Mol Ecol 13:1423–1431, 2004; Morin et al., Trends Ecol Evol 19:208–216, 2004). As genomic data become increasingly available for non-model species, SNPs can be ascertained across the genome directly from published genome contigs and publicly archived short-read data. Alternatively, low- to medium-coverage shotgun NGS sequencing of single or pooled samples, or sequencing of reduced-representation libraries (e.g., capture enrichment; see Hancock-Hanser et al., Mol Ecol Resour 13:254–268, 2013), can produce enough new data for SNP discovery with limited investment. We describe protocols for assembling short-read data against reference genome contigs of the same or a related species, followed by SNP discovery and filtering to obtain an optimal set of SNPs for population genotyping with a variety of downstream high-throughput genotyping methods.
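
Whether "low to medium" coverage of a single individual is enough depends on how often a heterozygous site is sampled on both alleles. The following is a minimal planning sketch under textbook assumptions that are not taken from this protocol: Lander-Waterman mean depth C = N·L/G, Poisson-distributed per-site depth, 50:50 sampling of the two alleles at a heterozygous site, and no sequencing or mapping error. The function names and the example read count and genome size are hypothetical.

```python
import math


def expected_coverage(n_reads, read_length, genome_size):
    """Lander-Waterman mean depth, C = N * L / G."""
    return n_reads * read_length / genome_size


def poisson_pmf(mean_depth, max_depth):
    """P(depth = d) for d = 0..max_depth, computed iteratively to avoid overflow."""
    pmf = [math.exp(-mean_depth)]
    for d in range(1, max_depth + 1):
        pmf.append(pmf[-1] * mean_depth / d)
    return pmf


def prob_depth_at_least(k, mean_depth):
    """P(depth >= k) under the (assumed) Poisson depth model."""
    if k <= 0:
        return 1.0
    return 1.0 - sum(poisson_pmf(mean_depth, k - 1))


def prob_both_alleles_seen(depth):
    """P(a heterozygous site shows both alleles among `depth` reads).

    Assumes each read samples either allele with probability 0.5 and no errors.
    """
    if depth < 2:
        return 0.0
    return 1.0 - 2 * 0.5 ** depth


def prob_het_detected(mean_depth, max_depth=200):
    """Expected fraction of heterozygous sites at which both alleles are observed."""
    pmf = poisson_pmf(mean_depth, max_depth)
    return sum(p * prob_both_alleles_seen(d) for d, p in enumerate(pmf))
```

For example, 100 million 150-bp reads against a hypothetical 3-Gb genome give a mean depth of 5x; a site covered by exactly two reads shows both alleles only half the time, and at 5x mean depth roughly 84% of heterozygous sites would show both alleles under these idealized assumptions (the sum works out to (1 - e^(-C/2))^2). Real callers also impose minimum-depth and quality thresholds, so these numbers are upper bounds rather than predictions.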

References

  1. Goodwin S, McPherson JD, McCombie WR (2016) Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 17:333–351. https://doi.org/10.1038/nrg.2016.49

  2. Narum SR, Campbell NR, Meyer KA, Miller MR, Hardy RW (2013) Thermal adaptation and acclimation of ectotherms from differing aquatic climates. Mol Ecol 22:3090–3097. https://doi.org/10.1111/mec.12240

  3. Seeb JE, Carvalho G, Hauser L, Naish K, Roberts S, Seeb LW (2011) Single-nucleotide polymorphism (SNP) discovery and applications of SNP genotyping in nonmodel organisms. Mol Ecol Resour 11(Suppl 1):1–8. https://doi.org/10.1111/j.1755-0998.2010.02979.x

  4. Morin PA et al (2015) Geographic and temporal dynamics of a global radiation and diversification in the killer whale. Mol Ecol 24:3964–3979. https://doi.org/10.1111/mec.13284

  5. Bryant D, Bouckaert R, Felsenstein J, Rosenberg NA, RoyChoudhury A (2012) Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Mol Biol Evol 29:1917–1932. https://doi.org/10.1093/molbev/mss086

  6. Richards PM, Liu MM, Lowe N, Davey JW, Blaxter ML, Davison A (2013) RAD-Seq derived markers flank the shell colour and banding loci of the Cepaea nemoralis supergene. Mol Ecol 22:3077–3089. https://doi.org/10.1111/mec.12262

  7. Takahashi T, Sota T, Hori M (2013) Genetic basis of male colour dimorphism in a Lake Tanganyika cichlid fish. Mol Ecol 22:3049–3060. https://doi.org/10.1111/mec.12120

  8. Campbell NR, Harmon SA, Narum SR (2015) Genotyping-in-thousands by sequencing (GT-seq): a cost effective SNP genotyping method based on custom amplicon sequencing. Mol Ecol Resour 15:855–867. https://doi.org/10.1111/1755-0998.12357

  9. Aitken N, Smith S, Schwarz C, Morin PA (2004) Single nucleotide polymorphism (SNP) discovery in mammals: a targeted-gene approach. Mol Ecol 13:1423–1431

  10. Morin PA, Luikart G, Wayne RK, SNP Workshop Group (2004) SNPs in ecology, evolution and conservation. Trends Ecol Evol 19:208–216. https://doi.org/10.1016/j.tree.2004.01.009

  11. Hancock-Hanser B, Frey A, Leslie M, Dutton PH, Archer EI, Morin PA (2013) Targeted multiplex next-generation sequencing: advances in techniques of mitochondrial and nuclear DNA sequencing for population genomics. Mol Ecol Resour 13:254–268. https://doi.org/10.1111/1755-0998.12059

  12. Faircloth BC, McCormack JE, Crawford NG, Harvey MG, Brumfield RT, Glenn TC (2012) Ultraconserved elements anchor thousands of genetic markers spanning multiple evolutionary timescales. Syst Biol 61:717–726. https://doi.org/10.1093/sysbio/sys004

  13. Lemmon AR, Emme SA, Lemmon EM (2012) Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol 61:727–744. https://doi.org/10.1093/sysbio/sys049

  14. Eck SH, Benet-Pages A, Flisikowski K, Meitinger T, Fries R, Strom TM (2009) Whole genome sequencing of a single Bos taurus animal for single nucleotide polymorphism discovery. Genome Biol 10:R82. https://doi.org/10.1186/gb-2009-10-8-r82

  15. Pavy N, Gagnon F, Deschenes A, Boyle B, Beaulieu J, Bousquet J (2016) Development of highly reliable in silico SNP resource and genotyping assay from exome capture and sequencing: an example from black spruce (Picea mariana). Mol Ecol Resour 16:588–598. https://doi.org/10.1111/1755-0998.12468

  16. Aslam ML et al (2012) Whole genome SNP discovery and analysis of genetic diversity in Turkey (Meleagris gallopavo). BMC Genomics 13:391. https://doi.org/10.1186/1471-2164-13-391

  17. Baird NA et al (2008) Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One 3:e3376. https://doi.org/10.1371/journal.pone.0003376

  18. Foote AD, Morin PA (2016) Genome-wide SNP data suggests complex ancestry of sympatric North Pacific killer whale ecotypes. Heredity. https://doi.org/10.1038/hdy.2016.54

  19. Narum SR, Buerkle CA, Davey JW, Miller MR, Hohenlohe PA (2013) Genotyping-by-sequencing in ecological and conservation genomics. Mol Ecol 22:2841–2847. https://doi.org/10.1111/mec.12350

  20. Andrews KR, Good JM, Miller MR, Luikart G, Hohenlohe PA (2016) Harnessing the power of RADseq for ecological and evolutionary genomics. Nat Rev Genet 17:81–92. https://doi.org/10.1038/nrg.2015.28

  21. Koepfli KP, Paten B, Genome 10K Community of Scientists, O’Brien SJ (2015) The Genome 10K Project: a way forward. Annu Rev Anim Biosci 3:57–111. https://doi.org/10.1146/annurev-animal-090414-014900

  22. i5K Consortium (2013) The i5K initiative: advancing arthropod genomics for knowledge, human health, agriculture, and the environment. J Hered 104:595–600. https://doi.org/10.1093/jhered/est050

  23. Schubert M, Lindgreen S, Orlando L (2016) AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res Notes 9:88. https://doi.org/10.1186/s13104-016-1900-2

  24. Korneliussen TS, Albrechtsen A, Nielsen R (2014) ANGSD: analysis of next generation sequencing data. BMC Bioinformatics 15:356. https://doi.org/10.1186/s12859-014-0356-4

  25. Li H (2011) A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27:2987–2993. https://doi.org/10.1093/bioinformatics/btr509

  26. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760. https://doi.org/10.1093/bioinformatics/btp324

  27. DePristo MA et al (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43:491–498. https://doi.org/10.1038/ng.806

  28. McKenna A et al (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20:1297–1303. https://doi.org/10.1101/gr.107524.110

  29. R Core Team (2015) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria

  30. Li H et al (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079. https://doi.org/10.1093/bioinformatics/btp352

  31. Card DC et al (2014) Two low coverage bird genomes and a comparison of reference-guided versus de novo genome assemblies. PLoS One 9:e106649. https://doi.org/10.1371/journal.pone.0106649

  32. Luo R et al (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1:18. https://doi.org/10.1186/2047-217X-1-18

  33. Simpson JT, Durbin R (2012) Efficient de novo assembly of large genomes using compressed data structures. Genome Res 22:549–556. https://doi.org/10.1101/gr.126953.111

  34. Li H (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v1 [q-bio.GN]

  35. Lounsberry ZT, Brown SK, Collins PW, Henry RW, Newsome SD, Sacks BN (2015) Next-generation sequencing workflow for assembly of nonmodel mitogenomes exemplified with North Pacific albatrosses (Phoebastria spp.). Mol Ecol Resour 15:893–902. https://doi.org/10.1111/1755-0998.12365

  36. Nielsen R, Paul JS, Albrechtsen A, Song YS (2011) Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet 12:443–451. https://doi.org/10.1038/nrg2986

  37. Cammen KM, Andrews KR, Carroll EL, Foote AD, Humble E, Khudyakov JI, Louis M, McGowen MR, Olsen MT, Van Cise AM (2016) Genomic methods take the plunge: recent advances in high-throughput sequencing of marine mammals. J Hered 107(6):481–495

  38. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120. https://doi.org/10.1093/bioinformatics/btu170

  39. Kim SY et al (2011) Estimation of allele frequency and association mapping using next-generation sequencing data. BMC Bioinformatics 12:231. https://doi.org/10.1186/1471-2105-12-231

  40. Skotte L, Korneliussen TS, Albrechtsen A (2012) Association testing for next-generation sequencing data using score statistics. Genet Epidemiol 36:430–437. https://doi.org/10.1002/gepi.21636

  41. Nielsen R (2004) Population genetic analysis of ascertained SNP data. Hum Genomics 1:218–224

  42. Clark AG, Hubisz MJ, Bustamante CD, Williamson SH, Nielsen R (2005) Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 15:1496–1502. https://doi.org/10.1101/gr.4107905

  43. Excoffier L, Dupanloup I, Huerta-Sanchez E, Sousa VC, Foll M (2013) Robust demographic inference from genomic and SNP data. PLoS Genet 9:e1003905. https://doi.org/10.1371/journal.pgen.1003905

  44. Chen H, Patterson N, Reich D (2010) Population differentiation as a test for selective sweeps. Genome Res 20:393–402. https://doi.org/10.1101/gr.100545.109

  45. Durvasula A, Hoffman PJ, Kent TV, Liu C, Kono TJ, Morrell PL, Ross-Ibarra J (2016) ANGSD-wrapper: utilities for analyzing next generation sequencing data. Mol Ecol Resour. https://doi.org/10.1111/1755-0998.12578

Acknowledgments

We are grateful to Lisa Komoroske for helpful comments on the manuscript. Blue whale DNA sequencing was generously provided by Tim Harkins and Clarence Lee, Life Technologies, Inc., and by Gerald Pao, Salk Institute for Biological Studies.

Author information

Corresponding author

Correspondence to Phillip A. Morin.

Appendices

Appendix 1

Script1_SNP_call_filter.sh

Appendix 2

Script2_generate_genotype_blocks.py

Appendix 3

Script3a_Filter1_mapping_quality.R

Appendix 4

Script3b_Filter2_excessive_coverage.R

Appendix 5

Script4a_Filter1_Remove_poor_mapping_quality.R

Appendix 6

Script4b_Filter2_Remove_excessive_coverage.R

Appendix 7

Script5_Filter3_excessive_individuals.R

Appendix 8

Script6_Filter4_Remove_rare_SNPs.R

Appendix 9

Script7_SNP_call_filter_GATK.sh

Appendix 10

Script8_generate_genotype_blocks_GATK.py

Appendix 11

Script9_Filter6_compare_SNP_datasets.R

Appendix 12

Script10_plotQC.R

Appendix 13

Script11_Filter7_Remove_HWEexcessHet.R


Copyright information

© 2018 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Morin, P.A., Foote, A.D., Hill, C.M., Simon-Bouhet, B., Lang, A.R., Louis, M. (2018). SNP Discovery from Single and Multiplex Genome Assemblies of Non-model Organisms. In: Head, S., Ordoukhanian, P., Salomon, D. (eds) Next Generation Sequencing. Methods in Molecular Biology, vol 1712. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7514-3_9

  • DOI: https://doi.org/10.1007/978-1-4939-7514-3_9

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7512-9

  • Online ISBN: 978-1-4939-7514-3

  • eBook Packages: Springer Protocols
