Sequencing, Assembly, and Annotation of the Soybean Genome

The Soybean Genome

Part of the book series: Compendium of Plant Genomes (CPG)

Abstract

Genome sequencing yields an exceptional resource of genetic information. Whole-genome sequence information helps characterize individual genomes, transcriptional states, and genetic variation in populations, and reveals the genetic architecture associated with each trait. After the release of the first human genome assembly, assemblies of other model organisms became available, including that of the model plant Arabidopsis thaliana. The soybean community published the first soybean reference genome, from the variety Williams 82, in 2010. Soybean has important syntenic relationships with other legume species and is a model plant for the legumes. In this chapter, we discuss the soybean genome assemblies and annotations and their refinement in light of next-generation sequencing technologies and bioinformatics tools. We also compare structural variations between the cultivated reference genome and the available wild soybean genome information. We then discuss the opportunities offered by next-generation sequencing technologies and the challenges we anticipate in developing more pangenomes and reference genomes for soybean. These developments will significantly affect the discovery of rare alleles associated with key agronomic and quality traits and will shape next-generation breeding technologies and crop improvement.

Acknowledgements

The authors are grateful to the United Soybean Board and the United States Department of Agriculture for project support.

Author information

Corresponding author

Correspondence to Babu Valliyodan.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Valliyodan, B., Lee, SH., Nguyen, H.T. (2017). Sequencing, Assembly, and Annotation of the Soybean Genome. In: Nguyen, H., Bhattacharyya, M. (eds) The Soybean Genome. Compendium of Plant Genomes. Springer, Cham. https://doi.org/10.1007/978-3-319-64198-0_5
