De novo assembly of white poplar genome and genetic diversity of white poplar population in Irtysh River basin in China

  • Yan-Jing Liu
  • Xiao-Ru Wang
  • Qing-Yin Zeng
Research Paper


White poplar (Populus alba) is widely distributed in Central Asia and Europe. Natural populations occur in the Irtysh River basin in China, and the species is also cultivated and grows well in northern China. In this study, we sequenced the genome of P. alba using single-molecule real-time technology. The de novo assembly of P. alba spans 415.99 Mb with a contig N50 of 1.18 Mb. A total of 32,963 protein-coding genes were identified, and 45.16% of the genome was annotated as repetitive elements. Genome evolution analysis indicated that P. alba and Populus trichocarpa (black cottonwood) diverged ~5.0 Mya (3.0, 7.1). The distributions of fourfold synonymous third-codon transversion (4DTV) and synonymous substitution rate (Ks) supported the occurrence of the salicoid whole-genome duplication (WGD) event (~65 Mya). Twelve natural populations of P. alba in the Irtysh River basin in China were sequenced to explore genetic diversity. The average pooled heterozygosity of these populations was 0.170±0.014, lower than the values for Italy (0.271±0.051) and Hungary (0.264±0.054). Tajima’s D values were predominantly negative, which might signify an excess of low-frequency polymorphisms and a bottleneck followed by expansion of the P. alba populations examined.
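To illustrate why negative Tajima’s D values point to an excess of low-frequency polymorphisms, the following is a minimal sketch of the classic Tajima (1989) statistic for individually genotyped sequences. It is not the pooled-sequencing estimator used in this study, which requires additional corrections for pool size and coverage. The function name `tajimas_d` and its inputs are hypothetical and chosen only for illustration.

```python
import math


def tajimas_d(allele_counts, n):
    """Classic Tajima's D (Tajima 1989) for individual-level data.

    allele_counts: for each segregating site, the number of sampled
                   sequences carrying one of the two alleles (1..n-1).
    n:             number of sampled sequences (hypothetical input;
                   pooled data would need a different estimator).
    """
    if n < 4:
        # The variance term is zero for n = 2 and n = 3.
        raise ValueError("need at least 4 sequences")
    s = len(allele_counts)
    if s == 0:
        raise ValueError("need at least one segregating site")
    if any(k <= 0 or k >= n for k in allele_counts):
        raise ValueError("every site must be polymorphic in the sample")

    # Harmonic sums used by Watterson's estimator and the variance.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))

    # Constants from Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    # Mean pairwise differences: rare alleles contribute little here.
    pi = sum(2.0 * k * (n - k) for k in allele_counts) / (n * (n - 1))
    # Watterson's estimator: every segregating site counts equally.
    theta_w = s / a1

    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Ten sites that are all singletons (rare variants) versus ten sites
# at intermediate frequency, both in a sample of 10 sequences.
d_rare = tajimas_d([1] * 10, 10)
d_common = tajimas_d([5] * 10, 10)
print(d_rare, d_common)
```

When most variants are rare, the pairwise-difference estimate falls below Watterson's estimate and D is negative; when variants sit at intermediate frequencies, the sign reverses.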


Keywords: Populus alba, de novo assembly, genetic diversity, population expansion





We thank Dr. Jian Wang for assisting with the population sampling in the Irtysh River basin. This work was supported by the National Science Fund for Distinguished Young Scholars (31425006) and the Chinese Academy of Forestry (CAFYBB2018ZX001).

Supplementary material

11427_2018_9455_MOESM1_ESM.pdf (191 kb)
11427_2018_9455_MOESM2_ESM.xls (34 kb)
11427_2018_9455_MOESM3_ESM.xls (30 kb)
11427_2018_9455_MOESM4_ESM.xls (25 kb)
11427_2018_9455_MOESM5_ESM.xls (28 kb)
11427_2018_9455_MOESM6_ESM.xls (25 kb)
11427_2018_9455_MOESM7_ESM.xls (25 kb)
11427_2018_9455_MOESM8_ESM.xls (30 kb)
11427_2018_9455_MOESM9_ESM.xls (54 kb)
11427_2018_9455_MOESM10_ESM.xls (52 kb)



Copyright information

© Science China Press and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing, China
  2. State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
  3. Department of Ecology and Environmental Science, UPSC, Umeå University, Umeå, Sweden
