The Draft Genome of the MD-2 Pineapple

  • Raimi M. Redwan
  • Akzam Saidin
  • Subbiah V. Kumar
Part of the Plant Genetics and Genomics: Crops and Models book series (PGG, volume 22)


With advances in sequencing technology, it is now possible to decode complex plant genomes with high accuracy. For many years, short reads were the dominant data type used for assembling genomes, until the introduction of third-generation long-read sequencing machines. Long reads can span complex repeat regions, avoiding the erroneous collapse of repeats that reduces the genome assembly size. However, the low accuracy of long reads is a cause for concern and hinders their direct application in de novo assemblies of large genomes. Here, we report the whole-genome assembly of the MD-2 pineapple using a hybrid sequencing approach. We used Illumina short reads to correct the systematic errors of the long PacBio reads. The error-corrected long reads were then used to assemble the MD-2 pineapple genome de novo using multiple assembly software packages and strategies. The best accuracy and contiguity were achieved in the de novo assembly of error-corrected long reads using Celera. The MD-2 pineapple genome assembly achieved an N50 of 153,084 bp with 8448 scaffolds and a total assembly size of 524.07 Mb. In addition, 245 of the 248 ultra-conserved core eukaryotic genes (CEGs) were found in the genome, indicating completeness of more than 98%. Furthermore, 87% of the mapped transcripts were identified in the genome with coverage of more than 90%, while another 12% were mapped with coverage of more than 80%. This MD-2 pineapple genome provides a high-quality draft for gene prediction and further downstream applications in pineapple.
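The N50 and CEG completeness figures quoted above are simple summary statistics. As a minimal sketch (not the authors' pipeline; the function names and the toy scaffold lengths are hypothetical and chosen only for illustration), they could be computed along these lines in Python:

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def completeness_percent(found, expected):
    """Percentage of expected conserved genes recovered in the assembly."""
    return 100.0 * found / expected


# Hypothetical toy scaffold lengths (bp), not data from the MD-2 assembly.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))

# CEG counts as reported in the abstract: 245 found out of 248.
print(round(completeness_percent(245, 248), 2))
```

With the toy lengths, the total is 300 bp and the two longest scaffolds (80 + 70) already reach half of it, so the N50 is 70. The reported CEG counts give roughly 98.79%, consistent with the stated completeness of more than 98%.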


Pineapple; Plant genome sequencing; Hybrid assembly; Sequencing technology; Heterozygous genome



We thank Hydayaty Yusoff and the Pineapple Board of Malaysia for the pineapple sample, Caroline Chan from Pacific Biosciences (Asia Pacific) and Dana Chow from TreeCode Sdn Bhd for assistance with the Pacific Biosciences RSII, and Novocraft Sdn. Bhd. for the computing facility used in this project. This project is funded by the Ministry of Education and the Ministry of Science, Technology and Innovation, Malaysia, through the Fundamental Research Grant Scheme (FRG0319-SG-2013) and Science Fund (SCF0087-BIO-2013), respectively.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Raimi M. Redwan (1, 2)
  • Akzam Saidin (3)
  • Subbiah V. Kumar (2), Email author
  1. Faculty of Agro-Based Industry, Universiti Malaysia Kelantan, Jeli, Malaysia
  2. Biotechnology Research Institute, Universiti Malaysia Sabah, Kota Kinabalu, Malaysia
  3. Novocraft Technology Sdn. Bhd, Petaling Jaya, Malaysia
