Volume 248, Issue 5, pp 1307–1318

A deep convolutional neural network approach for predicting phenotypes from genotypes

  • Wenlong Ma
  • Zhixu Qiu
  • Jie Song
  • Jiajia Li
  • Qian Cheng
  • Jingjing Zhai
  • Chuang Ma (corresponding author)
Original Article


Main conclusion

Deep learning is a promising technology to accurately select individuals with high phenotypic values based on genotypic data.


Genomic selection (GS) is a promising breeding strategy in which the phenotypes of individual plants are usually predicted from genome-wide genotypic markers. In this study, we present a deep learning method, named DeepGS, to predict phenotypes from genotypes. Built on a deep convolutional neural network, DeepGS uses hidden variables that jointly represent features in genotypes when making predictions; it also employs convolution, sampling and dropout strategies to reduce the complexity of high-dimensional genotypic data. We trained DeepGS on a large GS dataset and compared its performance with other methods. The experimental results indicate that DeepGS can be used as a complement to the commonly used RR-BLUP in the prediction of phenotypes from genotypes. An ensemble learning approach can exploit the complementarity between DeepGS and RR-BLUP to select individuals with high phenotypic values more accurately, even in the absence of outlier individuals and of subsets of genotypic markers. The source code of DeepGS and the ensemble learning approach has been packaged into Docker images to facilitate their application in different GS programs.
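The ensemble idea described above can be illustrated with a minimal sketch. It assumes the ensemble is a weighted average of the two models' predictions and that the weight is tuned on validation data by maximizing the Pearson correlation with observed phenotypes; the abstract does not specify either choice, so the grid search, the function names (`pearson`, `blend`, `fit_weight`, `top_k`) and the toy numbers are all hypothetical stand-ins rather than the authors' implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def blend(pred_a, pred_b, w):
    """Weighted average of two prediction vectors: w * a + (1 - w) * b."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def fit_weight(pred_a, pred_b, observed, steps=20):
    """Grid-search the weight in [0, 1] that maximizes validation correlation.

    Assumption: a plain grid search stands in for whatever optimizer
    the original work used to tune the ensemble weights.
    """
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates,
               key=lambda w: pearson(blend(pred_a, pred_b, w), observed))


def top_k(scores, k):
    """Indices of the k individuals with the highest predicted values."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical validation data (made-up numbers, not taken from the paper).
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
deep_pred = [1.5, 1.0, 3.5, 3.0, 5.5, 5.0]    # stand-in for DeepGS predictions
ridge_pred = [0.5, 2.5, 2.5, 4.5, 4.5, 6.5]   # stand-in for RR-BLUP predictions

w = fit_weight(deep_pred, ridge_pred, observed)
ensemble = blend(deep_pred, ridge_pred, w)
selected = top_k(ensemble, 2)  # candidates with the highest predicted phenotype
```

Because the candidate weights include 0 and 1, the blended predictor in this sketch can never correlate worse with the validation phenotypes than either model alone; the benefit comes from cases where the two models err in different directions.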


Keywords

Deep learning, Ensemble learning, Genomic selection, High phenotypic values, Machine learning, Genotypic marker



  • Deep convolutional neural network
  • Deep learning
  • Genomic selection
  • Mean normalized discounted cumulative gain value
  • (Ridge regression)-Best linear unbiased prediction



This work was supported by the National Natural Science Foundation of China (31570371), the Agricultural Science and Technology Innovation and Research Project of Shaanxi Province, China (2015NY011), the Youth 1000-Talent Program of China, the Hundred Talents Program of Shaanxi Province of China, the Innovative Talents Promotion Project of Shaanxi Province of China (2017KJXX-67), and the Fund of Northwest A&F University.

Compliance with ethical standards

Conflict of interest

We declare that we have no competing interests.

Supplementary material

425_2018_2976_MOESM1_ESM.pdf (895 kb)
Supplementary material 1 (PDF 896 kb)


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, China
  2. Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, China
  3. Biomass Energy Center for Arid and Semi-arid Lands, Northwest A&F University, Shaanxi, China
