Abstract
Genomic selection (GS) is a novel breeding strategy that uses computer programs to select individuals with high breeding value. Although GS has long been practiced in animal breeding, applying it to crops with high breeding efficiency remains challenging because of the limited training population size, the nature of genotype-environment interactions, and the complex interaction patterns between molecular markers. In this study, we developed a bioinformatics pipeline to perform machine learning (ML)-based classification for GS. We built a random forest-based ML classifier that achieved better prediction performance than four widely used GS prediction models on the maize GS dataset under study. We found that an ML-based GS classification system requires a reasonable ratio between positive and negative samples in the training dataset. Moreover, we recommend more careful selection of informative SNPs when building an ML-based GS model with high prediction performance.
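The point about the positive-to-negative ratio can be illustrated with a minimal sketch. It assumes that individuals in the top fraction of phenotypic values are labeled positive and that negatives are randomly downsampled to a chosen ratio. The function names, the 20% cutoff, the 2:1 ratio, and the simulated phenotypes are all hypothetical choices for illustration, not the settings or procedure reported in the paper.

```python
import random

def label_top_fraction(phenotypes, top_fraction=0.2):
    """Label the top `top_fraction` of individuals (by phenotype) as positive (1)."""
    n_pos = max(1, round(len(phenotypes) * top_fraction))
    ranked = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i], reverse=True)
    positives = set(ranked[:n_pos])
    return [1 if i in positives else 0 for i in range(len(phenotypes))]

def balance_training_set(labels, neg_per_pos=1.0, seed=0):
    """Keep every positive and a random subset of negatives; return sorted indices."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_neg = min(len(neg), round(len(pos) * neg_per_pos))
    return sorted(pos + rng.sample(neg, n_neg))

# Simulated phenotypes stand in for real trait measurements (assumption).
rng = random.Random(42)
phenotypes = [rng.gauss(0.0, 1.0) for _ in range(100)]

labels = label_top_fraction(phenotypes, top_fraction=0.2)  # hypothetical cutoff
train_idx = balance_training_set(labels, neg_per_pos=2.0)  # hypothetical ratio

n_pos = sum(labels[i] for i in train_idx)
n_neg = len(train_idx) - n_pos
print(n_pos, n_neg)  # prints "20 40"
```

The selected indices could then be used to subset a SNP genotype matrix before training a classifier such as a random forest; varying `neg_per_pos` is one simple way to examine how the class ratio affects prediction performance.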
Acknowledgement
This work was supported by grants from the National Natural Science Foundation of China (No. 31570371), the Agricultural Science and Technology Innovation and Research Project of Shaanxi Province, China (No. 2015NY011), and the Fund of Northwest A&F University (Nos. Z111021403 and Z109021514).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Qiu, Z., Cheng, Q., Song, J., Tang, Y., Ma, C. (2016). Application of Machine Learning-Based Classification to Genomic Selection and Performance Improvement. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2016. Lecture Notes in Computer Science, vol 9771. Springer, Cham. https://doi.org/10.1007/978-3-319-42291-6_41
DOI: https://doi.org/10.1007/978-3-319-42291-6_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42290-9
Online ISBN: 978-3-319-42291-6
eBook Packages: Computer Science, Computer Science (R0)