Predicting Rice Phenotypes with Meta-learning

  • Oghenejokpeme I. Orhobor
  • Nickolai N. Alexandrov
  • Ross D. King
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11198)

Abstract

The features in some machine learning datasets can naturally be divided into groups. This is the case with genomic data, where features can be grouped by chromosome. In many applications these groupings are commonly ignored, because interactions may exist between features belonging to different groups. However, including a group that does not influence the response introduces noise when fitting a model, leading to suboptimal predictive accuracy. Here we present two general frameworks for generating and combining meta-features when feature groupings are present. We evaluated the frameworks on a genomic rice dataset in which the regression task is to predict plant phenotypes. We conclude that there are use cases for both frameworks.
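The abstract does not spell out either framework, so what follows is only a minimal sketch of one common way to build group-wise meta-features, assuming a stacking-style approach. A base model is fitted on each feature group alone (for example, one group per chromosome), its out-of-fold predictions become one meta-feature per group, and a meta-model combines those meta-features. Ridge regression at both levels, 5-fold splits, the toy data, and every function name (`fit_ridge`, `out_of_fold_meta`, `fit_group_stack`, `predict_group_stack`) are illustrative assumptions, not the authors' method. Pure Python is used so the block runs without third-party packages.

```python
# Sketch only: ridge base/meta models and k-fold out-of-fold meta-features are
# assumptions for illustration, not the frameworks described in the paper.
import random
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(x: Matrix, y: Sequence[float], lam: float = 1.0) -> List[float]:
    """Ridge regression with an unpenalised intercept; returns [intercept, w1, ..., wp]."""
    rows = [[1.0] + list(r) for r in x]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for i in range(1, p):  # index 0 is the intercept, which is not shrunk
        xtx[i][i] += lam
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(w: Sequence[float], x: Matrix) -> List[float]:
    """Linear prediction: intercept plus weighted sum of the row's features."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in x]


def select(x: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    """Sub-matrix of x restricted to the given rows and to one feature group's columns."""
    return [[x[i][c] for c in cols] for i in rows]


def out_of_fold_meta(x: Matrix, y: Sequence[float], groups: Dict[str, List[int]],
                     k: int = 5, lam: float = 1.0) -> Matrix:
    """One meta-feature per group: each row is predicted by a group-only model that never saw it."""
    n, names = len(x), list(groups)
    meta = [[0.0] * len(names) for _ in range(n)]
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        for g, name in enumerate(names):
            w = fit_ridge(select(x, train, groups[name]), [y[i] for i in train], lam)
            for i, pred in zip(test, predict(w, select(x, test, groups[name]))):
                meta[i][g] = pred
    return meta


def fit_group_stack(x: Matrix, y: Sequence[float], groups: Dict[str, List[int]],
                    k: int = 5, lam: float = 1.0):
    """Level 0: one ridge model per group, refit on all rows. Level 1: ridge on the meta-features."""
    names = list(groups)
    everyone = range(len(x))
    base = {name: fit_ridge(select(x, everyone, groups[name]), y, lam) for name in names}
    meta_w = fit_ridge(out_of_fold_meta(x, y, groups, k, lam), y, lam)
    return names, base, meta_w


def predict_group_stack(model, x: Matrix, groups: Dict[str, List[int]]) -> List[float]:
    """Turn new rows into group meta-features with the base models, then apply the meta-model."""
    names, base, meta_w = model
    everyone = range(len(x))
    columns = [predict(base[name], select(x, everyone, groups[name])) for name in names]
    meta = [list(row) for row in zip(*columns)]
    return predict(meta_w, meta)


# Hypothetical toy data: group "chr1" drives the response, group "chr2" is pure noise.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(60)]
y = [2.0 * r[0] - r[1] + rng.gauss(0.0, 0.1) for r in X]
groups = {"chr1": [0, 1], "chr2": [2, 3]}
model = fit_group_stack(X, y, groups)
preds = predict_group_stack(model, X, groups)
print("meta-weights [intercept, chr1, chr2]:", [round(w, 3) for w in model[2]])
```

On this toy data one would expect the meta-weight for the noise group to be close to zero: the meta-model can down-weight a group that does not influence the response, which is the concern the abstract raises about including such groups directly. Using out-of-fold predictions keeps the meta-model from simply trusting base models that have overfit their training rows.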

Keywords

Rice, Bioinformatics, Machine learning, Meta-learning


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. The University of Manchester, Manchester, United Kingdom
  2. The International Rice Research Institute, Los Baños, Philippines