A Guideline for Building Large Coffee Rust Samples Applying Machine Learning Methods

  • Conference paper
Advances in Information and Communication Technologies for Adapting Agriculture to Climate Change (AACC'17 2017)

Abstract

Coffee rust has become a serious concern for many coffee farmers and manufacturers. The American Phytopathological Society describes it as “the most economically important coffee disease in the world,” while “in monetary value, coffee is the most important agricultural product in international trade”. The need for early detection has led researchers to apply supervised learning algorithms to predict the appearance of the disease. However, the main drawback of the related work is the small number of samples of the dependent variable, the incidence rate of rust: the datasets do not represent the disease reliably, which leads to inaccurate classifiers. This paper provides a guideline for increasing the number of coffee rust samples by applying machine learning methods. It is based on a systematic review of the coffee rust literature, which is used to select appropriate algorithms for generating additional samples.

Author information

Corresponding author

Correspondence to Jhonn Pablo Rodríguez.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Rodríguez, J.P., Girón, E.J., Corrales, D.C., Corrales, J.C. (2018). A Guideline for Building Large Coffee Rust Samples Applying Machine Learning Methods. In: Angelov, P., Iglesias, J., Corrales, J. (eds) Advances in Information and Communication Technologies for Adapting Agriculture to Climate Change. AACC'17 2017. Advances in Intelligent Systems and Computing, vol 687. Springer, Cham. https://doi.org/10.1007/978-3-319-70187-5_8

  • DOI: https://doi.org/10.1007/978-3-319-70187-5_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70186-8

  • Online ISBN: 978-3-319-70187-5

  • eBook Packages: Engineering, Engineering (R0)
