A Guideline for Building Large Coffee Rust Samples Applying Machine Learning Methods

Rodríguez, Jhonn Pablo; Girón, Edwar Javier; Corrales, David Camilo; Corrales, Juan Carlos

doi:10.1007/978-3-319-70187-5_8

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 687)

Included in the following conference series:

International Conference of ICT for Adapting Agriculture to Climate Change

Abstract

Coffee rust has become a serious concern for many coffee farmers and manufacturers. The American Phytopathological Society describes it as “the most economically important coffee disease in the world,” noting that “in monetary value, coffee is the most important agricultural product in international trade”. The need for early detection has motivated researchers to apply supervised learning algorithms to predict the appearance of the disease. However, the main drawback of the related work is the small number of samples of the dependent variable, the Incidence Rate of Rust: the datasets do not represent the disease reliably, which leads to inaccurate classifiers. This paper provides a guideline for increasing the number of coffee rust samples with machine learning methods, based on a systematic review of the coffee rust literature that is used to select appropriate algorithms for the task.

Author information

Authors and Affiliations

Department of Telematics Engineering, Engineering Telematics Group, University of Cauca, Popayán, Colombia
Jhonn Pablo Rodríguez, Edwar Javier Girón, David Camilo Corrales & Juan Carlos Corrales

Corresponding author

Correspondence to Jhonn Pablo Rodríguez.

Editor information

Editors and Affiliations

School of Computing and Communications, Lancaster University, Lancaster, United Kingdom
Plamen Angelov
Computer Science Department, Carlos III University of Madrid, Leganés, Madrid, Spain
Jose Antonio Iglesias
Campus de Tulcán, University of Cauca, Popayán, Colombia
Juan Carlos Corrales

About this paper

Cite this paper

Rodríguez, J.P., Girón, E.J., Corrales, D.C., Corrales, J.C. (2018). A Guideline for Building Large Coffee Rust Samples Applying Machine Learning Methods. In: Angelov, P., Iglesias, J., Corrales, J. (eds) Advances in Information and Communication Technologies for Adapting Agriculture to Climate Change. AACC'17 2017. Advances in Intelligent Systems and Computing, vol 687. Springer, Cham. https://doi.org/10.1007/978-3-319-70187-5_8

DOI: https://doi.org/10.1007/978-3-319-70187-5_8
Published: 12 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70186-8
Online ISBN: 978-3-319-70187-5
eBook Packages: Engineering, Engineering (R0)
