Abstract
Crop diseases have long been the most important biological hazard challenging sustainable agricultural production. Every year, 42% of the global agricultural yield is destroyed by disease. Bioinformatics techniques provide efficient methods for analyzing and interpreting raw biological data, which helps in studying the effect of a pathogen on a crop. Microarray gene expression data represent the expression levels of the genes of a cell (organism) maintained in a particular environment. Gene expression data can therefore be used to predict significant genes and to study pathogen–host interactions. Different machine learning techniques can be applied to extract the useful information carried by the candidate genes. The approach proposed in this chapter consists of three stages: preprocessing of the gene expression data, gene selection or feature extraction using a parallel approach, and classification. Five feature selection methods are analyzed for extracting candidate genes with biological significance for rice-related diseases: a support vector machine with recursive feature elimination (SVM-RFE), minimum redundancy maximum relevance (mRMR), principal component analysis (PCA), successive feature selection (SFS) and independent component analysis (ICA). To deal with the computational complexity and the large volume of data, a combination of general-purpose graphics processing unit (GPGPU) computing and MapReduce programming on the Apache Hadoop framework is proposed. The experimental results show improved time efficiency in feature extraction and classification.
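As a rough illustration of how a gene-ranking step can be phrased in map and reduce terms, the following Python sketch scores genes with a simple signal-to-noise filter. It is not the chapter's implementation: the toy data, the function names (map_sample, reduce_pairs, rank_genes) and the choice of a signal-to-noise score in place of SVM-RFE or mRMR are all assumptions made for the example, and it runs in a single process rather than on Hadoop.

```python
from collections import defaultdict
from functools import reduce
from math import sqrt

# Hypothetical toy data: (class_label, [expression value per gene]).
SAMPLES = [
    ("diseased", [5.1, 2.0, 7.9]),
    ("diseased", [4.9, 2.2, 8.1]),
    ("healthy",  [1.0, 2.1, 8.0]),
    ("healthy",  [1.2, 1.9, 8.0]),
]

def map_sample(sample):
    """Map step: emit ((gene_index, label), (count, sum, sum_of_squares))."""
    label, values = sample
    return [((g, label), (1, v, v * v)) for g, v in enumerate(values)]

def reduce_pairs(acc, pair):
    """Reduce step: add up the partial statistics that share a key."""
    key, (n, s, ss) = pair
    n0, s0, ss0 = acc[key]
    acc[key] = (n0 + n, s0 + s, ss0 + ss)
    return acc

def rank_genes(samples, positive="diseased", negative="healthy"):
    """Rank genes by |mean_pos - mean_neg| / (sd_pos + sd_neg), best first."""
    pairs = [p for sample in samples for p in map_sample(sample)]
    stats = reduce(reduce_pairs, pairs, defaultdict(lambda: (0, 0.0, 0.0)))
    n_genes = len(samples[0][1])
    scores = []
    for g in range(n_genes):
        moments = []
        for label in (positive, negative):
            n, s, ss = stats[(g, label)]
            mean = s / n
            # Clamp at zero to absorb floating-point round-off.
            var = max(ss / n - mean * mean, 0.0)
            moments.append((mean, sqrt(var)))
        (m1, sd1), (m2, sd2) = moments
        # Small constant avoids division by zero for constant genes.
        scores.append((abs(m1 - m2) / (sd1 + sd2 + 1e-9), g))
    return [g for _, g in sorted(scores, reverse=True)]

ranking = rank_genes(SAMPLES)
print(ranking)
```

Because each map output depends on one sample only and the reduce step is a plain sum, the same per-gene statistics could in principle be computed on separate data partitions and merged afterwards, which is the property a MapReduce job relies on.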
Acknowledgements
This research is an outcome of a University Grants Commission project. The work was carried out at the PSG-Nokia Centre for Big Data Analytics. The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Sadasivam, G.S., Madhesu, S., Mumthas, O.Y., Dharani, K. (2018). Crop Disease Protection Using Parallel Machine Learning Approaches. In: Dey, N., Ashour, A., Borra, S. (eds) Classification in BioApps. Lecture Notes in Computational Vision and Biomechanics, vol 26. Springer, Cham. https://doi.org/10.1007/978-3-319-65981-7_9
DOI: https://doi.org/10.1007/978-3-319-65981-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65980-0
Online ISBN: 978-3-319-65981-7
eBook Packages: Engineering, Engineering (R0)