Abstract
Breast cancer is one of the major types of cancer and a leading cause of death in women. This work is carried out on real patient records obtained from HealthCare Global Enterprises Ltd (HCG) hospitals. It analyzes the four major class variables in the dataset, namely death, progression, recurrence and metastasis, and explores the influence of the same 11 predictor variables on each class. Several machine learning algorithms, namely Support Vector Machine, Decision Tree, Multi-layer Perceptron and Naive Bayes, are explored for classifying the patient data into these classes. The imbalance in the data is handled using an oversampling technique. The contribution of individual attributes to classifying instances into the different classes is also explored. The resulting models help predict these outcomes and thus support early diagnosis of breast cancer.
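As an illustration of how oversampling can reduce class imbalance, the following is a minimal sketch only. The abstract does not name the technique used, so the sketch assumes a SMOTE-style scheme in which synthetic minority samples are interpolated between a minority sample and one of its nearest minority-class neighbours. The function names `smote_like_oversample` and `balance_minority`, the parameter `k`, and the toy two-feature data are all hypothetical and are not taken from the paper or the HCG dataset.

```python
import random
from math import dist


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Assumption: SMOTE-style interpolation on numeric features only."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base point among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance_minority(majority, minority, k=3, seed=0):
    """Pad the minority class with synthetic points until it matches
    the size of the majority class."""
    n_new = max(0, len(majority) - len(minority))
    if n_new == 0:
        return list(minority)
    return list(minority) + smote_like_oversample(minority, n_new, k, seed)


# Toy data: 10 majority records and 3 minority records with two features each.
majority = [(float(i), float(i % 3)) for i in range(10)]
minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
balanced = balance_minority(majority, minority)
print(len(majority), len(balanced))
```

In practice the oversampling step would be applied to the training split only, so that synthetic records do not leak into the data used to evaluate the classifiers.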
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Shastri, S.S., Nair, P.C., Gupta, D., Nayar, R.C., Rao, R., Ram, A. (2018). Breast Cancer Diagnosis and Prognosis Using Machine Learning Techniques. In: Thampi, S., Mitra, S., Mukhopadhyay, J., Li, KC., James, A., Berretti, S. (eds) Intelligent Systems Technologies and Applications. ISTA 2017. Advances in Intelligent Systems and Computing, vol 683. Springer, Cham. https://doi.org/10.1007/978-3-319-68385-0_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68384-3
Online ISBN: 978-3-319-68385-0
eBook Packages: Engineering, Engineering (R0)