Classifying Breast Cancer Based on Machine Learning

  • Archana Balyan
  • Yamini Singh
  • Shashank
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1164)


Breast cancer is the most prevalent cancer among Indian women and a leading cause of cancer-related death. Early detection, accurate diagnosis and staging are therefore crucial in managing the disease. This work presents a comparative study of machine learning classifiers for distinguishing benign from malignant breast cancer. It investigates the performance of supervised classification techniques such as logistic regression, support vector machine (SVM), k-nearest neighbour and decision tree. The algorithms are coded in R and executed in RStudio. For performance analysis, specificity, sensitivity and accuracy are calculated and compared. The SVM classifier achieves an accuracy of 99.82%, indicating its suitability over the other classification techniques considered.
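The three performance measures named in the abstract can be illustrated with a small sketch. This is not the authors' R implementation: it is a Python illustration, and the function name `evaluate`, the label strings and the toy predictions are all assumptions made for this example. It treats malignant as the positive class, so sensitivity is the fraction of malignant cases detected and specificity is the fraction of benign cases correctly identified.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float
    specificity: float


def evaluate(y_true, y_pred, positive="malignant"):
    """Compute accuracy, sensitivity and specificity for a binary classifier.

    `positive` names the class treated as the positive outcome
    (assumed here to be "malignant"; every other label counts as negative).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # malignant case correctly flagged
        elif truth == positive:
            fn += 1  # malignant case missed
        elif pred == positive:
            fp += 1  # benign case wrongly flagged
        else:
            tn += 1  # benign case correctly identified

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    return Metrics(
        accuracy=(tp + tn) / (tp + tn + fp + fn),
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
    )


# Toy labels (hypothetical, not taken from the paper's data set):
# 4 malignant and 6 benign cases, with one error in each class.
y_true = ["malignant"] * 4 + ["benign"] * 6
y_pred = ["malignant"] * 3 + ["benign"] + ["benign"] * 5 + ["malignant"]

print(evaluate(y_true, y_pred))
```

Reporting all three measures matters when the classes are unbalanced: a classifier can reach high accuracy while still missing malignant cases, which only the sensitivity reveals.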


Keywords: Breast cancer; Classification accuracy; SVM; Machine learning classifiers; Data set



Copyright information

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021

Authors and Affiliations

  1. Maharaja Surajmal Institute of Technology, Delhi, India
