Predicting the recurrence of breast cancer using machine learning algorithms


Breast cancer is one of the most common types of cancer among Jordanian women. Healthcare organizations in Jordan have recently adopted electronic health records, which gives researchers access to large volumes of medical records. The goal of this study is to predict the recurrence of breast cancer using machine learning algorithms. We developed a Natural Language Processing algorithm to extract key breast cancer features from medical records at King Abdullah University Hospital (KAUH) in Jordan. We integrated these features and built a medical dictionary for breast cancer. We then applied multiple machine learning algorithms to the extracted information to predict the recurrence of breast cancer in patients. The predicted results were approved by specialist physicians from KAUH, and the accuracy of the data in the medical dictionary was validated by its targeted users (physicians and researchers). This dictionary can be used for personalized medicine. All of the machine learning algorithms performed well, and the OneR algorithm had the best balance of sensitivity and specificity. The medical dictionary will help physicians choose the most appropriate treatment plan in a short time, and the machine learning predictions can help them make the correct clinical decision regarding treatment options.
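To make the OneR result concrete, the following is a minimal Python sketch of a OneR (one-rule) classifier together with the sensitivity and specificity calculation. It is not the authors' implementation: the feature names (`stage`, `her2`), the class labels, and the six toy records are invented purely for illustration and are not data from the study.

```python
from collections import Counter, defaultdict


def fit_one_r(rows, labels):
    """Pick the single feature whose value -> majority-class rule
    makes the fewest errors on the training data."""
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen values
    best = None
    for j in range(len(rows[0])):
        # Count class labels for each value of feature j.
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[j]][y] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[j]] != y for row, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, j, rule)
    _, feature_index, rule = best
    return feature_index, rule, default


def predict_one_r(model, row):
    feature_index, rule, default = model
    return rule.get(row[feature_index], default)


def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical toy records: (stage, her2). Not study data.
rows = [
    ("early", "neg"), ("early", "pos"), ("early", "neg"),
    ("late", "pos"), ("late", "neg"), ("late", "pos"),
]
labels = ["no_recur", "no_recur", "no_recur", "recur", "recur", "no_recur"]

model = fit_one_r(rows, labels)
predictions = [predict_one_r(model, row) for row in rows]
sens, spec = sensitivity_specificity(labels, predictions, positive="recur")
print(model[0], sens, spec)
```

On this toy set the rule is built on the first feature (`stage`), which misclassifies one record versus two for `her2`. Sensitivity and specificity are reported together because a recurrence classifier that favors one at the expense of the other is of limited clinical use, which is the sense in which the abstract speaks of a "balance".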




Author information



Corresponding author

Correspondence to Amal Alzu’bi.


Cite this article

Alzu’bi, A., Najadat, H., Doulat, W. et al. Predicting the recurrence of breast cancer using machine learning algorithms. Multimed Tools Appl (2021).



  • Machine learning
  • Natural language processing
  • Healthcare
  • Breast cancer