Abstract
Breast cancer is one of the most common types of cancer among Jordanian women. Healthcare organizations in Jordan have recently adopted electronic health records, which makes it feasible for researchers to access large volumes of medical records. The goal of this study is to predict the recurrence of breast cancer using machine learning algorithms. We developed a Natural Language Processing (NLP) algorithm to extract key breast cancer features from medical records at King Abdullah University Hospital (KAUH) in Jordan. We integrated these features into a medical dictionary for breast cancer and applied multiple machine learning algorithms to the extracted information to predict recurrence in patients. Specialist physicians from KAUH confirmed our predicted results. Targeted users (physicians and researchers) validated the accuracy of the data in the medical dictionary, which can be used for personalized medicine. All machine learning algorithms performed well, and the OneR algorithm achieved the best balance of sensitivity and specificity. The medical dictionary will help physicians choose the most appropriate treatment plan quickly, and the machine learning predictions can help physicians make correct clinical decisions about treatment options.
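The abstract names the OneR algorithm and reports its sensitivity and specificity without defining either. A minimal Python sketch of the classic OneR (one-rule) learner and of the two metrics follows. It assumes categorical features stored as dictionaries. The feature names `stage` and `her2`, the toy records, and the helper functions `train_one_r`, `predict_one_r`, and `sensitivity_specificity` are hypothetical illustrations, not the KAUH data or the authors' implementation.

```python
from collections import Counter, defaultdict


def train_one_r(rows, labels, features):
    """OneR: build a one-attribute rule per feature, keep the one with fewest errors.

    rows     -- list of dicts mapping feature name -> categorical value
    labels   -- list of class labels, aligned with rows
    features -- feature names to consider
    Returns (feature, rule, default), where rule maps each value to a class.
    """
    best = None
    for feat in features:
        # Class frequencies for every observed value of this feature.
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[feat]][y] += 1
        # Each value predicts its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # Training errors: records outside the majority class of their value.
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in counts.items())
        # Strict "<" keeps the earlier feature on ties.
        if best is None or errors < best[0]:
            best = (errors, feat, rule)
    _, feat, rule = best
    # Overall majority class, used for values never seen in training.
    default = Counter(labels).most_common(1)[0][0]
    return feat, rule, default


def predict_one_r(model, row):
    feat, rule, default = model
    return rule.get(row[feat], default)


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


# Hypothetical toy records (NOT the KAUH data); 1 = recurrence, 0 = no recurrence.
rows = [
    {"stage": "III", "her2": "pos"},
    {"stage": "III", "her2": "neg"},
    {"stage": "II", "her2": "pos"},
    {"stage": "I", "her2": "neg"},
    {"stage": "II", "her2": "neg"},
    {"stage": "III", "her2": "pos"},
]
labels = [1, 1, 0, 0, 0, 0]

model = train_one_r(rows, labels, ["stage", "her2"])
preds = [predict_one_r(model, r) for r in rows]
# Scored on the training rows only for brevity; use held-out data in practice.
sens, spec = sensitivity_specificity(labels, preds)
print(model[0], sens, spec)  # stage 1.0 0.75
```

On these toy rows, `stage` makes one training error and `her2` makes two, so OneR keeps `stage`. The single false positive (a stage III record without recurrence) lowers specificity to 0.75 while sensitivity stays at 1.0. This illustrates the trade-off the abstract summarizes as a "balance" between the two metrics.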

Cite this article
Alzu’bi, A., Najadat, H., Doulat, W. et al. Predicting the recurrence of breast cancer using machine learning algorithms. Multimed Tools Appl (2021). https://doi.org/10.1007/s11042-020-10448-w
Keywords
- Machine learning
- Natural language processing
- Healthcare
- Breast cancer