An Alternating Least Square Based Algorithm for Predicting Patient Survivability

  • Qiming Hu
  • Jie Yang
  • Khin Than Win
  • Xufeng Huang
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 996)


Breast cancer is the most common cancer among women worldwide. Using machine learning to predict the survivability of breast-cancer patients has drawn considerable research interest. However, the task still faces several issues, such as missing-value imputation. The main objective of this paper is therefore to develop a novel imputation algorithm inspired by recommendation systems. More precisely, features with missing values are regarded as items to be evaluated for recommendation.

Consequently, a matrix factorisation algorithm (Alternating Least Squares, ALS) is employed to replace missing values, and four different prediction strategies based on the ALS result are then discussed. The proposed ALS-based imputation algorithm is evaluated on a large patient dataset from the Surveillance, Epidemiology, and End Results (SEER) program. Experimental results demonstrate a significant improvement in survivability prediction compared to existing methods.
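The general idea of ALS-based imputation can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `als_impute`, the parameters `rank`, `reg` and `n_iters`, and the choice of NumPy are all assumptions made for illustration. Rows stand for patients and columns for features, missing entries are marked with NaN, and the matrix is approximated by the product of two low-rank factors that are updated in turn by regularised least squares over the observed entries only.

```python
import numpy as np


def als_impute(X, rank=2, reg=0.1, n_iters=50, seed=0):
    """Fill NaN entries of X using a low-rank ALS factorisation.

    Hypothetical helper for illustration only; names and defaults
    are assumptions, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)              # True where a value is present
    n_rows, n_cols = X.shape

    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_rows, rank))   # row (patient) factors
    V = rng.normal(scale=0.1, size=(n_cols, rank))   # column (feature) factors
    ridge = reg * np.eye(rank)

    for _ in range(n_iters):
        # Fix V and solve a ridge regression for each row factor.
        for i in range(n_rows):
            cols = observed[i]
            if cols.any():
                Vo = V[cols]
                U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ X[i, cols])
        # Fix U and solve a ridge regression for each column factor.
        for j in range(n_cols):
            rows = observed[:, j]
            if rows.any():
                Uo = U[rows]
                V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ X[rows, j])

    # Keep observed values and take the reconstruction only where data is missing.
    return np.where(observed, X, U @ V.T)
```

In practice, categorical features would need to be encoded numerically and features scaled to comparable ranges before factorisation; those preprocessing choices are outside the scope of this sketch.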


SEER dataset, Survivability prediction, Missing-value imputation, Alternating Least Square



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. University of Wollongong, Wollongong, Australia
