Applying Under-Sampling Techniques and Cost-Sensitive Learning Methods on Risk Assessment of Breast Cancer

Hsu, Jia-Lien; Hung, Ping-Cheng; Lin, Hung-Yen; Hsieh, Chung-Ho

doi:10.1007/s10916-015-0210-x

Non-invasive Diagnostic Systems
Published: 25 February 2015

Volume 39, article number 40, (2015)
Journal of Medical Systems

Jia-Lien Hsu¹, Ping-Cheng Hung¹, Hung-Yen Lin¹ & Chung-Ho Hsieh²

Abstract

Breast cancer is one of the most common causes of cancer mortality. Early detection through mammography screening can significantly reduce mortality from breast cancer. However, most screening methods consume a large amount of resources. We propose a computational model, based solely on personal health information, for breast cancer risk assessment. Our model can serve as a pre-screening program in low-cost settings. The data set, consisting of 3976 records, was collected at Taipei City Hospital from 1 January 2008 to 31 December 2008. Based on this data set, we first apply sampling techniques and a dimension reduction method to preprocess the testing data. Then, we construct various kinds of classifiers (basic classifiers, ensemble methods, and cost-sensitive methods) to predict the risk. The cost-sensitive method with a random forest classifier achieves a recall (sensitivity) of 100 %. At a recall of 100 %, the precision (positive predictive value, PPV) and specificity of the cost-sensitive method with a random forest classifier are 2.9 % and 14.87 %, respectively. In summary, we build a breast cancer risk assessment model using data mining techniques. Our model has the potential to serve as an assisting tool in breast cancer screening.
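The abstract does not give implementation details, so the following is only a minimal sketch of the three ideas it names: random under-sampling of the majority class, a cost-sensitive decision threshold, and the recall, precision, and specificity measures. The function names, the `ratio` parameter, the cost values, and the toy labels and scores are all hypothetical and are not taken from the study. The threshold rule assumes zero cost for correct decisions, as in the standard cost-sensitive learning formulation.

```python
import random
from typing import List, Sequence, Tuple


def random_under_sample(labels: Sequence[int], ratio: float = 1.0, seed: int = 0) -> List[int]:
    """Return sorted indices keeping every positive and a random subset of negatives.

    `ratio` (hypothetical parameter) is the number of negatives kept per positive.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(negatives), round(ratio * len(positives)))
    return sorted(positives + rng.sample(negatives, n_keep))


def cost_sensitive_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability threshold that minimises expected cost.

    Assumes correct decisions cost nothing; predict positive when p >= threshold.
    """
    return cost_fp / (cost_fp + cost_fn)


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (recall, precision, specificity) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return recall, precision, specificity


# Toy data (illustrative only, NOT from the paper): 2 positives among 10 records,
# with made-up predicted probabilities from some classifier.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.30, 0.15, 0.40, 0.20, 0.12, 0.05, 0.08, 0.25, 0.02, 0.11]

# Indices of a balanced training subset (all positives, equally many negatives).
balanced_idx = random_under_sample(y_true, ratio=1.0, seed=0)

# Treat a missed cancer as 9 times as costly as a false alarm (assumed costs).
threshold = cost_sensitive_threshold(cost_fp=1.0, cost_fn=9.0)
y_pred = [1 if s >= threshold else 0 for s in scores]

recall, precision, specificity = screening_metrics(y_true, y_pred)
print(f"threshold={threshold:.2f} recall={recall:.2f} "
      f"precision={precision:.2f} specificity={specificity:.2f}")
```

In this toy setting, lowering the threshold catches every positive at the price of many false alarms, which is the same kind of trade-off as the one reported above between recall on one side and precision and specificity on the other.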

Acknowledgments

Financial support for this study was provided in part by a grant from the National Science Council, Taiwan, under Contract No. NSC-102-2218-E-030-002. The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report.

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, Fu Jen Catholic University, New Taipei City, Taiwan, Republic of China
Jia-Lien Hsu, Ping-Cheng Hung & Hung-Yen Lin
Department of General Surgery, Shin Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan, Republic of China
Chung-Ho Hsieh

Corresponding author

Correspondence to Jia-Lien Hsu.

Additional information

This article is part of the Topical Collection on Systems-Level Quality Improvement

About this article

Cite this article

Hsu, JL., Hung, PC., Lin, HY. et al. Applying Under-Sampling Techniques and Cost-Sensitive Learning Methods on Risk Assessment of Breast Cancer. J Med Syst 39, 40 (2015). https://doi.org/10.1007/s10916-015-0210-x

Received: 21 November 2014
Accepted: 10 December 2014
Published: 25 February 2015
DOI: https://doi.org/10.1007/s10916-015-0210-x
