PDD Algorithm for Balancing Medical Data

Kalra, Karan; Goyal, Riya; Kaur, Sanmeet; Kumar, Parteek

doi:10.1007/978-981-13-1810-8_26


Conference paper
First Online: 31 October 2018

Part of the book series: Communications in Computer and Information Science (CCIS, volume 905)

Abstract

Several factors can affect the performance of a machine learning classifier, and an unbalanced dataset is among the most prominent. An unbalanced dataset is one with a disproportion among classes, i.e., instances belonging to one class heavily outnumber instances belonging to all other classes. This problem is especially common in medical data, which is collected from the real world, where the number of people affected by a disease is generally smaller than the number of unaffected people. Because of this disproportion, classifiers have difficulty learning concepts related to the minority class. Most data balancing techniques are designed with general data in mind and are not viable for medical data. This paper proposes a method that balances medical data more effectively while increasing performance and decreasing the learning time of the classifier.
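
As an illustration of the imbalance described in the abstract (not of the PDD method itself, whose details are not given here), the following minimal Python sketch measures the class disproportion in a label list and shows why plain accuracy can hide poor minority-class learning. The label names, the counts (950 versus 50), and the helper names `class_counts_and_ratio` and `majority_baseline` are all hypothetical, chosen only for this example.

```python
from collections import Counter


def class_counts_and_ratio(labels):
    """Return per-class counts and the majority/minority size ratio."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to measure imbalance")
    return counts, max(counts.values()) / min(counts.values())


def majority_baseline(labels):
    """Score a trivial classifier that always predicts the majority class.

    Returns (accuracy, recall on the minority class).
    """
    counts = Counter(labels)
    majority_label = max(counts, key=counts.get)
    minority_label = min(counts, key=counts.get)
    predictions = [majority_label] * len(labels)

    # Fraction of all instances the trivial classifier gets right.
    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)

    # Fraction of minority instances it identifies correctly.
    minority_hits = sum(
        p == y for p, y in zip(predictions, labels) if y == minority_label
    )
    minority_recall = minority_hits / counts[minority_label]
    return accuracy, minority_recall


# Hypothetical screening data: 950 unaffected records, 50 affected records.
labels = ["unaffected"] * 950 + ["affected"] * 50

counts, ratio = class_counts_and_ratio(labels)
accuracy, minority_recall = majority_baseline(labels)

print("class counts:", dict(counts))
print("majority/minority ratio:", ratio)
print("baseline accuracy:", accuracy)
print("baseline minority recall:", minority_recall)
```

With these made-up counts the ratio is 19 to 1, and the trivial baseline reaches 95% accuracy while identifying none of the affected cases, which is the kind of failure that data balancing aims to prevent.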

K. Kalra and R. Goyal—These authors contributed equally to this work.



Author information

Authors and Affiliations

Department of Computer Science and Engineering, TIET, Patiala, 147001, India
Karan Kalra, Riya Goyal, Sanmeet Kaur & Parteek Kumar


Corresponding authors

Correspondence to Karan Kalra or Riya Goyal.

Editor information

Editors and Affiliations

University of KwaZulu-Natal, Durban, South Africa
Mayank Singh
Jaypee University of Information Technology, Solan, India
P. K. Gupta
Jaypee University of Engineering and Technology, Guna, Madhya Pradesh, India
Vipin Tyagi
Institute of Information Theory and Automation, Prague 8, Czech Republic
Jan Flusser
University of Ottawa, Ottawa, Canada
Tuncer Ören


Cite this paper

Kalra, K., Goyal, R., Kaur, S., Kumar, P. (2018). PDD Algorithm for Balancing Medical Data. In: Singh, M., Gupta, P., Tyagi, V., Flusser, J., Ören, T. (eds) Advances in Computing and Data Sciences. ICACDS 2018. Communications in Computer and Information Science, vol 905. Springer, Singapore. https://doi.org/10.1007/978-981-13-1810-8_26


DOI: https://doi.org/10.1007/978-981-13-1810-8_26
Published: 31 October 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1809-2
Online ISBN: 978-981-13-1810-8
eBook Packages: Computer Science, Computer Science (R0)
