A novel under sampling strategy for efficient software defect analysis of skewed distributed data

  • K. Nitalaksheswara Rao
  • Ch. Satyananda Reddy
Original Paper


The software quality development process is a continuous one that starts with identifying a reliable fault detection technique. How effectively such a technique can be implemented depends on properties of the dataset, including domain information, characteristics of the input data, and complexity. Early detection of defective modules gives developers more time to allocate resources effectively and deliver quality software on time. The class-imbalanced nature of software defect datasets means that existing techniques fail to identify all the defective modules. Misclassifying defective modules in software engineering datasets exposes software developers to unexpected losses. To classify class-imbalanced software datasets efficiently, we propose a novel under sampling strategy that removes the less prominent instances from the majority subset. The experimental results confirm that the proposed approach predicts error-prone modules more accurately, using fewer and simpler rules.
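The abstract does not say how "less prominent" majority instances are identified, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes, purely for illustration, that prominence can be approximated by closeness to the majority-class centroid; the function name `undersample_majority` and its parameters are hypothetical.

```python
import math


def undersample_majority(features, labels, majority_label, target_size):
    """Return sorted indices of the instances kept after under sampling.

    ASSUMPTION (not from the paper): a majority instance is treated as
    "less prominent" the farther it lies from the majority-class centroid.
    All minority instances are always kept.
    """
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]

    # Nothing to remove if the majority subset is already small enough.
    if target_size >= len(majority):
        return list(range(len(labels)))

    # Centroid of the majority subset, one mean per feature dimension.
    dims = len(features[0])
    centroid = [
        sum(features[i][d] for i in majority) / len(majority)
        for d in range(dims)
    ]

    def distance_to_centroid(i):
        return math.sqrt(
            sum((features[i][d] - centroid[d]) ** 2 for d in range(dims))
        )

    # Keep the target_size majority instances closest to the centroid.
    kept_majority = sorted(majority, key=distance_to_centroid)[:target_size]
    return sorted(kept_majority + minority)


# Toy data: four non-defective modules (label 0), two defective (label 1).
features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [1.0, 1.0], [1.1, 0.9]]
labels = [0, 0, 0, 0, 1, 1]
kept = undersample_majority(features, labels, majority_label=0, target_size=3)
print(kept)  # the majority outlier at index 3 is dropped: [0, 1, 2, 4, 5]
```

The reduced index set would then be used to train a classifier such as a decision tree; any other prominence score could be substituted for the centroid distance without changing the overall structure.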


Software defects analysis, Classification, Decision tree, Class imbalance learning, Under sampling




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • K. Nitalaksheswara Rao (1)
  • Ch. Satyananda Reddy (1)
  1. Department of Computer Science and Systems Engineering, Andhra University, Visakhapatnam, India
