Identifying Association between Longer Itemsets and Software Defects

  • Zeeshan A. Rana
  • Sehrish Abdul Malik
  • Shafay Shamail
  • Mian M. Awais
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8228)


Software defects are an indicator of software quality, and software with fewer defective modules is desirable. Predicting defects from software measurements facilitates early identification of defect-prone modules. Knowing the association relationship between software measures and defects improves prediction of defective modules. To find this relationship, each numeric measure is divided into bins, and each bin is called a 1-itemset (an itemset of length 1). When certain itemsets and defective modules appear together in a dataset, they are considered associated with each other, and the frequency of their co-occurrence indicates the strength of the association. Existing studies find the relationship between 1-itemsets and defective modules. Itemsets that have a high association with defects are called focused itemsets, and they can be used to build prediction models with higher Recall values. This paper explores the relationship between defective modules and itemsets of length greater than 1. Focused itemsets of length greater than 1 involve multiple bins at the same time. Identifying these focused itemsets improved the performance of a decision-tree-based defect prediction model.
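The idea of binning measures into 1-itemsets and then measuring how often longer itemsets co-occur with defective modules can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-width binning, the number of bins, the measure names `loc` and `cc`, the toy data, and the use of the defective share as the association strength are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def equal_width_bin(value, lo, hi, n_bins):
    """Return the 0-based index of the equal-width bin containing value."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def to_transactions(rows, measures, n_bins):
    """Map each module to a set of (measure, bin) items, i.e. 1-itemsets."""
    ranges = {m: (min(r[m] for r in rows), max(r[m] for r in rows))
              for m in measures}
    return [
        frozenset((m, equal_width_bin(r[m], ranges[m][0], ranges[m][1], n_bins))
                  for m in measures)
        for r in rows
    ]


def defect_association(transactions, labels, length):
    """For each itemset of the given length, return a pair:
    (share of modules containing it that are defective, occurrence count)."""
    total, defective = Counter(), Counter()
    for items, is_defective in zip(transactions, labels):
        for itemset in combinations(sorted(items), length):
            total[itemset] += 1
            if is_defective:
                defective[itemset] += 1
    return {s: (defective[s] / total[s], total[s]) for s in total}


# Hypothetical toy data: lines of code (loc) and cyclomatic complexity (cc).
modules = [
    {"loc": 10, "cc": 1, "defective": False},
    {"loc": 20, "cc": 2, "defective": False},
    {"loc": 200, "cc": 15, "defective": True},
    {"loc": 220, "cc": 14, "defective": True},
    {"loc": 210, "cc": 2, "defective": False},
]
labels = [m["defective"] for m in modules]
transactions = to_transactions(modules, ["loc", "cc"], n_bins=2)

singles = defect_association(transactions, labels, length=1)
pairs = defect_association(transactions, labels, length=2)
```

In this toy data the 1-itemset "high `loc`" appears in three modules, two of them defective, whereas the 2-itemset "high `loc` and high `cc`" appears only in defective modules. This is the kind of sharper association that longer itemsets can expose, although real datasets would need a threshold on the occurrence count before trusting such a share.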


Frequent Itemsets, Defect Prediction, Software Defect, Software Measure, Association Relationship
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Zeeshan A. Rana (1)
  • Sehrish Abdul Malik (1)
  • Shafay Shamail (1)
  • Mian M. Awais (1)
  1. LUMS School of Science and Engineering, Lahore, Pakistan
