Software Clustering Using Automated Feature Subset Selection

  • Conference paper
Advanced Data Mining and Applications (ADMA 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8347)

Abstract

This paper proposes a feature selection technique for software clustering that can be used in the architecture recovery of software systems. The recovered architecture can then be used in the subsequent phases of software maintenance, reuse, and re-engineering. A number of diverse features can be extracted from the source code of software systems; however, some of the extracted features may carry little information useful for comparing entities, which lowers the quality of the resulting software clusters. Therefore, further research is required to select those features that are highly relevant to finding associations between entities. In this article, we first propose a supervised feature selection technique for unlabeled data, and then we apply this technique to software clustering. A number of feature subset selection techniques have been proposed for software architecture recovery; however, none of them focuses on automated feature selection in this domain. Experimental results on three software test systems reveal that our proposed approach produces results that are closer to the decompositions prepared by human experts than those discovered by the well-known K-Means algorithm.
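
The abstract does not spell out how a supervised selector is applied to unlabeled data, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that an initial clustering supplies pseudo-labels, that features are ranked by a between-cluster to within-cluster variance ratio, and that the entities are then re-clustered on the top-ranked features. The function names (`kmeans`, `feature_scores`, `select_and_recluster`), the scoring rule, the deterministic farthest-first initialization, and the toy entity-feature matrix are all illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=10):
    """Plain K-Means with farthest-first initialization (an assumption made
    here so the sketch is deterministic). Returns one label per row."""
    centers = [list(rows[0])]
    while len(centers) < k:
        # Next center: the row farthest from all centers chosen so far.
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest center wins (ties go to the lower index).
        labels = [min(range(k), key=lambda c: sq_dist(row, centers[c]))
                  for row in rows]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def feature_scores(rows, labels):
    """Between-cluster over within-cluster variance per feature.
    Higher means the feature separates the pseudo-labelled groups better."""
    scores = []
    for j in range(len(rows[0])):
        col = [row[j] for row in rows]
        mean = sum(col) / len(col)
        between = within = 0.0
        for c in set(labels):
            vals = [v for v, lab in zip(col, labels) if lab == c]
            m = sum(vals) / len(vals)
            between += len(vals) * (m - mean) ** 2
            within += sum((v - m) ** 2 for v in vals)
        scores.append(between / (within + 1e-9))  # small term avoids 0/0
    return scores


def select_and_recluster(rows, k, keep):
    """Pseudo-label with K-Means, keep the `keep` best features, re-cluster."""
    pseudo = kmeans(rows, k)
    scores = feature_scores(rows, pseudo)
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    chosen = sorted(ranked[:keep])
    reduced = [[row[j] for j in chosen] for row in rows]
    return chosen, kmeans(reduced, k)


# Hypothetical binary entity-feature matrix: six entities, four features.
# Features 0 and 1 follow the intended grouping; features 2 and 3 are noise.
entities = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
chosen, labels = select_and_recluster(entities, k=2, keep=2)
print(chosen, labels)
```

Using K-Means for both the pseudo-labelling and the final clustering is a simplification made for brevity. Since the abstract compares the proposed approach against K-Means, the actual technique presumably differs in at least one of these steps.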

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shah, Z., Naseem, R., Orgun, M.A., Mahmood, A., Shahzad, S. (2013). Software Clustering Using Automated Feature Subset Selection. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_5

  • DOI: https://doi.org/10.1007/978-3-642-53917-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-53916-9

  • Online ISBN: 978-3-642-53917-6

  • eBook Packages: Computer Science, Computer Science (R0)
