Abstract
This paper proposes a feature selection technique for software clustering that can be used to recover the architecture of software systems. The recovered architecture can then support the subsequent phases of software maintenance, reuse and re-engineering. Many diverse features can be extracted from the source code of a software system; however, some of them carry little information for computing similarity between entities, which lowers the quality of the resulting software clusters. Therefore, further research is needed on selecting the features that are most relevant for finding associations between entities. A number of feature subset selection techniques have been proposed for software architecture recovery; however, none of them focuses on automated feature selection in this domain. In this article, we first propose a supervised feature selection technique for unlabeled data and then apply it to software clustering. Experimental results on three software test systems show that the proposed approach produces results closer to the decompositions prepared by human experts than those produced by the well-known K-Means algorithm.
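The abstract does not spell out how a supervised technique is applied to unlabeled data. One plausible reading, offered here only as an illustration, is: cluster the entities once to obtain pseudo-labels, score each feature against those labels with a supervised relevance measure, keep the top-ranked features, and cluster again on the reduced vectors. The sketch below assumes binary entity-feature vectors, plain K-Means with the first k rows as initial centroids, and mutual information as the relevance score; the authors may use a different criterion. All names (`kmeans`, `mutual_information`, `select_features`, `cluster_with_selection`) and the parameter `m` (number of features kept) are hypothetical, and this is not the authors' published procedure.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical sketch (NOT the authors' published procedure):
# 1) pseudo-label entities with K-Means, 2) rank features by mutual
# information with the pseudo-labels, 3) keep the top m, 4) re-cluster.

Row = Sequence[float]


def _sq_dist(a: Row, b: Row) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: Sequence[Row], k: int, iters: int = 100) -> List[int]:
    """Lloyd's K-Means; the first k rows serve as initial centroids."""
    centroids = [list(row) for row in data[:k]]
    labels: List[int] = []
    for _ in range(iters):
        # Assign each entity to its nearest centroid (ties go to the lower index).
        new_labels = [
            min(range(k), key=lambda c: _sq_dist(row, centroids[c]))
            for row in data
        ]
        if new_labels == labels:  # converged: assignments unchanged
            break
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mutual_information(feature: Sequence[int], labels: Sequence[int]) -> float:
    """I(F; C) in bits between one discrete feature column and the cluster labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    f_counts, c_counts = Counter(feature), Counter(labels)
    return sum(
        (cnt / n) * math.log2(cnt * n / (f_counts[f] * c_counts[c]))
        for (f, c), cnt in joint.items()
    )


def select_features(data: Sequence[Row], k: int, m: int) -> List[int]:
    """Indices of the m features most informative about K-Means pseudo-labels."""
    pseudo_labels = kmeans(data, k)
    columns = list(zip(*data))
    scores = [mutual_information(col, pseudo_labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:m])


def cluster_with_selection(
    data: Sequence[Row], k: int, m: int
) -> Tuple[List[int], List[int]]:
    """Select m features, then cluster the entities on the reduced vectors."""
    keep = select_features(data, k, m)
    reduced = [[row[j] for j in keep] for row in data]
    return keep, kmeans(reduced, k)


if __name__ == "__main__":
    # Toy entity-feature matrix (rows = entities, columns = binary features).
    # Columns 0-2 separate two groups, column 3 is shared by every entity,
    # and column 4 is noise.
    entities = [
        [1, 1, 1, 1, 0],
        [0, 0, 0, 1, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 1, 1],
        [1, 1, 1, 1, 0],
        [0, 0, 0, 1, 1],
    ]
    keep, labels = cluster_with_selection(entities, k=2, m=3)
    print("kept features:", keep)  # kept features: [0, 1, 2]
    print("clusters:", labels)     # clusters: [0, 1, 0, 1, 0, 1]
```

In the toy run, the feature shared by every entity scores zero and the noisy feature scores low, so only the three group-defining features survive. Mutual information is just one possible supervised relevance score; a correlation-based or wrapper criterion could be substituted, and in any variant the selected subset depends on the quality of the initial pseudo-labels.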
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shah, Z., Naseem, R., Orgun, M.A., Mahmood, A., Shahzad, S. (2013). Software Clustering Using Automated Feature Subset Selection. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_5
DOI: https://doi.org/10.1007/978-3-642-53917-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53916-9
Online ISBN: 978-3-642-53917-6
eBook Packages: Computer Science, Computer Science (R0)