Software Clustering Using Automated Feature Subset Selection

Shah, Zubair; Naseem, Rashid; Orgun, Mehmet A.; Mahmood, Abdun; Shahzad, Sara

doi:10.1007/978-3-642-53917-6_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8347)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

This paper proposes a feature selection technique for software clustering that can be used in the architecture recovery of software systems. The recovered architecture can then be used in the subsequent phases of software maintenance, reuse and re-engineering. A number of diverse features can be extracted from the source code of software systems. However, some of the extracted features may carry too little information to be useful in relating entities to one another, which lowers the quality of the resulting software clusters. Therefore, further research is required to select those features that are highly relevant for finding associations between entities. In this article, we first propose a supervised feature selection technique for unlabeled data, and then apply this technique to software clustering. A number of feature subset selection techniques have been proposed for software architecture recovery; however, none of them focuses on automated feature selection in this domain. Experimental results on three software test systems show that the proposed approach produces results that are closer to the decompositions prepared by human experts than those discovered by the well-known K-Means algorithm.
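
The abstract does not spell out how a supervised selector is applied to unlabeled data, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that a first K-Means pass supplies pseudo-labels, that each feature is then scored by the between-cluster spread of its mean, and that the top-ranked features are kept for a second clustering pass. The function names, the scoring rule, the toy binary feature matrix and the fixed initial centers are all hypothetical.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: List[List[float]], init_centers: List[List[float]],
           iters: int = 20) -> List[int]:
    """Plain K-Means with caller-supplied initial centers (deterministic)."""
    centers = [list(c) for c in init_centers]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: each entity goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(r, centers[j]))
                  for r in rows]
        # Update step: each center becomes the mean of its members.
        for j in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def feature_scores(rows: List[List[float]], labels: List[int]) -> List[float]:
    """Assumed relevance score: between-cluster sum of squares per feature."""
    scores = []
    for f in range(len(rows[0])):
        overall = sum(r[f] for r in rows) / len(rows)
        score = 0.0
        for j in set(labels):
            vals = [r[f] for r, lab in zip(rows, labels) if lab == j]
            score += len(vals) * (sum(vals) / len(vals) - overall) ** 2
        scores.append(score)
    return scores


def select_features(rows: List[List[float]], labels: List[int],
                    keep: int) -> List[int]:
    """Indices of the `keep` highest-scoring features, in ascending order."""
    scores = feature_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
    return sorted(ranked[:keep])


# Toy data (hypothetical): 6 entities x 5 binary features.
# Features 0 and 1 separate two groups; features 2 to 4 are noisy.
entities = [
    [1, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 1, 1, 1],
]

# Pass 1: cluster on all features to obtain pseudo-labels.
pseudo_labels = kmeans(entities, [entities[0], entities[3]])
# Supervised-style selection against the pseudo-labels.
selected = select_features(entities, pseudo_labels, keep=2)
# Pass 2: re-cluster using only the selected features.
reduced = [[row[f] for f in selected] for row in entities]
final_labels = kmeans(reduced, [reduced[0], reduced[3]])
print(selected, final_labels)
```

In practice the scoring rule would be replaced by whichever supervised selector is actually used, and the final clusters would be compared against an expert decomposition rather than inspected by eye.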

Author information

Authors and Affiliations

Dept. of Computer Science, University of Venice, Italy
Zubair Shah
Dept. of Computer Science, City University of Science and I.T., Pakistan
Rashid Naseem
Department of Computing, Macquarie University, Sydney, Australia
Mehmet A. Orgun
University of New South Wales, Canberra, Australia
Abdun Mahmood
Department of Computer Science, University of Peshawar, Pakistan
Sara Shahzad

Editor information

Editors and Affiliations

US Air Force Office of Scientific Research, 106-0032, Tokyo, Japan
Hiroshi Motoda
School of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
Zhaohui Wu
Faculty of Engineering and Information Technology, University of Technology, Chippendale, 2008, Sydney, NSW, Australia
Longbing Cao
Department of Computing Science, University of Alberta, Edmonton, T6G 2E8, Canada
Osmar Zaiane
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Min Yao
School of Computer Science, Fudan University, 200433, Shanghai, China
Wei Wang

Cite this paper

Shah, Z., Naseem, R., Orgun, M.A., Mahmood, A., Shahzad, S. (2013). Software Clustering Using Automated Feature Subset Selection. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_5

DOI: https://doi.org/10.1007/978-3-642-53917-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53916-9
Online ISBN: 978-3-642-53917-6
eBook Packages: Computer Science, Computer Science (R0)