Abstract
In this paper, we introduce an incremental approach to semantic clustering, designed for software visualization and inspired by the behavior of fire ant colonies. Our technique focuses on identifying equally sized yet natural clusters that give development participants better insight into the structure of a software system. We also address the performance issues of existing approaches by incrementally maintaining similarities based on global weights, using subspaces and a covariance matrix. The effectiveness of the visualization is improved by representing multiple documents with a precise medoid approximation.
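The abstract does not state how the covariance matrix is kept up to date, so the following is only a generic sketch of one way such a matrix can be maintained incrementally as new document vectors arrive (a Welford-style online update). The class name `RunningCovariance`, its methods, and the toy vectors are hypothetical and are not taken from the paper.

```python
from typing import List


class RunningCovariance:
    """Online mean and sample covariance of fixed-length vectors.

    Assumption: a generic Welford-style update, not the authors' method.
    """

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.mean = [0.0] * dim
        # Running sum of products of deviations (co-moment matrix).
        self.comoment = [[0.0] * dim for _ in range(dim)]

    def add(self, x: List[float]) -> None:
        """Fold one new vector into the statistics in O(dim^2) time."""
        self.n += 1
        # Deviation from the mean before the update.
        delta_old = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + d / self.n for mi, d in zip(self.mean, delta_old)]
        # Deviation from the mean after the update.
        delta_new = [xi - mi for xi, mi in zip(x, self.mean)]
        for i, di in enumerate(delta_old):
            for j, dj in enumerate(delta_new):
                self.comoment[i][j] += di * dj

    def covariance(self) -> List[List[float]]:
        """Return the unbiased sample covariance matrix."""
        if self.n < 2:
            raise ValueError("need at least two samples")
        return [[c / (self.n - 1) for c in row] for row in self.comoment]


# Toy usage: three two-dimensional vectors added one at a time.
rc = RunningCovariance(dim=2)
for vec in ([1.0, 2.0], [3.0, 6.0], [5.0, 10.0]):
    rc.add(vec)
cov = rc.covariance()
```

The point of the update is that each new vector is absorbed without revisiting earlier ones, which is the property an incremental clustering pipeline would rely on.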
Acknowledgments
This work was supported by the Scientific Grant Agency of the Slovak Republic (VEGA) under grant No. VG 1/1221/12. This contribution is also a partial result of the Research & Development Operational Programme for the project Research of Methods for Acquisition, Analysis and Personalized Conveying of Information and Knowledge, ITMS 26240220039, co-funded by the ERDF.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Vincúr, J., Polášek, I. (2017). An Incremental Approach to Semantic Clustering Designed for Software Visualization. In: Janech, J., Kostolny, J., Gratkowski, T. (eds) Proceedings of the 2015 Federated Conference on Software Development and Object Technologies. SDOT 2015. Advances in Intelligent Systems and Computing, vol 511. Springer, Cham. https://doi.org/10.1007/978-3-319-46535-7_26
DOI: https://doi.org/10.1007/978-3-319-46535-7_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46534-0
Online ISBN: 978-3-319-46535-7
eBook Packages: Engineering, Engineering (R0)