Abstract
cluster is a tool to partition a large pool of C programs into groups according to structural similarity. Its method involves calculating an alignment score for each program against a mosaic made of randomly selected code fragments of fixed size from the pool. The scores are then grouped together so that the distance between two adjacent members of a group is at most some threshold value. cluster is effective in identifying tight clusters of similar programs and is capable of distributing its workload over a network of workstations to achieve very fast running times. As a tool, cluster is highly configurable: the user can adjust its alignment scoring scheme and clustering threshold as well as obtain visual alignments of programs suspected to be similar.
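The grouping step described above, in which adjacent members of a group differ by at most a threshold, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `group_by_gap`, the file names, and the score values are all hypothetical, and the alignment scores against the mosaic are assumed to have been computed already.

```python
from typing import Dict, List


def group_by_gap(scores: Dict[str, float], threshold: float) -> List[List[str]]:
    """Group programs so that neighbouring scores within a group differ by at most threshold.

    Assumption: each program already has a single alignment score; how that
    score is computed is outside the scope of this sketch.
    """
    # Sort programs by score so that "adjacent" means adjacent in score order.
    ordered = sorted(scores.items(), key=lambda item: item[1])

    groups: List[List[str]] = []
    previous_score = None
    for name, score in ordered:
        # Start a new group at the first program or when the gap exceeds the threshold.
        if previous_score is None or score - previous_score > threshold:
            groups.append([])
        groups[-1].append(name)
        previous_score = score
    return groups


# Hypothetical scores, invented purely for illustration.
example_scores = {"a.c": 10.0, "b.c": 11.5, "c.c": 30.0, "d.c": 31.0, "e.c": 50.0}
example_groups = group_by_gap(example_scores, threshold=2.0)
print(example_groups)
```

Because only neighbouring scores are compared, a group can span a range wider than the threshold when its members form a chain of small gaps. A smaller threshold therefore yields tighter groups.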
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Carter, C., Tran, N. (2002). Cluster: A Fast Tool to Identify Groups of Similar Programs. In: Ibarra, O.H., Zhang, L. (eds) Computing and Combinatorics. COCOON 2002. Lecture Notes in Computer Science, vol 2387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45655-4_22
DOI: https://doi.org/10.1007/3-540-45655-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43996-7
Online ISBN: 978-3-540-45655-1
eBook Packages: Springer Book Archive