Cluster: A Fast Tool to Identify Groups of Similar Programs

Carter, Casey; Tran, Nicholas

doi:10.1007/3-540-45655-4_22

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2387)

Included in the following conference series:

International Computing and Combinatorics Conference

Abstract

cluster is a tool that partitions a large pool of C programs into groups according to structural similarity. Its method computes an alignment score for each program against a mosaic built from randomly selected, fixed-size code fragments drawn from the pool. The scores are then grouped so that the distance between any two adjacent members of a group is at most a threshold value. cluster is effective at identifying tight clusters of similar programs, and it can distribute its workload over a network of workstations to achieve very fast running times. As a tool, cluster is highly configurable: the user can adjust its alignment scoring scheme and clustering threshold, and can obtain visual alignments of programs suspected to be similar.
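The pipeline described in the abstract (build a mosaic from random fixed-size fragments, score every program against it, then group the one-dimensional scores by a threshold) can be sketched as below. This is a minimal illustration under stated assumptions, not cluster's implementation: the regex tokenizer, the function names, the parameters `fragment_len`, `num_fragments` and `seed`, and the match/mismatch/gap values are all hypothetical, and Smith-Waterman local alignment is swapped in as one plausible choice of alignment score.

```python
import random
import re
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical tokenizer: identifiers/keywords, integer literals, and any other
# single non-space character. cluster's real lexer for C is not described here.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source: str) -> List[str]:
    """Split C source text into a flat list of tokens."""
    return TOKEN_RE.findall(source)


def build_mosaic(pool: Dict[str, List[str]], fragment_len: int,
                 num_fragments: int, seed: int = 0) -> List[str]:
    """Concatenate num_fragments randomly chosen, fixed-size token fragments from the pool."""
    rng = random.Random(seed)
    eligible = [toks for toks in pool.values() if len(toks) >= fragment_len]
    if fragment_len <= 0 or not eligible:
        raise ValueError("need fragment_len > 0 and at least one program that long")
    mosaic: List[str] = []
    for _ in range(num_fragments):
        toks = rng.choice(eligible)
        start = rng.randrange(len(toks) - fragment_len + 1)
        mosaic.extend(toks[start:start + fragment_len])
    return mosaic


def local_alignment_score(a: Sequence[str], b: Sequence[str],
                          match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best Smith-Waterman local alignment score of a against b (linear gap penalty).

    Only two rows of the dynamic-programming table are kept, so memory is O(len(b)).
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def group_scores(scores: Dict[str, int], threshold: int) -> List[List[str]]:
    """Sort programs by score and split wherever adjacent scores differ by more than threshold."""
    ordered: List[Tuple[int, str]] = sorted((s, name) for name, s in scores.items())
    groups: List[List[str]] = []
    last: Optional[int] = None
    for score, name in ordered:
        if last is None or score - last > threshold:
            groups.append([])  # gap too large: start a new group
        groups[-1].append(name)
        last = score
    return groups


if __name__ == "__main__":
    sources = {
        "a.c": "int main(){int s=0;for(int i=0;i<10;i++)s+=i;return s;}",
        "b.c": "int main(){int t=0;for(int k=0;k<10;k++)t+=k;return t;}",
        "c.c": "#include <stdio.h>\nint main(void){puts(\"hi\");return 0;}",
    }
    pool = {name: tokenize(src) for name, src in sources.items()}
    mosaic = build_mosaic(pool, fragment_len=6, num_fragments=10, seed=1)
    # Each program is scored independently, so this loop is the part one would
    # distribute across workstations.
    scores = {name: local_alignment_score(toks, mosaic) for name, toks in pool.items()}
    print(scores)
    print(group_scores(scores, threshold=3))
```

Because each program is reduced to a single score, grouping needs only a sort followed by a linear scan, and the per-program scoring step is independent work that parallelizes naturally, which is consistent with the abstract's note about spreading the workload over a network of workstations.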

Author information

Authors and Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Casey Carter
Department of Mathematics & Computer Science, Santa Clara University, Santa Clara, CA, 95053-0290, USA
Nicholas Tran

Editor information

Editors and Affiliations

Department of Computer Science, University of California, Santa Barbara, California, 93106, USA
Oscar H. Ibarra
Department of Mathematics, National University of Singapore, Singapore 117543
Louxin Zhang

About this paper

Cite this paper

Carter, C., Tran, N. (2002). Cluster: A Fast Tool to Identify Groups of Similar Programs. In: Ibarra, O.H., Zhang, L. (eds) Computing and Combinatorics. COCOON 2002. Lecture Notes in Computer Science, vol 2387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45655-4_22

DOI: https://doi.org/10.1007/3-540-45655-4_22
Published: 29 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43996-7
Online ISBN: 978-3-540-45655-1
eBook Packages: Springer Book Archive