Abstract
Two code fragments are considered similar if they resemble each other in program text or in functionality. Textual similarity can be detected with parameterized string matching, in which a pattern P matches a substring t of the text T if there exists a bijection between the symbols of P and the symbols of t. Fredriksson and Mozgovoy solved the parameterized string matching problem efficiently with the parameterized shift-or (PSO) algorithm. Its drawback is that it cannot handle patterns longer than the word length (w) of the computer. In this paper, we remove this word-length limitation from bit-parallel parameterized matching by extending the BLIM algorithm for exact string matching. The extended algorithm is also suitable for searching multiple patterns simultaneously. Experiments show that our algorithm is comparable with PSO for pattern lengths ≤ w and that it handles longer patterns efficiently.
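To make the matching condition concrete, here is a minimal sketch of the bijection test in Python. It is a naive check that compares every window of the text against the pattern; it is not the PSO algorithm or the BLIM extension described in the paper. The function names `is_pmatch` and `pmatch_positions` are hypothetical, and the sketch assumes, as the definition above does, that every symbol may be renamed.

```python
def is_pmatch(pattern, window):
    """Return True if a bijection maps the symbols of pattern onto window."""
    if len(pattern) != len(window):
        return False
    forward = {}   # pattern symbol -> window symbol
    backward = {}  # window symbol -> pattern symbol
    for p, t in zip(pattern, window):
        # setdefault records the first pairing and returns the stored one,
        # so any later conflicting pairing is detected in either direction.
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return False
    return True


def pmatch_positions(pattern, text):
    """Naive scan: start indices of all windows that p-match the pattern."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if is_pmatch(pattern, text[i:i + m])]


if __name__ == "__main__":
    # "abab" matches "xyxy" under the renaming a->x, b->y.
    print(pmatch_positions("abab", "xyxyzz"))
```

Each window costs O(m) to check, so the scan takes O(nm) time overall. Bit-parallel methods such as those discussed in the abstract aim to avoid this per-window cost.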
References
Roy, C.K., Cordy, J.R.: A survey on clone detection research. Technical Report No. 2007-541, School of Computing, Queen’s University at Kingston, Ontario, Canada (2007)
Baeza-Yates, R.A., Gonnet, G.H.: A new approach to text searching. Communications of the ACM 35(10), 74–82 (1992)
Baker, B.S.: Parameterized duplication in strings: algorithms and techniques for software maintenance. SIAM Journal on Computing 26(5), 1343–1362 (1997)
Baker, B.S.: Parameterized diff. In: 10th Symposium on Discrete Algorithms (SODA), pp. 854–855 (1999)
Boyer, R.S., Moore, J.S.: A fast string-searching algorithm. Communications of the ACM 20(10), 762–772 (1977)
Fredriksson, K., Mozgovoy, M.: Efficient parameterized string matching. Information Processing Letters (IPL) 100(3), 91–96 (2006)
Horspool, R.N.: Practical fast searching in strings. Software: Practice and Experience 10(6), 501–506 (1980)
Prasad, R., Agarwal, S.: A new parameterized string matching algorithm by combining bit-parallelism and suffix automata. In: 8th IEEE International Conference on Computer and Information Technology, Sydney, Australia, pp. 778–783. IEEE Press, Los Alamitos (2008)
Raita, T.: Tuning the Boyer-Moore-Horspool string searching algorithm. Software: Practice and Experience 22(10), 879–884 (1992)
Salmela, L., Tarhio, J.: Fast Parameterized Matching with q-grams. Journal of Discrete Algorithms 6(3), 408–419 (2008)
Smith, P.D.: Experiments with a very fast substring search algorithm. Software: Practice and Experience 21(10), 1065–1074 (1991)
Sunday, D.M.: A very fast substring search algorithm. Communications of the ACM 33(8), 132–142 (1990)
Wu, S., Manber, U.: Fast text searching allowing errors. Communications of the ACM 35(10), 83–91 (1992)
Kulekci, M.O.: BLIM: A New Bit-Parallel Pattern Matching Algorithm Overcoming Computer Word Size Limitation. Mathematics in Computer Science 3(4), 407–420 (2010)
Navarro, G., Raffinot, M.: Fast and Flexible String Matching by Combining Bit-parallelism and Suffix Automata. ACM Journal of Experimental Algorithmics 5(4) (2000)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Prasad, R., Agarwal, S., Sharma, A.K., Singh, A., Misra, S. (2011). Efficient Algorithm for Detecting Parameterized Multiple Clones in a Large Software System. In: Murgante, B., Gervasi, O., Iglesias, A., Taniar, D., Apduhan, B.O. (eds) Computational Science and Its Applications - ICCSA 2011. ICCSA 2011. Lecture Notes in Computer Science, vol 6786. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21934-4_32
DOI: https://doi.org/10.1007/978-3-642-21934-4_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21933-7
Online ISBN: 978-3-642-21934-4
eBook Packages: Computer Science, Computer Science (R0)