Efficient Algorithm for Detecting Parameterized Multiple Clones in a Large Software System

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6786)

Abstract

Two code fragments are said to be similar if they are similar in their program text or in their functionality. The first kind of similarity can be detected with the help of parameterized string matching. In this type of matching, a pattern P matches a substring t of the text T if there exists a bijection from the symbols of P to the symbols of t that transforms P into t. Fredriksson and Mozgovoy solved the parameterized string matching problem efficiently with the parameterized shift-or (PSO) algorithm. Its drawback is that it cannot handle patterns longer than the word length (w) of the computer. In this paper, we solve this word length problem in bit-parallel parameterized matching by extending the BLIM algorithm for exact string matching. The extended algorithm is also suitable for searching for multiple patterns simultaneously. Experiments show that our algorithm is comparable with PSO for pattern lengths ≤ w and can also handle longer patterns efficiently.
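
To make the matching condition concrete, the following is a minimal sketch of the bijection test described in the abstract, combined with a naive scan over the text. It is not the bit-parallel algorithm of the paper. The function names `p_match` and `p_search` are hypothetical, and the sketch assumes every symbol is a renameable parameter, as in the abstract's definition.

```python
def p_match(pattern: str, window: str) -> bool:
    """Return True if a bijection on symbols maps pattern onto window."""
    if len(pattern) != len(window):
        return False
    forward = {}   # pattern symbol -> window symbol
    backward = {}  # window symbol -> pattern symbol
    for a, b in zip(pattern, window):
        # Both directions must stay consistent for the mapping to be a bijection.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def p_search(pattern: str, text: str) -> list[int]:
    """Naive scan: start positions in text where pattern p-matches."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if p_match(pattern, text[i:i + m])]


# "abab" p-matches "xyxy" (a->x, b->y) and "zwzw" (a->z, b->w).
print(p_search("abab", "xyxyzwzw"))  # prints [0, 4]
```

This scan costs O(nm) time for a text of length n and a pattern of length m. Bit-parallel methods such as PSO avoid the per-window check, which is where the word length limit discussed in the abstract arises.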

References

  1. Roy, C.K., Cordy, J.R.: A survey on clone detection research. Technical Report No. 2007-541, School of Computing, Queen’s University at Kingston, Ontario, Canada (2007)

  2. Baeza-Yates, R.A., Gonnet, G.H.: A new approach to text searching. Communications of the ACM 35(10), 74–82 (1992)

  3. Baker, B.S.: Parameterized duplication in strings: algorithms and an application to software maintenance. SIAM J. Computing 26(5), 1343–1362 (1997)

  4. Baker, B.S.: Parameterized diff. In: 10th Symposium on Discrete Algorithms (SODA), pp. 854–855 (1999)

  5. Boyer, R.S., Moore, J.S.: A fast string-searching algorithm. Communications of the ACM 20(10), 762–772 (1977)

  6. Fredriksson, K., Mozgovoy, M.: Efficient parameterized string matching. Information Processing Letters (IPL) 100(3), 91–96 (2006)

  7. Horspool, R.N.: Practical fast searching in strings. Software - Practice & Experience 10(6), 501–506 (1980)

  8. Prasad, R., Agarwal, S.: A new parameterized string matching algorithm by combining bit-parallelism and suffix automata. In: 8th IEEE International Conference on Computer and Information Technology, Sydney, Australia, pp. 778–783. IEEE Press, Los Alamitos (2008)

  9. Raita, T.: Tuning the Boyer-Moore-Horspool string searching algorithm. Software - Practice & Experience 22(10), 879–884 (1992)

  10. Salmela, L., Tarhio, J.: Fast parameterized matching with q-grams. Journal of Discrete Algorithms 6(3), 408–419 (2008)

  11. Smith, P.D.: Experiments with a very fast substring search algorithm. Software - Practice & Experience 21(10), 1065–1074 (1991)

  12. Sunday, D.M.: A very fast substring search algorithm. Communications of the ACM 33(8), 132–142 (1990)

  13. Wu, S., Manber, U.: Fast text searching allowing errors. Communications of the ACM 35(10), 83–91 (1992)

  14. Kulekci, M.O.: BLIM: A New Bit-Parallel Pattern Matching Algorithm Overcoming Computer Word Size Limitation. Mathematics in Computer Science 3(4), 407–420 (2010)

  15. Navarro, G., Raffinot, M.: Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM Journal of Experimental Algorithmics 5(4) (2000)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Prasad, R., Agarwal, S., Sharma, A.K., Singh, A., Misra, S. (2011). Efficient Algorithm for Detecting Parameterized Multiple Clones in a Large Software System. In: Murgante, B., Gervasi, O., Iglesias, A., Taniar, D., Apduhan, B.O. (eds) Computational Science and Its Applications - ICCSA 2011. ICCSA 2011. Lecture Notes in Computer Science, vol 6786. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21934-4_32

  • DOI: https://doi.org/10.1007/978-3-642-21934-4_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21933-7

  • Online ISBN: 978-3-642-21934-4

  • eBook Packages: Computer Science, Computer Science (R0)
