Reducing Computational Effort for Plagiarism Detection with Approximate String Matching

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 700)

Abstract

Currently, a large number of documents are created as digital material and distributed world-wide. Digital materials are easy to publish and copy at a remarkably low cost. As a result, many documents are copied illegally, and this practice is spreading, making plagiarism a significant social issue. Therefore, the need to develop systems that detect plagiarism is very high. We have developed a new plagiarism detection method that compares documents by using approximate string matching to detect plagiarism. We have also developed a technique that reduces the computational time of the comparison method. In this paper, we demonstrate our proposed method’s usefulness through experiments and through the measuring indexes of precision and recall.

Keywords

Plagiarism detection Approximate string matching FFT 

Notes

Acknowledgements

This work was supported by JSPS KAKENHI Grant Number 15K00426.

References

  1. 1.
    Lukashenko, R., Graudina, V., Grundspenkis, J.: Computer-based plagiarism detection methods and tools: an overview. In: Proceedings of the 2007 International Conference on Computer Systems and Technologies. ACM, 2007, pp. 1–6Google Scholar
  2. 2.
    Nakatoh, T., Baba, K., Yamada, Y., Ikeda, D.: Partial plagiarism detection using string matching with mismatches. In: Proceedings of Informatics Engineering and Information Science, Springer CCIS254, pp. 265–272 (2011)Google Scholar
  3. 3.
    Nakatoh, T., Baba, K., Yamada, Y., Ikeda, D.: Speed improvement of the plagiarism detection method. In: Proceedings of IIAI International Conference on Advanced Information Technologies 2013 (IIAI AIT 2013), 2013, P38.pdfGoogle Scholar
  4. 4.
    Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, New York (1997)CrossRefMATHGoogle Scholar
  5. 5.
    Crochemore, M., Rytter, W.: Text Algorithms. Oxford University Press, Inc., New York, NY, USA (1994)MATHGoogle Scholar
  6. 6.
    Crochemore, M., Rytter, W.: Jewels of Stringology. World Scientific Publishing Company (2002)Google Scholar
  7. 7.
    Fischer, M.J., Paterson, M.S.: String-matching and other products. In Proceedings of the SIAM-AMS Applied Mathematics Symposium. Massachusetts Institute of Technology Cambridge, MA, USA, pp. 113–125 (1974)Google Scholar
  8. 8.
    Atallah, M., Chyzak, F., Dumas, P.: A randomized algorithm for approximate string matching. Algorithmica 29(3), 468–486 (2001)MathSciNetCrossRefMATHGoogle Scholar
  9. 9.
    Baba, K., Shinohara, A., Takeda, M., Inenaga, S., Arikawa, S.: A note on randomized algorithm for string matching with mismatches. Nordic J. Comput. 10, 2–12 (2003)MathSciNetMATHGoogle Scholar
  10. 10.
    Nakatoh, T., Baba, K., Ikeda, D., Yamada, Y., Hirokawa, S.: An efficient mapping for computing the score of string matching. J. Automata Lang. Combinat. 10(5/6), 697–704 (2005)MathSciNetMATHGoogle Scholar
  11. 11.
    Schoenmeyr, T., Zhang, D.Y.: FFT-based algorithms for the string matching with mismatches problem. J. Algorithms 57(2), 130–139 (2005)MathSciNetCrossRefMATHGoogle Scholar
  12. 12.
    Baba, K., Tanaka, Y., Nakatoh, T., Shinohara, A.: A generalization of FFT algorithm for string matching. In: Proceedings of International Symposium on Information Science and Electrical Engineering, pp. 191–194 (2003)Google Scholar
  13. 13.
    Nakatoh, T., Baba, K., Ikeda, D., Mori, M., Hirokawa, S.: Accuracy evaluation of FFT-based randomized algorithms for string matching with mismatches (in Japanese). IPSJ Trans. Databases (TOD) 2(4), 24–31 (2009)Google Scholar
  14. 14.
    Potthast, M., Gollub, T., Hagen, M., Graßegger, J., Kiesel, J., Michel, M., Oberländer, A., Tippmann, M., Barrón-Cedeño, A., Gupta, P., Rosso, P., Stein, B.: Overview of the 4th International Competition on Plagiarism Detection in Working Notes Papers of the CLEF 2012 Evaluation Labs, eds. by Forner, P., Karlgren, J., Womser-Hacker, C., Sept. 2012Google Scholar
  15. 15.
    Potthast, M., Gollub, T., Hagen, M., Tippmann, M., Kiesel, J., Rosso, P., Stamatatos, E., Stein, B.: Overview of the 5th International Competition on Plagiarism Detection in Working Notes Papers of the CLEF 2013 Evaluation Labs, eds. by Forner, P., Navigli, R., Tufis, D. Sep. 2013Google Scholar
  16. 16.
    Potthast, M., Gollub, T., Hagen, M., Tippmann, M., Kiesel, J., Rosso, P., Stamatatos, E., Stein, B.: Overview of the 6th International Competition on Plagiarism Detection in Working Notes Papers of the CLEF 2014 Evaluation Labs, ser. CEUR Workshop Proceedings, L. Cappellato, N. Ferro, M. Halvey, W. Kraaij, Eds. CLEF and CEUR-WS.org, Sep. 2014Google Scholar
  17. 17.
    L. Kong, H. Qi, S. Wang, C. Du, S. Wang, and Y. Han, “Approaches for Candidate Document Retrieval and Detailed Comparison of Plagiarism Detection-Notebook for PAN at CLEF 2012,” in CLEF 2012 Evaluation Labs and Workshop - Working Notes Papers, 17–20 September, Rome, Italy, P. Forner, J. Karlgren, and C. Womser-Hacker, Eds., Sep. 2012Google Scholar
  18. 18.
    Rodríguez Torrejon, D., Martín Ramos, J.: Text alignment module in CoReMo 2.1 plagiarism detector-notebook for PAN at CLEF 2013. In: CLEF 2013 Evaluation Labs and Workshop Working Notes Papers, 23–26 September, Valencia, Spain, eds. by Forner, P., Navigli, R., Tufis, D. Sept. 2013Google Scholar
  19. 19.
    Sanchez-Perez, M., Sidorov, G., Gelbukh, A.: A winning approach to text alignment for text reuse detection at PAN 2014 Notebook for PAN at CLEF 2014. In: CLEF 2014 Evaluation Labs and Workshop—Working Notes Papers, 15–18 September, Sheffield, UK, ser. CEUR Workshop Proceedings, eds. by Cappellato, L., Ferro, N., Halvey, M., Kraaij, W., CEUR-WS.org, Sept. 2014Google Scholar

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. 1.Research Institute for Information Technology, Kyushu UniversityFukuokaJapan
  2. 2.Kyushu Institute of Information SciencesFukuokaJapan

Personalised recommendations