Partial Plagiarism Detection Using String Matching with Mismatches

  • Conference paper
Informatics Engineering and Information Science (ICIEIS 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 254)

Abstract

In recent years, many documents have been created and distributed in electronic form. Although this has reduced costs remarkably, it has also made documents easy to copy. The spread of plagiarism and copyright violation is a serious problem that hinders the production of valuable documents, so systems that detect plagiarism are very important. Many plagiarism detection systems aim to detect documents that are largely similar to a given query. Detecting a document that is only partially similar is harder. It is harder still when no suspect document (one that may have plagiarized or been plagiarized) is given, so that similar documents must be found by comparing all documents with one another. We propose a method that detects partial copies among documents without a query. The method detected some partial copies in test documents.
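
To make the problem named in the title concrete, here is a minimal sketch of string matching with mismatches. It is not the method proposed in the paper, which the abstract does not describe: it is the naive quadratic-time formulation, and the function names `match_counts` and `approximate_occurrences` are illustrative assumptions. FFT-based algorithms compute the same score vector faster.

```python
def match_counts(text: str, pattern: str) -> list[int]:
    """Score vector: for each alignment i, the number of positions j
    where pattern[j] equals text[i + j]. Naive O(n * m) version."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    return [
        sum(1 for j in range(m) if text[i + j] == pattern[j])
        for i in range(n - m + 1)
    ]


def approximate_occurrences(text: str, pattern: str, max_mismatches: int) -> list[int]:
    """Alignments where the pattern matches with at most max_mismatches mismatches."""
    m = len(pattern)
    scores = match_counts(text, pattern)
    return [i for i, score in enumerate(scores) if m - score <= max_mismatches]


if __name__ == "__main__":
    # "abd" matches "abc" at offset 0 with one mismatch and "abd" at offset 3 exactly.
    print(match_counts("abcabd", "abd"))
    print(approximate_occurrences("abcabd", "abd", 1))
```

A high score at some alignment means the two strings share a long, nearly identical stretch there, which is the kind of signal a partial-copy detector can threshold.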

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nakatoh, T., Baba, K., Yamada, Y., Ikeda, D. (2011). Partial Plagiarism Detection Using String Matching with Mismatches. In: Abd Manaf, A., Sahibuddin, S., Ahmad, R., Mohd Daud, S., El-Qawasmeh, E. (eds) Informatics Engineering and Information Science. ICIEIS 2011. Communications in Computer and Information Science, vol 254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25483-3_21

  • DOI: https://doi.org/10.1007/978-3-642-25483-3_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25482-6

  • Online ISBN: 978-3-642-25483-3

  • eBook Packages: Computer Science, Computer Science (R0)