
Plagiarism Detection of Paraphrases in Text Documents with Document Retrieval

  • Conference paper
Advances in Computing and Information Technology (ACITY 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 198)

Abstract

Document retrieval finds the documents relevant to a user query, and plagiarism is the act of copying another person's work without acknowledgement. Paraphrasing is a form of plagiarism in which the content taken from the source is reworded. This paper proposes a new document retrieval and paraphrase plagiarism detection system for text documents based on a multi-layered self-organizing map (MLSOM). The system extracts a tree structure from each document that represents its features hierarchically at the document, page, and paragraph levels. MLSOM serves as the clustering algorithm that handles these tree-structured documents efficiently. With MLSOM, documents can be compared to detect plagiarism by finding the local similarity between them. Paraphrase plagiarism is detected by measuring the similarity between the sentences of two documents, which is a form of local similarity detection.
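
The sentence-level comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's MLSOM method: it swaps in plain cosine similarity over word counts, and the function names, the naive sentence splitter, and the 0.5 threshold are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def bag_of_words(sentence):
    """Lowercased word counts; a stand-in for the paper's features."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suspicious_pairs(source, suspect, threshold=0.5):
    """Return (source_sentence, suspect_sentence, score) triples whose
    similarity reaches the (hypothetical) threshold."""
    hits = []
    for s in split_sentences(source):
        for t in split_sentences(suspect):
            score = cosine(bag_of_words(s), bag_of_words(t))
            if score >= threshold:
                hits.append((s, t, round(score, 3)))
    return hits


if __name__ == "__main__":
    source = "The cat sat on the mat. Dogs bark loudly at night."
    suspect = "On the mat the cat sat. Stock prices rose sharply today."
    for pair in suspicious_pairs(source, suspect):
        print(pair)
```

Word-count cosine only catches paraphrases that reorder or lightly edit a sentence. Rewordings that substitute synonyms would need richer features, which is the kind of gap a learned representation such as the MLSOM described above is meant to address.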


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sandhya, S., Chitrakala, S. (2011). Plagiarism Detection of Paraphrases in Text Documents with Document Retrieval. In: Wyld, D.C., Wozniak, M., Chaki, N., Meghanathan, N., Nagamalai, D. (eds) Advances in Computing and Information Technology. ACITY 2011. Communications in Computer and Information Science, vol 198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22555-0_34

  • DOI: https://doi.org/10.1007/978-3-642-22555-0_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22554-3

  • Online ISBN: 978-3-642-22555-0

  • eBook Packages: Computer Science, Computer Science (R0)
