
Fingerprinting Text in Logical Markup Languages

  • Conference paper
Information Security (ISC 2001)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2200))

Abstract

Information hiding is attracting increasing attention from the research community. Most of this research has centered on hiding information, such as watermarks and fingerprints, in images or in digital audio and video signals. Text has generally been treated as a black-and-white image with special properties. All current methods of hiding information in text are vulnerable to scanning followed by optical character recognition, which reconstructs the text without the hidden information.

Document distribution increasingly relies on logical markup languages such as HTML and XML, where the physical presentation of the text is determined by the user’s browser. Embedding the watermark in the physical presentation of the document is therefore no longer practical. We argue that embedding syntactic or semantic fingerprints in the text itself is the only viable way to fingerprint documents in logical markup languages such as HTML or XML.

In this paper, we propose a new semantic fingerprinting mechanism based on synonym substitution. This idea is developed into an operational system, and results of preliminary experiments are reported.
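
To make the general idea of synonym substitution concrete, the following is a minimal sketch in Python. It is not the system described in the paper, whose details are not given in this abstract. The synonym table, the function names `embed` and `extract`, and the one-bit-per-word encoding are all assumptions chosen for illustration. Each word that has a listed synonym carries one bit: the first member of the pair encodes 0 and the second encodes 1. The sketch works on lower-case plain text only and ignores markup, word sense, and inflection.

```python
import re

# Hypothetical synonym pairs (an assumption for this sketch).
# The word at index 0 encodes bit 0, the word at index 1 encodes bit 1.
SYNONYMS = [
    ("big", "large"),
    ("quick", "fast"),
    ("begin", "start"),
    ("buy", "purchase"),
]

# Map every variant to (its pair, the bit it encodes).
LOOKUP = {word: (pair, bit) for pair in SYNONYMS for bit, word in enumerate(pair)}

WORD = re.compile(r"[A-Za-z]+")


def embed(text, bits):
    """Rewrite each substitutable word so that it encodes the next bit.

    Words that are not in the table, and substitutable words that occur
    after the bits run out, are left unchanged.
    """
    remaining = iter(bits)

    def swap(match):
        word = match.group(0)
        entry = LOOKUP.get(word)
        if entry is None:
            return word
        bit = next(remaining, None)
        if bit is None:
            return word
        pair, _ = entry
        return pair[bit]

    return WORD.sub(swap, text)


def extract(text, count):
    """Read back the first `count` bits from the substitutable words."""
    found = [LOOKUP[w][1] for w in WORD.findall(text) if w in LOOKUP]
    return found[:count]


if __name__ == "__main__":
    original = "The big dog made a quick move to begin the game."
    fingerprint = [1, 0, 1]  # e.g. bits identifying one recipient
    marked = embed(original, fingerprint)
    print(marked)
    print(extract(marked, len(fingerprint)))
```

Because the mark lives in the choice of words rather than in the layout, it survives re-rendering by a browser and, in principle, scanning followed by optical character recognition. A practical scheme would also have to check that each substitution preserves meaning in context, which this sketch does not attempt.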

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jensen, C.D. (2001). Fingerprinting Text in Logical Markup Languages. In: Davida, G.I., Frankel, Y. (eds) Information Security. ISC 2001. Lecture Notes in Computer Science, vol 2200. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45439-X_30

  • DOI: https://doi.org/10.1007/3-540-45439-X_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42662-2

  • Online ISBN: 978-3-540-45439-7

  • eBook Packages: Springer Book Archive
