Fingerprinting Text in Logical Markup Languages

Jensen, Christian D.

doi:10.1007/3-540-45439-X_30

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2200)

Included in the following conference series:

International Conference on Information Security

Abstract

Information hiding is attracting increasing attention from the research community. Most of this research has centered on hiding information, such as watermarks and fingerprints, in images or digital audio and video signals. Text has generally been treated as a black-and-white image with special properties. All of the current methods of hiding information in text are vulnerable to scanning followed by optical character recognition in order to reconstruct the text.

Document distribution increasingly relies on logical markup languages like HTML and XML, where the physical presentation of the text is determined by the user’s browser. Embedding the watermark in the physical presentation of the document is therefore no longer practical. We argue that embedding syntactic or semantic fingerprints in text is the only viable way to fingerprint documents in logical markup languages such as HTML or XML.

In this paper, we propose a new semantic fingerprinting mechanism based on synonym substitution. This idea is developed into an operational system, and the results of preliminary experiments are reported.
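
The abstract does not describe the mechanism in detail, so the following is only a minimal sketch of the general idea behind synonym substitution, not the paper's system. It assumes a small hand-written table of interchangeable word pairs (the table, the function names `embed` and `extract`, and the one-bit-per-word encoding are all hypothetical): each substitutable word in the text carries one bit of the fingerprint, chosen by which member of the pair appears.

```python
import re

# Hypothetical synonym table (an assumption for illustration, not from the paper).
# In each pair, the first word encodes bit 0 and the second encodes bit 1.
SYNONYMS = [
    ("big", "large"),
    ("quick", "fast"),
    ("begin", "start"),
    ("buy", "purchase"),
]

# Map every word in the table to (its pair, the bit it encodes).
LOOKUP = {word: (pair, bit) for pair in SYNONYMS for bit, word in enumerate(pair)}

WORD = re.compile(r"[A-Za-z]+")


def embed(text, bits):
    """Return text in which successive substitutable words encode the given bits."""
    remaining = list(bits)

    def replace(match):
        word = match.group(0)
        entry = LOOKUP.get(word.lower())
        if entry is None or not remaining:
            return word  # not substitutable, or fingerprint already embedded
        pair, _ = entry
        # Simplification: the substituted word is always lowercase.
        return pair[remaining.pop(0)]

    marked = WORD.sub(replace, text)
    if remaining:
        raise ValueError("text has too few substitutable words for the fingerprint")
    return marked


def extract(text, n_bits):
    """Read back the first n_bits bits from the substitutable words in text."""
    bits = []
    for match in WORD.finditer(text):
        entry = LOOKUP.get(match.group(0).lower())
        if entry is not None and len(bits) < n_bits:
            bits.append(entry[1])
    return bits


original = "We begin with a big table and a quick scan before you buy."
marked = embed(original, [1, 0, 1, 1])
print(marked)
print(extract(marked, 4))
```

Because the fingerprint lives in the choice of words rather than in spacing or layout, it is independent of how a browser renders the markup. A practical system would also need to skip markup tags and attribute values and to choose only synonyms that preserve meaning in context, which this sketch does not attempt.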

Author information

Authors and Affiliations

Department of Computer Science, Trinity College Dublin, Dublin 2, Ireland
Christian D. Jensen

Editor information

Editors and Affiliations

Department of EECS, University of Wisconsin-Milwaukee, Milwaukee, WI, 53201, USA
George I. Davida
Techtegrity, LLC, 122 Harrison, Westfield, NJ, 07090, USA
Yair Frankel

About this paper

Cite this paper

Jensen, C.D. (2001). Fingerprinting Text in Logical Markup Languages. In: Davida, G.I., Frankel, Y. (eds) Information Security. ISC 2001. Lecture Notes in Computer Science, vol 2200. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45439-X_30

DOI: https://doi.org/10.1007/3-540-45439-X_30
Published: 11 September 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42662-2
Online ISBN: 978-3-540-45439-7
eBook Packages: Springer Book Archive