Abstract
Information hiding is attracting increasing attention from the research community. Most of this research has centered on hiding information, such as watermarks and fingerprints, in images or in digital audio and video signals. Text has generally been treated as a black-and-white image with special properties. All current methods of hiding information in text are vulnerable to scanning followed by optical character recognition (OCR), which reconstructs the text without the image-level marks that carry the hidden information.
Document distribution increasingly relies on logical markup languages such as HTML and XML, in which the physical presentation of the text is determined by the user's browser. Embedding a watermark in the physical presentation of the document is therefore no longer practical. We argue that embedding syntactic or semantic fingerprints in the text itself is the only viable way to fingerprint documents written in logical markup languages such as HTML or XML.
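To make the presentation argument concrete, here is a minimal sketch, assuming a hypothetical layout-based mark that encodes one bit per inter-word gap (two spaces for 1, one space for 0). HTML user agents collapse runs of white space when rendering ordinary text, so the words survive rendering but the mark does not. The function names are hypothetical, and `collapse_whitespace` only approximates what a browser does.

```python
import re


def embed_spaces(words, bits):
    """Join words, encoding bit 1 as a double space and bit 0 as a single space.

    Hypothetical layout mark: exactly one bit per inter-word gap.
    """
    if len(bits) != len(words) - 1:
        raise ValueError("need exactly one bit per inter-word gap")
    gaps = ["  " if bit else " " for bit in bits]
    return words[0] + "".join(gap + word for gap, word in zip(gaps, words[1:]))


def extract_spaces(text):
    """Read one bit per run of spaces: two or more spaces means 1, one space means 0."""
    return [1 if len(gap) >= 2 else 0 for gap in re.findall(r" +", text)]


def collapse_whitespace(text):
    """Rough stand-in for a browser rendering normal HTML text: any white-space run becomes one space."""
    return re.sub(r"\s+", " ", text)


words = ["the", "quick", "brown", "fox", "jumps"]
marked = embed_spaces(words, [1, 0, 1, 1])
print(extract_spaces(marked))                       # [1, 0, 1, 1]
print(extract_spaces(collapse_whitespace(marked)))  # [0, 0, 0, 0]
```

After rendering, `collapse_whitespace(marked).split()` still equals `words`: the content is intact while the layout-borne fingerprint has been erased, which is why a mark carried by the words themselves is needed.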
In this paper, we propose a new semantic fingerprinting mechanism based on synonym substitution. We develop this idea into an operational system and report the results of preliminary experiments.
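The general idea of fingerprinting by synonym substitution can be illustrated with a small sketch. This is not the paper's system: it assumes a hypothetical hand-written table of interchangeable word pairs (a real system would draw on a thesaurus and would have to handle word sense and context, which this sketch ignores). Each substitutable word carries one bit, chosen by which member of its pair appears, so each recipient receives a copy whose word choices spell out that recipient's fingerprint. The names `SYNONYM_PAIRS`, `embed`, `extract`, and `identify` are illustrative.

```python
import re

# Hypothetical toy thesaurus: each tuple is a pair of interchangeable words.
# Using position 0 encodes bit 0; using position 1 encodes bit 1.
SYNONYM_PAIRS = [
    ("big", "large"),
    ("quick", "fast"),
    ("begin", "start"),
    ("buy", "purchase"),
    ("choose", "select"),
]
# Map every word to (its pair, its position in the pair).
LOOKUP = {w: (pair, i) for pair in SYNONYM_PAIRS for i, w in enumerate(pair)}


def _tokens(text):
    # Word runs and non-word runs; joining the tokens reproduces the text exactly.
    return re.findall(r"\w+|\W+", text)


def embed(text, bits):
    """Write one fingerprint bit into each substitutable word, in reading order."""
    out, i = [], 0
    for tok in _tokens(text):
        entry = LOOKUP.get(tok.lower())
        if entry is not None and i < len(bits):
            pair, _ = entry
            word = pair[bits[i]]
            if tok[0].isupper():  # keep sentence-initial capitals
                word = word.capitalize()
            out.append(word)
            i += 1
        else:
            out.append(tok)
    if i < len(bits):
        raise ValueError(f"text can carry only {i} of {len(bits)} bits")
    return "".join(out)


def extract(text, n_bits):
    """Read back the first n_bits by checking which member of each pair was used."""
    bits = []
    for tok in _tokens(text):
        entry = LOOKUP.get(tok.lower())
        if entry is not None:
            bits.append(entry[1])
            if len(bits) == n_bits:
                break
    return bits


def identify(text, fingerprints):
    """Return the recipients whose (equal-length) fingerprint matches the text."""
    n_bits = len(next(iter(fingerprints.values())))
    found = extract(text, n_bits)
    return [name for name, fp in fingerprints.items() if fp == found]
```

For example, with `text = "Begin early, choose a quick route, and buy a big map."` and the hypothetical fingerprints `{"alice": [0, 1, 1, 0, 1], "bob": [1, 0, 0, 1, 1]}`, `embed(text, fingerprints["alice"])` yields `"Begin early, select a fast route, and buy a large map."`, and `identify` on that copy returns `["alice"]`. Because the bits live in the word choices, stripping markup or re-rendering the text does not remove them; capacity is limited by the number of substitutable words in the document.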
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jensen, C.D. (2001). Fingerprinting Text in Logical Markup Languages. In: Davida, G.I., Frankel, Y. (eds) Information Security. ISC 2001. Lecture Notes in Computer Science, vol 2200. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45439-X_30
DOI: https://doi.org/10.1007/3-540-45439-X_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42662-2
Online ISBN: 978-3-540-45439-7
eBook Packages: Springer Book Archive