Towards Logical Hypertext Structure

A Graph-Theoretic Perspective

  • Conference paper
Innovative Internet Community Systems (IICS 2004)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3473))

Abstract

Given the retrieval problem posed by the overwhelming number of documents online, text categorization has recently been adapted to web units. The aim is to use the categories of websites and pages as an additional retrieval criterion. In this context, the bag-of-words model has been used alongside HTML tags and link structures. Despite promising results, this adaptation remains within the framework of IR-specific models, since it neglects the content-based structuring inherent to hypertext units. This paper approaches hypertext modelling from the perspective of graph theory. It presents an XML-based format for representing websites as hypergraphs. These hypergraphs are used to shed light on the relation between hypertext structure types and their web-based instances. We emphasize two characteristics of this relation. Realizational ambiguity refers to functionally equivalent manifestations of the same structure type. Polymorphism refers to a single web unit that manifests different structure types. A categorization experiment, which analyses a corpus of hypergraphs representing the structure and content of pages of conference websites, shows that polymorphism is a prevalent characteristic of web-based units. Against this background, we argue for a revision of text representation models by means of hypergraphs that are sensitive to the manifold structuring of web documents.
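
As a rough illustration of the notion of polymorphism described in the abstract, the following minimal Python sketch models a website as a hypergraph whose vertices are pages and whose hyperedges group the pages that manifest one structure type. It is not the XML-based format presented in the paper: the class `Hypergraph`, its methods, the page names, and the structure-type labels are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Hypergraph:
    """Toy hypergraph: vertices are web pages, hyperedges are labelled page sets."""
    vertices: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)  # structure type -> frozenset of pages

    def add_edge(self, structure_type, pages):
        # A hyperedge may connect any number of vertices, not just two.
        members = frozenset(pages)
        self.vertices |= members
        self.edges[structure_type] = members

    def types_of(self, page):
        # All structure types (hyperedge labels) that contain this page.
        return sorted(t for t, members in self.edges.items() if page in members)

    def polymorphic_pages(self):
        # Pages that belong to more than one hyperedge, i.e. that
        # manifest more than one structure type in this toy model.
        return sorted(p for p in self.vertices if len(self.types_of(p)) > 1)


def build_example():
    # Hypothetical conference website; names are illustrative assumptions.
    site = Hypergraph()
    site.add_edge("call_for_papers", {"index.html", "cfp.html"})
    site.add_edge("programme", {"program.html"})
    site.add_edge("venue", {"index.html", "venue.html"})
    return site
```

In this sketch, `build_example().polymorphic_pages()` picks out `index.html`, because that page takes part in both the `call_for_papers` and the `venue` hyperedges. A single-label categorizer would have to assign such a page to only one of the two types.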

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mehler, A., Dehmer, M., Gleim, R. (2006). Towards Logical Hypertext Structure. In: Böhme, T., Larios Rosillo, V.M., Unger, H., Unger, H. (eds) Innovative Internet Community Systems. IICS 2004. Lecture Notes in Computer Science, vol 3473. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11553762_14

  • DOI: https://doi.org/10.1007/11553762_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28880-0

  • Online ISBN: 978-3-540-33995-3

  • eBook Packages: Computer Science, Computer Science (R0)
