A Conceptual Model for the Web

  • Mengchi Liu
  • Tok Wang Ling
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1920)


Most documents available over the web conform to the HTML specification. Such documents are hierarchically structured in nature. The existing graph-based or tree-based data models for the web only provide a very low-level representation of this hierarchical structure. In this paper, we introduce a conceptual model for the web that can represent the complex hierarchical structure within web documents at a high level, close to how humans conceptualize and visualize the documents. We also describe how to convert HTML documents according to this conceptual model. Using the conceptual model and the conversion method, we can capture the essence (i.e., the semistructure) of HTML documents in a natural and simple way.
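
The abstract does not spell out the conversion procedure. To give a rough sense of what it means to move from HTML's flat sequence of tags to a higher-level hierarchy, here is a minimal Python sketch that groups text under the nearest preceding heading, with `<h2>` sections nested inside `<h1>` sections and so on. It is an illustrative assumption only, not the authors' conceptual model or conversion method. `OutlineBuilder` and `outline_html` are hypothetical names. The sketch relies solely on the standard-library `html.parser.HTMLParser` and ignores lists, tables, and forms.

```python
from html.parser import HTMLParser

# Map heading tags to their nesting depth (hypothetical heading-based hierarchy).
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class OutlineBuilder(HTMLParser):
    """Hypothetical converter: nests sections by heading level and attaches
    each run of body text to the most recently opened section."""

    def __init__(self):
        super().__init__()
        # Level-0 root that is never popped, so every section has a parent.
        self.root = {"title": None, "level": 0, "text": [], "children": []}
        self.stack = [self.root]  # currently open sections, outermost first
        self.in_heading = None    # level of the heading being read, if any
        self.heading_text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.in_heading = HEADINGS[tag]
            self.heading_text = []

    def handle_endtag(self, tag):
        if tag in HEADINGS and self.in_heading is not None:
            level = self.in_heading
            # Close any open sections at the same or a deeper level.
            while self.stack[-1]["level"] >= level:
                self.stack.pop()
            section = {
                "title": " ".join(self.heading_text).strip(),
                "level": level,
                "text": [],
                "children": [],
            }
            self.stack[-1]["children"].append(section)
            self.stack.append(section)
            self.in_heading = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace-only runs between tags
        if self.in_heading is not None:
            self.heading_text.append(text)
        else:
            self.stack[-1]["text"].append(text)


def outline_html(html):
    """Parse an HTML string and return the nested section tree."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


if __name__ == "__main__":
    page = ("<h1>Staff</h1><p>Department list</p>"
            "<h2>Faculty</h2><p>Alice</p><h2>Support</h2><p>Bob</p>")
    staff = outline_html(page)["children"][0]
    print(staff["title"], [c["title"] for c in staff["children"]])
```

With the sample page, the two `<h2>` sections become children of the `Staff` section instead of siblings in a flat tag list. This is the general kind of human-oriented grouping the abstract contrasts with low-level graph or tree encodings of the markup.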







Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Mengchi Liu (1)
  • Tok Wang Ling (2)
  1. Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada
  2. School of Computing, National University of Singapore, Singapore
