Using Webspaces to Model Document Collections on the Web

  • Roelof Van Zwol
  • Peter M. G. Apers
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1921)


Due to the unstructured character of data on the web, it is hard to find specific information when surfing the web. Search engines can only base their results on the available IR techniques, and most of the time they lack the desired power in query formulation. Modelling data on the web as if it were designed for use within databases provides us with the necessary basis for enhancing query formulation. This requires special care in dealing with the included multimedia data and the semi-structured aspects of the data on the web. Modelling the entire web would be too ambitious; therefore we focus on a more feasible environment, like the intranet, where one can find large collections of related data. This article describes the webspace method for modelling the content of a collection of domain-specific documents, and offers a solution for the above-mentioned problems.


Semantical Level · Query Formulation · Document Level · Object Server · Semistructured Data
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Roelof Van Zwol (1)
  • Peter M. G. Apers (1)
  1. Department of Computer Science, University of Twente, Enschede, The Netherlands
