
On Warehousing Historical Web Information

  • Conference paper
Conceptual Modeling — ER 2000 (ER 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1920)

Abstract

We present a temporal web data model designed for warehousing historical data from the World Wide Web, whose content changes over time. As the Web is now populated with a large volume of information, it has become necessary to capture useful web information in a data warehouse that supports further intelligent data analysis. However, because of the unstructured and dynamic nature of the Web, the traditional relational model and its temporal variants cannot be used to build such a data warehouse. In this paper, we therefore propose a temporal web data model that captures the connectivities of web documents and their content in the form of temporal web tables. To support the analysis of web data that evolve with time, valid time intervals are associated with each web document. To manipulate temporal web tables, we define a variety of web operators and illustrate their usefulness with realistic motivating examples.
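As a rough illustration of what attaching a valid time interval to each web document could look like, here is a minimal Python sketch. It is not the data model or the operators defined in the paper: the names `DocumentVersion`, `valid_at`, and `snapshot`, the half-open interval convention, and the example URLs are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DocumentVersion:
    """One stored version of a web document (hypothetical structure)."""
    url: str
    content: str
    links: Tuple[str, ...]            # URLs this version points to
    valid_from: date                  # first day this version was current
    valid_to: Optional[date] = None   # exclusive end; None means still current

    def valid_at(self, day: date) -> bool:
        """Return True if this version was the current one on `day`."""
        return self.valid_from <= day and (
            self.valid_to is None or day < self.valid_to
        )


def snapshot(table: List[DocumentVersion], day: date) -> List[DocumentVersion]:
    """Hypothetical time-slice: keep only the versions valid on `day`."""
    return [version for version in table if version.valid_at(day)]


# Two versions of the same (made-up) page with adjoining validity intervals.
table = [
    DocumentVersion("http://example.org/", "v1", ("http://example.org/a",),
                    date(2000, 1, 1), date(2000, 3, 1)),
    DocumentVersion("http://example.org/", "v2", ("http://example.org/b",),
                    date(2000, 3, 1)),
]
```

Under these assumptions, `snapshot(table, date(2000, 2, 1))` returns only the first version, because the half-open interval excludes the day on which the second version takes over.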

This work was supported in part by the Nanyang Technological University, Ministry of Education (Singapore) under Academic Research Fund #4-12034-5060, #4-12034-3012, #4-12034-6022. Any opinions, findings, and recommendations in this paper are those of the authors and do not reflect the views of the funding agencies.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cao, Y., Lim, EP., Ng, WK. (2000). On Warehousing Historical Web Information. In: Laender, A.H.F., Liddle, S.W., Storey, V.C. (eds) Conceptual Modeling — ER 2000. ER 2000. Lecture Notes in Computer Science, vol 1920. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45393-8_19

  • DOI: https://doi.org/10.1007/3-540-45393-8_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41072-0

  • Online ISBN: 978-3-540-45393-2

  • eBook Packages: Springer Book Archive
