Austrian Online Archive Processing: Analyzing Archives of the World Wide Web

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2458))

Abstract

With the popularity of the World Wide Web and the recognition that it is worth archiving, numerous projects aim to create large-scale repositories containing excerpts and snapshots of Web data. Interfaces are being created that allow users to surf through time, analyze the evolution of Web pages, or retrieve information through search interfaces. Yet with the timeline and metadata available in such a Web archive, additional analyses that go beyond mere information exploration become possible. In this paper we present the AOLAP project, which builds a Data Warehouse of such a Web archive, allowing its analysis and exploration from different points of view using OLAP technologies. Specifically, technological aspects such as the operating systems and Web servers used, geographic location, and Web technology such as file types, forms, or scripting languages may be used to infer, for example, technology maturation or impact.
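To make the OLAP idea in the abstract concrete, the following is a minimal sketch of a roll-up and a slice over archive metadata. It is not the AOLAP implementation: the record layout, the dimension names (`year`, `server`, `os`, `file_type`), the sample values, and the helper functions are all hypothetical, chosen only to mirror the kinds of dimensions the abstract mentions.

```python
from collections import Counter

# Hypothetical fact records, one per archived file. The dimension names and
# values are illustrative assumptions, not the schema used by AOLAP.
FACTS = [
    {"year": 2001, "server": "Apache", "os": "Linux", "file_type": "html"},
    {"year": 2001, "server": "Apache", "os": "Linux", "file_type": "pdf"},
    {"year": 2001, "server": "IIS", "os": "Windows", "file_type": "html"},
    {"year": 2002, "server": "Apache", "os": "Linux", "file_type": "html"},
    {"year": 2002, "server": "IIS", "os": "Windows", "file_type": "asp"},
    {"year": 2002, "server": "Apache", "os": "Solaris", "file_type": "html"},
]


def roll_up(facts, dims):
    """Count facts grouped by the given dimensions (an OLAP-style roll-up)."""
    return Counter(tuple(row[d] for d in dims) for row in facts)


def slice_facts(facts, **fixed):
    """Keep only the facts whose dimensions match the fixed values (a slice)."""
    return [row for row in facts if all(row[k] == v for k, v in fixed.items())]


if __name__ == "__main__":
    # Files per (year, server): a coarse view of Web server usage over time.
    for key, count in sorted(roll_up(FACTS, ["year", "server"]).items()):
        print(key, count)

    # File types within a single year: slice on year, then roll up by type.
    print(roll_up(slice_facts(FACTS, year=2002), ["file_type"]))
```

Grouping by different dimension subsets corresponds to viewing the same archive "from different points of view". A real data warehouse would precompute or index such aggregates rather than scan a list.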

Part of this work was done while the author was an ERCIM Research Fellow at IEI, Consiglio Nazionale delle Ricerche (CNR), Pisa, Italy.


© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rauber, A., Aschenbrenner, A., Witvoet, O. (2002). Austrian Online Archive Processing: Analyzing Archives of the World Wide Web. In: Agosti, M., Thanos, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2002. Lecture Notes in Computer Science, vol 2458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45747-X_2

  • DOI: https://doi.org/10.1007/3-540-45747-X_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44178-6

  • Online ISBN: 978-3-540-45747-3

  • eBook Packages: Springer Book Archive
