Glossary
- Crawler: Software that harvests content from the World Wide Web
- URL: Uniform Resource Locator
- Scope: The resources that a web archive seeks to preserve
- Website: A collection of related URLs
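The glossary terms can be related to one another in a minimal sketch. It assumes, purely for illustration, that the scope is given as a set of host names and that "related URLs" means URLs sharing a host; the names `SCOPE_HOSTS`, `host_of`, `in_scope`, and `group_by_website` are hypothetical and not taken from any particular crawler.

```python
from urllib.parse import urlsplit

# Hypothetical scope: hosts whose pages the archive seeks to preserve.
SCOPE_HOSTS = {"example.org", "example.com"}


def host_of(url: str) -> str:
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def in_scope(url: str) -> bool:
    """Assumption: a URL is in scope if its host is one of the selected hosts."""
    return host_of(url) in SCOPE_HOSTS


def group_by_website(urls):
    """Group URLs into 'websites', approximated here as URLs sharing a host."""
    sites = {}
    for url in urls:
        sites.setdefault(host_of(url), []).append(url)
    return sites
```

A crawler would typically apply a check like `in_scope` to each discovered URL before fetching it. Real archives often define scope with richer rules (path prefixes, top-level domains, topics) than this host-based approximation.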
Definition
Web archives are repositories of web content collected in the past. They counteract the ephemeral nature of the World Wide Web, where new content is constantly added while other content is removed and thus lost forever. Web archives counter this loss by preserving web content as part of the cultural heritage for future generations. To this end, web archives select resources (e.g., specific websites) worth preserving, repeatedly acquire snapshots of these resources, store them together with metadata (e.g., a time stamp or keywords), and provide access to the archived web content (e.g., via keyword search). Institutions operating web archives include nonprofit organizations, universities, national libraries, and for-profit companies. Users...
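The store-and-access part of this workflow can be illustrated with a small in-memory sketch. It is not how any production web archive is implemented: the class `ToyWebArchive`, its methods, and the `Snapshot` fields are hypothetical names chosen for this example, acquisition is left out (content is passed in rather than fetched), and keyword search is a naive scan over content and keyword metadata.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Snapshot:
    """One archived version of a resource plus its metadata."""
    url: str
    captured_at: datetime          # time stamp of the capture
    content: str
    keywords: set = field(default_factory=set)


class ToyWebArchive:
    """Hypothetical in-memory archive: store snapshots, then look them up."""

    def __init__(self):
        self._snapshots = []

    def store(self, url, content, captured_at=None, keywords=()):
        """Store a snapshot; default the time stamp to the current UTC time."""
        captured_at = captured_at or datetime.now(timezone.utc)
        snap = Snapshot(url, captured_at, content, {k.lower() for k in keywords})
        self._snapshots.append(snap)
        return snap

    def versions(self, url):
        """All snapshots of one URL, oldest first."""
        matches = (s for s in self._snapshots if s.url == url)
        return sorted(matches, key=lambda s: s.captured_at)

    def search(self, keyword):
        """Snapshots whose keyword metadata or content mentions the keyword."""
        kw = keyword.lower()
        return [s for s in self._snapshots
                if kw in s.keywords or kw in s.content.lower()]
```

Storing the same URL repeatedly yields several snapshots, so `versions(url)` returns the history of a resource rather than only its latest state, which is what distinguishes an archive from a cache in this sketch.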
Keywords
- International Internet Preservation Consortium (IIPC)
- Original Content Provider
- Fetch Resources
- United Nations Educational, Scientific And Cultural Organization (UNESCO)
- HTTP Communication
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Archive-It (2013) http://www.archive-it.org. Last access 25 Apr 2013
Archive The Net (2013) http://archivethe.net. Last access 25 Apr 2013
Arvidson A, Lettenstrom F (1998) The Kulturarw project – the Swedish royal web archive. Electron Libr 16(2):105–108
Berberich K, Bedathur S, Neumann T, Weikum G (2007) A time machine for text search. In: SIGIR’07: proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval, Amsterdam, pp 519–526
Denev D, Mazeika A, Spaniol M, Weikum G (2011) The SHARC framework for data quality in web archiving. VLDB J 20(2):183–207
Dougherty M, Meyer ET, Madsen C, van den Heuvel C, Thomas A, Wyatt S (2010) Researcher engagement with web archives – state of the art. JISC project report. http://ssrn.com/abstract=1714997. Last access 25 Apr 2013
Gomes D, Miranda JA, Costa M (2011) A survey on web archiving initiatives. In: Proceedings of the 15th international conference on theory and practice of digital libraries: research and advanced technology for digital libraries, TPDL’11, Berlin, pp 408–420
Heritrix (2013) http://crawler.archive.org. Last access 25 Apr 2013
International Internet Preservation Consortium (2013) http://www.netpreserve.org. Last access 25 Apr 2013
International Web Archiving Workshop (2013) http://www.iwaw.net. Last access 25 Apr 2013
Internet Archive (2013) http://www.archive.org. Last access 25 Apr 2013
Internet Memory Foundation (2013) http://www.internetmemory.org. Last access 25 Apr 2013
Kahle B (1997) Preserving the internet. Scientific American, New York
Library of Congress (2013) http://www.loc.gov. Last access 25 Apr 2013
Madhavan J, Ko D, Kot L, Ganapathy V, Rasmussen A, Halevy AY (2008) Google’s deep web crawl. PVLDB 1(2):1241–1252
Masanès J (2006) Web archiving. Springer, Heidelberg
Meyer ET, Thomas A, Schroeder R (2011) Web archives: the future(s). Oxford internet institute technical report. http://ssrn.com/abstract=1830025. Last access 25 Apr 2013
National Library of France (2013) http://www.bnf.fr. Last access 25 Apr 2013
Niu J (2012a) Functionalities of web archives. D-Lib Mag 18(3/4). http://www.dlib.org/dlib/march12/niu/03niu2.html
Niu J (2012b) An overview of web archiving. D-Lib Mag 18(3/4). http://www.dlib.org/dlib/march12/niu/03niu1.html
Olston C, Najork M (2010) Web crawling. Found Trends Inf Retr 4(3):175–246
Portuguese Web Archive (2013) http://www.arquivo.pt. Last access 25 Apr 2013
Stanford WebBase Project (2013) http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase. Last access 25 Apr 2013
Thomas A, Meyer ET, Dougherty M, van den Heuvel C, Madsen C, Wyatt S (2010) Researcher engagement with web archives: challenges and opportunities for investment. http://ssrn.com/abstract=1715000. Last access 25 Apr 2013
Toyoda M, Kitsuregawa M (2012) The history of web archiving. Proc IEEE 100:1441–1443
UNESCO (2003) Charter on the preservation of digital heritage. http://portal.unesco.org/ci/en/files/13367/10700115911Charter_en.pdf/Charter_en.pdf. Last access 25 Apr 2013
Recommended Reading
Masanès (2006) remains the key reference on web archives. While web technology has evolved since its publication, most of the issues discussed therein are still current. More recent accounts of the state of the art in web archiving can be found in Niu (2012b) and Dougherty et al. (2010). Finally, Meyer et al. (2011) give a glimpse of web archives’ future.
Copyright information
© 2017 Springer Science+Business Media LLC
About this entry
Cite this entry
Berberich, K. (2017). Web Archives. In: Alhajj, R., Rokne, J. (eds) Encyclopedia of Social Network Analysis and Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7163-9_128-1
DOI: https://doi.org/10.1007/978-1-4614-7163-9_128-1
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-7163-9
Online ISBN: 978-1-4614-7163-9
eBook Packages: Springer Reference Computer Sciences, Reference Module Computer Science and Engineering