Abstract
We have integrated Web ARChive (WARC) files with the peer-to-peer, content-addressable InterPlanetary File System (IPFS) to allow the payload content of web archives to be easily propagated. We also provide an archival replay system, extended from pywb, that fetches the WARC content from IPFS and reassembles the originally archived HTTP responses for replay. From a 1.0 GB sample Archive-It collection of WARCs containing 21,994 mementos, we show that extracting and indexing the HTTP response content of WARCs containing IPFS lookup hashes takes 66.6 minutes, including dissemination into IPFS.
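The index-then-replay idea described above can be illustrated with a minimal sketch. It is not the ipwb implementation and does not call IPFS or pywb: the `ContentStore` class, the `index_response` and `replay` functions, and the index field names are hypothetical, and a SHA-256 keyed dictionary stands in for a content-addressable store. The sketch only shows the general pattern of splitting an archived HTTP response into header and payload, storing each under a hash of its content, and reassembling the response from the index entry.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory stand-in for a content-addressable store.

    This is an assumption for illustration; it is not the IPFS API.
    """

    def __init__(self):
        self._blocks = {}

    def add(self, data: bytes) -> str:
        # The lookup key is derived from the content itself.
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def cat(self, key: str) -> bytes:
        return self._blocks[key]


HEADER_BODY_SEPARATOR = b"\r\n\r\n"


def index_response(store: ContentStore, uri: str, timestamp: str,
                   raw_response: bytes) -> dict:
    """Split an HTTP response at the first blank line and store both parts."""
    header, sep, payload = raw_response.partition(HEADER_BODY_SEPARATOR)
    if not sep:
        raise ValueError("no header/body separator found in response")
    # Hypothetical index entry; field names are illustrative only.
    return {
        "uri": uri,
        "timestamp": timestamp,
        "header_hash": store.add(header),
        "payload_hash": store.add(payload),
    }


def replay(store: ContentStore, entry: dict) -> bytes:
    """Fetch both parts by hash and rebuild the original response bytes."""
    header = store.cat(entry["header_hash"])
    payload = store.cat(entry["payload_hash"])
    return header + HEADER_BODY_SEPARATOR + payload
```

In this sketch, two captures with byte-identical payloads map to the same payload key, so the payload is stored once even though each capture keeps its own index entry.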
Acknowledgements
We would like to thank Ilya Kreymer for his feedback during the development of the ipwb prototype and guidance in interfacing with the pywb replay system. This work was supported in part by NSF award 1624067 via the Archives Unleashed Hackathon, where we developed the prototype.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Kelly, M., Alam, S., Nelson, M.L., Weigle, M.C. (2016). InterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives. In: Fuhr, N., Kovács, L., Risse, T., Nejdl, W. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2016. Lecture Notes in Computer Science, vol 9819. Springer, Cham. https://doi.org/10.1007/978-3-319-43997-6_35
DOI: https://doi.org/10.1007/978-3-319-43997-6_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43996-9
Online ISBN: 978-3-319-43997-6
eBook Packages: Computer Science (R0)