Using the web infrastructure to preserve web pages

  • REGULAR PAPER
International Journal on Digital Libraries

Abstract

To date, digital preservation has focused mostly on copying resources from the “living web” into an archive for controlled curation. Once inside an archive, the resources are subject to careful processes of refreshing (making additional copies to new media) and migrating (conversion to new formats and applications). For small numbers of resources of known value, this is a practical and worthwhile approach to digital preservation. However, because of the infrastructure costs (storage, networks, machines) and, more importantly, the human management costs, this approach is unsuitable for web-scale preservation. The result is that difficult decisions must be made about what is saved and what is not. We provide an overview of our ongoing research projects that use the “web infrastructure” to provide preservation capabilities for web pages, and we examine the overlap these approaches have with the field of information retrieval. The common characteristic of the projects is that they creatively employ the web infrastructure to provide shallow but broad preservation capability for all web pages. These approaches are not intended to replace conventional archiving approaches; rather, they focus on providing at least some form of archival capability for the mass of web pages that may prove to have value in the future. We characterize the preservation approaches by the level of effort required of the web administrator: web sites are reconstructed from the caches of search engines (“lazy preservation”); lexical signatures are used to find the same or similar pages elsewhere on the web (“just-in-time preservation”); resources are pushed to other sites using NNTP newsgroups and SMTP email attachments (“shared infrastructure preservation”); and an Apache module is used to provide OAI-PMH access to MPEG-21 DIDL representations of web pages (“web server enhanced preservation”).
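The abstract does not say how a lexical signature is computed, so the following is only a minimal sketch of the idea behind “just-in-time preservation”. It assumes a signature is the handful of terms with the highest TF-IDF weight, which is one common formulation and not necessarily the one used in these projects. The function names `tokenize` and `lexical_signature`, the parameter `k`, and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(page, corpus, k=5):
    """Return the k terms of `page` with the highest TF-IDF weight.

    `corpus` is a list of other page texts, used only to estimate how
    common each term is (its document frequency). This weighting is an
    assumption made for illustration.
    """
    docs = [set(tokenize(doc)) for doc in corpus]
    term_counts = Counter(tokenize(page))
    n_docs = len(docs) + 1  # the corpus plus the page itself

    def weight(term):
        # The page always contains the term, hence the leading 1.
        df = 1 + sum(term in doc for doc in docs)
        return term_counts[term] * math.log(n_docs / df)

    # Highest weight first; ties are broken alphabetically for stability.
    ranked = sorted(term_counts, key=lambda t: (-weight(t), t))
    return ranked[:k]


if __name__ == "__main__":
    corpus = ["the web is large", "the cache is small"]
    page = "the apache module serves the archive archive"
    print(" ".join(lexical_signature(page, corpus, k=3)))
```

In this sketch the signature terms would be joined into a query and submitted to a search engine in the hope of finding the same or a similar page elsewhere on the web. Terms that appear in every document get a weight of zero, so very common words fall to the bottom of the ranking.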

Author information

Corresponding author

Correspondence to Michael L. Nelson.

About this article

Cite this article

Nelson, M.L., McCown, F., Smith, J.A. et al. Using the web infrastructure to preserve web pages. Int J Digit Libr 6, 327–349 (2007). https://doi.org/10.1007/s00799-007-0012-y

  • DOI: https://doi.org/10.1007/s00799-007-0012-y
