Abstract
Web archiving is the process of collecting and preserving web content in an archive for current and future generations. A key problem in web archiving is that not all websites can be archived correctly, owing to issues that arise from the use of different technologies, standards and implementation practices. Nevertheless, one common denominator of current websites is that they are implemented using a Web Content Management System (WCMS). We evaluate the Website Archivability (WA) of the most prevalent WCMSs. We investigate the extent to which each WCMS meets the conditions for a safe transfer of its content to a web archive for preservation purposes, and thereby identify its strengths and weaknesses. More importantly, we derive specific recommendations to improve the WA of each WCMS, aiming to advance the general practice of web data extraction and archiving.
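One condition that can affect whether a crawler is able to transfer a site's content to an archive is what the site's robots.txt file permits. As a minimal sketch, and not the evaluation method used in the paper, the snippet below uses Python's standard `urllib.robotparser` to check which paths a crawler may fetch and whether a sitemap is declared. The `robots_checks` helper, the `archive-bot` user agent and the WordPress-style robots.txt content are hypothetical and serve only as illustration.

```python
from urllib.robotparser import RobotFileParser


def robots_checks(robots_txt: str, paths: list[str], agent: str = "archive-bot") -> dict:
    """Hypothetical helper: report which paths `agent` may fetch and which sitemaps are declared."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {
        # can_fetch() applies the first matching Allow/Disallow rule for the agent.
        "allowed": {path: parser.can_fetch(agent, path) for path in paths},
        # site_maps() (Python 3.8+) returns None when no Sitemap line is present.
        "sitemaps": parser.site_maps() or [],
    }


if __name__ == "__main__":
    # Hypothetical robots.txt in the style a WordPress site might serve.
    example = (
        "User-agent: *\n"
        "Disallow: /wp-admin/\n"
        "Sitemap: https://example.org/sitemap.xml\n"
    )
    print(robots_checks(example, ["/wp-admin/options.php", "/2015/05/post/"]))
```

Parsing the file text directly keeps the check offline and repeatable; a real crawler would fetch `/robots.txt` from the site first.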
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Banos, V., Manolopoulos, Y. (2015). Web Content Management Systems Archivability. In: Morzy, T., Valduriez, P., Bellatreche, L. (eds) Advances in Databases and Information Systems. ADBIS 2015. Lecture Notes in Computer Science, vol 9282. Springer, Cham. https://doi.org/10.1007/978-3-319-23135-8_14
DOI: https://doi.org/10.1007/978-3-319-23135-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23134-1
Online ISBN: 978-3-319-23135-8
eBook Packages: Computer Science, Computer Science (R0)