Crowd Mining Applied to Preservation of Digital Cultural Heritage

  • Emma L. Tonkin
  • Gregory J. L. Tourte
  • Alastair Gill
Part of the Springer Series on Cultural Computing book series (SSCC)


Accessible systems, in digital heritage as elsewhere, should ‘speak the user’s language’. Over long periods, however, that language may change significantly, and the system must keep track of the change. Change in a population can be conceptualised and tracked using a functional, computable model based on representative datasets. Such a model must encompass the relevant characteristics of that population and support predefined functionality, such as the ability to track current trends in language use. Individual viewpoints published on any given platform may be observed in aggregate by means of large-scale text mining. We have used social media platforms such as Twitter and Tumblr to collect statistical information about anonymous users’ perspectives on cultural heritage items and institutions. Longitudinal studies make it possible to identify indicators of an evolution in the discourse surrounding cultural heritage items, and to estimate trends relating to the items and creators represented. We describe a functional approach to building useful models of shift in contemporary language use, based on data collected across social networks and informed by existing theoretical approaches to modelling semantic change. As a case study, we present a means by which such ongoing user modelling, drawing on contemporary resources, can support ‘just-in-time’ pre-emptive review of material to be presented to the public. We also show that this approach can feed into the enhancement of data retrieval processes.


Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Emma L. Tonkin (1)
  • Gregory J. L. Tourte (2)
  • Alastair Gill (3)
  1. School of Electrical Engineering, University of Bristol, Bristol, UK
  2. School of Geographical Sciences, University of Bristol, Bristol, UK
  3. King’s College London, London, UK
