Abstract
This paper argues that representing texts as semantic Linked Data provides a useful basis for analyzing their contents in Digital Humanities research and for Cultural Heritage application development. The idea is to transform Cultural Heritage texts into a knowledge graph and a Linked Data service that can be used flexibly in different applications via a SPARQL endpoint. The argument is discussed and evaluated in the context of biographical and prosopographical research and a case study in which over 13 000 life stories from the biographical collections of the Biographical Centre of the Finnish Literature Society were transformed into RDF, enriched by data linking, and published in a SPARQL endpoint. Tools for biography and prosopography, data clustering, network analysis, and linguistic analysis were created with promising first results.
Notes
- 1.
https://www.w3.org/standards/semanticweb/ accessed: 13 August 2018.
- 2.
In some use cases, e.g., in person registries [12], the whole registry entry may be written using this kind of semi-formal language.
- 3.
http://cidoc-crm.org accessed: 13 August 2018.
- 4.
http://persistence.uni-leipzig.org/nlp2rdf/specification/core.html accessed: 13 August 2018.
- 5.
http://dublincore.org/documents/dcmi-terms/ accessed: 13 August 2018.
- 6.
Denoted with prefix nbf in the RDF examples.
- 7.
http://turkunlp.github.io/Finnish-dep-parser/ accessed: 13 August 2018.
- 8.
http://universaldependencies.org/format.html accessed: 13 August 2018.
- 9.
For example, the SeCo LAS [19] is a combination of several Finnish NLP tools.
- 10.
The running time complexity is O(mn), where n is the number of files and m is their size in bytes.
- 11.
https://kansallisbiografia.fi/ accessed: 13 August 2018.
- 12.
http://www.ldf.fi accessed: 13 August 2018.
- 13.
http://seco.cs.aalto.fi/projects/dcert/ accessed: 13 August 2018.
- 14.
http://developers.google.com/maps/ accessed: 13 August 2018.
- 15.
http://www.tfidf.com/ accessed: 13 August 2018.
- 16.
https://radimrehurek.com/gensim/ accessed: 13 August 2018.
- 17.
https://gephi.org/ accessed: 13 August 2018.
- 18.
http://www.oxforddnb.com/ accessed: 13 August 2018.
- 19.
http://www.anb.org/ accessed: 13 August 2018.
- 20.
http://www.ndb.badw-muenchen.de accessed: 13 August 2018.
- 21.
https://sok.riksarkivet.se/Sbl/Start.aspx accessed: 13 August 2018.
- 22.
http://www.biografischportaal.nl/en accessed: 13 August 2018.
- 23.
http://www.biographynet.nl/ accessed: 13 August 2018.
References
Chiarcos, C., Fäth, C.: CoNLL-RDF: linked corpora done in an NLP-friendly way. In: Gracia, J., Bond, F., McCrae, J.P., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds.) LDK 2017. LNCS (LNAI), vol. 10318, pp. 74–88. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59888-8_6
Fokkens, A., et al.: Biographynet: extracting relations between people and events. In: Europa baut auf Biographien, pp. 193–224. New Academic Press, Wien (2017)
Gardiner, E., Musto, R.G.: The Digital Humanities: A Primer for Students and Scholars. Cambridge University Press, Cambridge (2015)
Hakosalo, H., Jalagin, S., Junila, M., Kurvinen, H.: Historiallinen elämä - Biografia ja historiantutkimus. Suomalaisen Kirjallisuuden Seura (SKS) (2014)
Haverinen, K., et al.: Building the essential resources for Finnish: the Turku Dependency Treebank. Lang. Resour. Eval. 48, 493–531 (2014). https://doi.org/10.1007/s10579-013-9244-1
Heath, T., Bizer, C.: Linked data: evolving the web into a global data space. Synthesis Lectures on the Semantic Web: Theory and Technology, 1 edn. Morgan & Claypool, Palo Alto (2011). http://linkeddatabook.com/editions/1.0/. Accessed 13 Aug 2018
Hellmann, S., Lehmann, J., Auer, S., Brümmer, M.: Integrating NLP using linked data. In: Alani, H. (ed.) ISWC 2013. LNCS, vol. 8219, pp. 98–113. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41338-4_7
Hitzler, P., Krötzsch, M., Rudolph, S.: Foundations of Semantic Web Technologies. Springer, Heidelberg (2010)
Hyvönen, E.: Publishing and Using Cultural Heritage Linked Data on the Semantic Web. Synthesis Lectures on the Semantic Web: Theory and Technology. Morgan & Claypool, Palo Alto (2012)
Hyvönen, E., Alonen, M., Ikkala, E., Mäkelä, E.: Life stories as event-based linked data: case semantic national biography. In: Proceedings of ISWC 2014 Posters & Demonstrations Track. CEUR Workshop Proceedings, October 2014. http://ceur-ws.org/Vol-1272/. Accessed 13 Aug 2018
Hyvönen, E., Ikkala, E., Tuominen, J.: Linked data brokering service for historical places and maps. In: Proceedings of the 1st Workshop on Humanities in the Semantic Web (WHiSe), vol. 1608, pp. 39–52. CEUR Workshop Proceedings (2016). http://ceur-ws.org/Vol-1608/#paper-06. Accessed 13 Aug 2018
Hyvönen, E., Leskinen, P., Heino, E., Tuominen, J., Sirola, L.: Reassembling and enriching the life stories in printed biographical registers: norssi high school alumni on the semantic web. In: Gracia, J., Bond, F., McCrae, J.P., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds.) LDK 2017. LNCS (LNAI), vol. 10318, pp. 113–119. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59888-8_9
Hyvönen, E., Leskinen, P., Tamper, M., Tuominen, J., Keravuori, K.: Semantic national biography of Finland. In: Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference (DHN 2018), vol. 2084, pp. 372–385. CEUR Workshop Proceedings, March 2018. http://www.ceur-ws.org/Vol-2084/short12.pdf. Accessed 13 Aug 2018
Hyvönen, E., Tuominen, J., Alonen, M., Mäkelä, E.: Linked data Finland: a 7-star model and platform for publishing and re-using linked datasets. In: Presutti, V., Blomqvist, E., Troncy, R., Sack, H., Papadakis, I., Tordai, A. (eds.) ESWC 2014. LNCS, vol. 8798, pp. 226–230. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11955-7_24
Ikkala, E., Tuominen, J., Hyvönen, E.: Contextualizing historical places in a gazetteer by using historical maps and linked data. In: Proceedings of Digital Humanities 2016, Short Papers, pp. 573–577 (2016)
Jannach, D., Zanker, M., Felfernig, A., Friedrich, G.: Recommender Systems: An Introduction. Cambridge University Press, Cambridge (2011)
Leskinen, P., Hyvönen, E., Tuominen, J.: Analyzing and visualizing prosopographical linked data based on short biographies. In: Biographical Data in a Digital World 2017 (BD 2017), Linz, Austria, November 2017. http://ceur-ws.org/Vol-2119/paper7.pdf. Accessed 13 Aug 2018
McSweeney, P.J.: Gephi network statistics. Google Summer of Code, pp. 1–8 (2009)
Mäkelä, E.: LAS: an integrated language analysis tool for multiple languages. J. Open Source Softw. 1(6) (2016). https://doi.org/10.21105/joss.00035. Accessed 13 Aug 2018
Otte, E., Rousseau, R.: Social network analysis: a powerful strategy, also for the information sciences. J. Inf. Sci. 28(6), 441–453 (2002)
Presutti, V., Draicchio, F., Gangemi, A.: Knowledge extraction based on discourse representation theory and linguistic frames. In: ten Teije, A. (ed.) EKAW 2012. LNCS (LNAI), vol. 7603, pp. 114–129. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33876-2_12
Pyysalo, S., Ginter, F.: Collaborative development of annotation guidelines with application to universal dependencies. In: The Fifth Swedish Language Technology Conference (2014)
Roberts, B.: Biographical Research. Understanding Social Research. Open University Press (2002)
Rospocher, M., et al.: Building event-centric knowledge graphs from news. Web Semant. Sci. Serv. Agents World Wide Web 37, 132–151 (2016)
Shultz, K.: What is distant reading? New York Times, 24 June 2011. https://www.nytimes.com/2011/06/26/books/review/the-mechanic-muse-what-is-distant-reading.html. Accessed 13 Aug 2018
Tuominen, J., Hyvönen, E., Leskinen, P.: Bio CRM: a data model for representing biographical data for prosopographical research. In: Proceedings of the Biographical Data in a Digital World 2017 (BD2017). CEUR Workshop Proceedings (2018). http://ceur-ws.org/Vol-2119/paper10.pdf. Accessed 13 Aug 2018
Verboven, K., Carlier, M., Dumolyn, J.: A short manual to the art of prosopography. In: Prosopography Approaches and Applications. A Handbook, pp. 35–70. Unit for Prosopographical Research (Linacre College) (2007)
Wu, Y., Sun, H., Yan, C.: An event timeline extraction method based on news corpus. In: 2017 IEEE 2nd International Conference on Big Data Analysis, pp. 697–702. IEEE (2017)
Acknowledgements
Our research is part of the Severi project (http://seco.cs.aalto.fi/projects/severi accessed: 13 August 2018), funded mainly by Business Finland.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Tamper, M., Leskinen, P., Apajalahti, K., Hyvönen, E. (2018). Using Biographical Texts as Linked Data for Prosopographical Research and Applications. In: Ioannides, M., et al. Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection. EuroMed 2018. Lecture Notes in Computer Science, vol 11196. Springer, Cham. https://doi.org/10.1007/978-3-030-01762-0_11
DOI: https://doi.org/10.1007/978-3-030-01762-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01761-3
Online ISBN: 978-3-030-01762-0
eBook Packages: Computer Science, Computer Science (R0)