Abstract
We improved access to the Koninklijke Bibliotheek's collection of Dutch historical newspapers by linking named entities in the newspaper articles to their corresponding Wikidata descriptions, using machine learning techniques and crowdsourcing. Indexing the Wikidata identifiers of named entities together with the newspaper articles opens up new possibilities: retrieving articles that mention these resources, and searching the newspaper collection using semantic relations from Wikidata. In this paper we describe the steps taken so far to set up this combination of entity linking, machine learning, and crowdsourcing in our research environment, as well as our planned activities aimed at improving the quality of the links and extending the semantic search capabilities.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
van Veen, T., Lonij, J., Faber, W.J. (2016). Linking Named Entities in Dutch Historical Newspapers. In: Garoufallou, E., Subirats Coll, I., Stellato, A., Greenberg, J. (eds) Metadata and Semantics Research. MTSR 2016. Communications in Computer and Information Science, vol 672. Springer, Cham. https://doi.org/10.1007/978-3-319-49157-8_18
DOI: https://doi.org/10.1007/978-3-319-49157-8_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49156-1
Online ISBN: 978-3-319-49157-8
eBook Packages: Computer Science, Computer Science (R0)