Abstract
This paper presents the results of crawling the CEUR Workshop Proceedings web site (http://ceur-ws.org) into a Linked Open Data (LOD) dataset within the framework of the ESWC 2014 Semantic Publishing Challenge (http://2014.eswc-conferences.org/semantic-publishing-challenge). Our approach is based on an extensible template-dependent crawler and uses DBpedia to link extracted entities, such as the names of universities and countries.
Notes
1. ESWC 2014 Semantic Publishing Challenge, URL: http://2014.eswc-conferences.org/semantic-publishing-challenge.
2. CEUR Workshop Proceedings web site, URL: http://ceur-ws.org.
3. The source code and instructions, URL: https://github.com/ailabitmo/sempub challenge2014-task1.
4. Grab framework, URL: http://grablib.org/.
5. Semantic Web Conference Ontology, URL: http://data.semanticweb.org/ns/swc/ontology.
6. Semantic Web for Research Communities, URL: http://ontoware.org/swrc/.
7. The Bibliographic Ontology, URL: http://purl.org/ontology/bibo/.
8. The Timeline Ontology, URL: http://purl.org/NET/c4dm/timeline.owl#.
9. The Friend of a Friend (FOAF), URL: http://www.foaf-project.org/.
10. Dublin Core, URL: http://purl.org/dc/elements/1.1/.
11. DBpedia Ontology, URL: http://dbpedia.org/ontology/.
12. RDF Schema, URL: http://www.w3.org/2000/01/rdf-schema#.
13. PDFMiner, URL: http://www.unixuser.org/~euske/python/pdfminer/.
14. DBLP, URL: http://www.informatik.uni-trier.de/~ley/db/.
15. Semantic Web Dog Food, URL: http://data.semanticweb.org/.
References
Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Seman. Web J. (2014). http://www.semantic-web-journal.net/content/dbpedia-large-scale-multilingual-knowledge-base-extracted-wikipedia-0
Ratcliff, J.W., Metzener, D.E.: Pattern matching: the gestalt approach. Dr. Dobb's J. (DDJ) 13(7), 1–46 (1988)
Acknowledgments
This work was partially supported financially by the Government of the Russian Federation, Grant #074-U01.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kolchin, M., Kozlov, F. (2014). A Template-Based Information Extraction from Web Sites with Unstable Markup. In: Presutti, V., et al. Semantic Web Evaluation Challenge. SemWebEval 2014. Communications in Computer and Information Science, vol 475. Springer, Cham. https://doi.org/10.1007/978-3-319-12024-9_11
DOI: https://doi.org/10.1007/978-3-319-12024-9_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12023-2
Online ISBN: 978-3-319-12024-9
eBook Packages: Computer Science, Computer Science (R0)