Abstract
Most datasets in open government data portals are published in tabular spreadsheet formats, e.g. CSV and XLS. To increase the value and reusability of these datasets, they should also be made available in RDF, which better supports data querying and data integration. Our previous work proposed a semi-automatic framework for generating RDF datasets from existing tabular datasets. In this paper, we extend that framework to support automatic linking of the RDF datasets. One important step is mapping literal values that appear in a dataset to standard URIs. Several previous studies use a semantic search API such as DBpedia or Sindice for URI mapping. However, this approach is not appropriate for the datasets of the Thailand open data portal (Data.go.th) because there is insufficient data for Thai named entities. In addition, a name may match more than one URI, i.e. word ambiguity. For example, the name “Bangkok” may match URIs that refer to a province, a hospital or a university. To resolve these issues, our framework first finds the semantic type of a name, which is essential for resolving word ambiguity when retrieving a proper URI for a named entity. This paper presents a framework for finding semantic types and mapping named entities to URIs, i.e. URI lookup. A Named Entity Recognition (NER) technique is applied to find the semantic type of a column in a CSV dataset. The results are used to create an ontology and RDF data that include the URI mappings for named entities. We evaluate two approaches by comparing the performance of a semantic search API, i.e. Wikipedia, with that of the NER technique on datasets from the Data.go.th website.
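The role of the semantic type in URI lookup can be illustrated with a minimal Python sketch. It is not the framework's implementation: the candidate table, the `example.org` URIs, the function names and the toy tagger are all hypothetical, and the tagger merely stands in for a real NER step. The sketch assumes that unambiguous cells in a column can vote on the column's semantic type, which is then used to pick one URI for an ambiguous name such as “Bangkok”.

```python
from collections import Counter

# Hypothetical candidate table: name -> {semantic type: URI}.
# The example.org URIs are placeholders, not real identifiers.
CANDIDATES = {
    "Bangkok": {
        "Province": "http://example.org/province/Bangkok",
        "Hospital": "http://example.org/hospital/Bangkok",
        "University": "http://example.org/university/Bangkok",
    },
    "Chiang Mai": {"Province": "http://example.org/province/Chiang_Mai"},
    "Phuket": {"Province": "http://example.org/province/Phuket"},
}


def toy_tagger(value):
    """Stand-in for an NER tagger (assumption): returns a type label
    only when the name has exactly one candidate type, else None."""
    types = list(CANDIDATES.get(value, {}))
    return types[0] if len(types) == 1 else None


def detect_column_type(values, tag_entity):
    """Majority vote over the per-cell labels produced by tag_entity."""
    labels = (tag_entity(v) for v in values)
    votes = Counter(label for label in labels if label is not None)
    return votes.most_common(1)[0][0] if votes else None


def lookup_uri(name, semantic_type):
    """Return the candidate URI whose type matches, or None."""
    return CANDIDATES.get(name, {}).get(semantic_type)


column = ["Bangkok", "Chiang Mai", "Phuket"]
column_type = detect_column_type(column, toy_tagger)
uris = {name: lookup_uri(name, column_type) for name in column}
```

In this sketch the ambiguous cell does not vote at all; the two unambiguous province names decide the column type, and only then is “Bangkok” resolved to its province URI.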
Notes
- 6. https://th.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=กรุงเทพ&rvsection=0
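The Wikipedia API query listed in the notes can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the function name and the `lang` parameter are assumptions for illustration, and the sketch only builds the URL without sending a request.

```python
from urllib.parse import urlencode


def build_revision_query(title, lang="th"):
    """Build a MediaWiki API URL that requests the content of the
    lead section (rvsection=0) of the page with the given title."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "format": "json",
        "titles": title,
        "rvsection": 0,
    }
    # urlencode percent-encodes the Thai title as UTF-8.
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


url = build_revision_query("กรุงเทพ")
```

The resulting URL carries the same parameters as the one in the notes, with the title percent-encoded rather than written in raw Thai characters.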
Acknowledgement
This project was funded by the Electronic Government Agency (EGA) and the National Science and Technology Development Agency (NSTDA), Thailand.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Krataithong, P., Buranarach, M., Hongwarittorrn, N., Supnithi, T. (2016). A Framework for Linking RDF Datasets for Thailand Open Government Data Based on Semantic Type Detection. In: Morishima, A., Rauber, A., Liew, C. (eds) Digital Libraries: Knowledge, Information, and Data in an Open Access Society. ICADL 2016. Lecture Notes in Computer Science, vol 10075. Springer, Cham. https://doi.org/10.1007/978-3-319-49304-6_31
DOI: https://doi.org/10.1007/978-3-319-49304-6_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49303-9
Online ISBN: 978-3-319-49304-6
eBook Packages: Computer Science, Computer Science (R0)