A Framework for Linking RDF Datasets for Thailand Open Government Data Based on Semantic Type Detection

  • Conference paper
Digital Libraries: Knowledge, Information, and Data in an Open Access Society (ICADL 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10075)

Abstract

Most datasets in open government data portals are published in tabular spreadsheet formats, e.g. CSV and XLS. To increase the value and reusability of these datasets, they should be made available in RDF format, which better supports data querying and data integration. Our previous work proposed a semi-automatic framework for generating RDF datasets from existing datasets in tabular format. In this paper, we extend our framework to support automatic linking of the RDF datasets. One important step is mapping literal values that appear in a dataset to standard URIs. Several previous studies use a semantic search API such as DBpedia or Sindice for URI mapping. However, this approach is not appropriate for the datasets of the Thailand open data portal (Data.go.th) because these sources contain insufficient data for Thai named entities. In addition, a name may match more than one URI, i.e. word ambiguity. For example, the name “Bangkok” may match URIs that refer to a province, a hospital or a university. To address these issues, our framework first finds the semantic type of a named entity, which is essential for resolving word ambiguity when retrieving the proper URI. This paper presents a framework for finding semantic types and mapping named entities to URIs, i.e. URI lookup. A Named Entity Recognition (NER) technique is applied to find the semantic type of a column in a CSV dataset. The results are used to create an ontology and RDF data that include the URI mappings for the named entities. We evaluate two approaches by comparing the performance of a semantic search API, i.e. Wikipedia, with that of the NER technique on some datasets from the Data.go.th website.
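The role of the semantic type in URI lookup can be illustrated with a minimal Python sketch. It is not the authors' implementation: the gazetteer, the `example.org` URIs, and the function names `detect_column_type` and `lookup_uri` are all hypothetical, and a simple majority vote over the column's values stands in for the NER technique described in the abstract.

```python
from collections import Counter

# Hypothetical gazetteer mapping (name, semantic type) to a URI.
# The example.org URIs are placeholders, not real identifiers.
GAZETTEER = {
    ("Bangkok", "Province"): "http://example.org/province/Bangkok",
    ("Bangkok", "Hospital"): "http://example.org/hospital/Bangkok",
    ("Bangkok", "University"): "http://example.org/university/Bangkok",
    ("Chiang Mai", "Province"): "http://example.org/province/Chiang_Mai",
    ("Phuket", "Province"): "http://example.org/province/Phuket",
}


def candidate_types(name):
    """Return every semantic type under which the name is known, sorted."""
    return sorted({t for (n, t) in GAZETTEER if n == name})


def detect_column_type(values):
    """Guess one semantic type for a whole column by majority vote.

    Assumption: this vote is a stand-in for the NER step, which the
    abstract names but does not specify in detail.
    """
    votes = Counter()
    for value in values:
        votes.update(candidate_types(value))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def lookup_uri(name, semantic_type):
    """Resolve a name to a URI, using the semantic type to disambiguate."""
    return GAZETTEER.get((name, semantic_type))


column = ["Bangkok", "Chiang Mai", "Phuket"]
column_type = detect_column_type(column)
uris = [lookup_uri(name, column_type) for name in column]
```

On its own, “Bangkok” has three candidate URIs in this toy gazetteer. Taking the other values of the column into account narrows the type to a single candidate, so the lookup returns one URI per name.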

Notes

  1. http://govspending.data.go.th/

  2. http://5stardata.info/en/

  3. http://demo-api.data.go.th/

  4. http://wiki.dbpedia.org/projects/dbpedia-lookup

  5. http://www.sindice.com/

  6. https://th.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=กรุงเทพ&rvsection=0

  7. https://th.wikipedia.org/w/api.php
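
The Wikipedia query in note 6 can be assembled programmatically. The sketch below only builds the request URL with Python's standard library and does not contact the server. The function name `build_revision_query` is hypothetical, and the parameter values are copied from the URL in note 6.

```python
from urllib.parse import urlencode

# Endpoint taken from note 7.
API_ENDPOINT = "https://th.wikipedia.org/w/api.php"


def build_revision_query(title):
    """Build the URL that requests the lead section of a page by title.

    The parameters mirror the query string shown in note 6.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "format": "json",
        "titles": title,
        "rvsection": 0,
    }
    # urlencode percent-encodes non-ASCII titles as UTF-8.
    return API_ENDPOINT + "?" + urlencode(params)


url = build_revision_query("กรุงเทพ")
```

The resulting URL is equivalent to the one in note 6, with the Thai title percent-encoded so that it can be passed safely to an HTTP client.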

Acknowledgement

This project was funded by the Electronic Government Agency (EGA) and the National Science and Technology Development Agency (NSTDA), Thailand.

Author information

Corresponding author

Correspondence to Pattama Krataithong.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Krataithong, P., Buranarach, M., Hongwarittorrn, N., Supnithi, T. (2016). A Framework for Linking RDF Datasets for Thailand Open Government Data Based on Semantic Type Detection. In: Morishima, A., Rauber, A., Liew, C. (eds) Digital Libraries: Knowledge, Information, and Data in an Open Access Society. ICADL 2016. Lecture Notes in Computer Science, vol 10075. Springer, Cham. https://doi.org/10.1007/978-3-319-49304-6_31

  • DOI: https://doi.org/10.1007/978-3-319-49304-6_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49303-9

  • Online ISBN: 978-3-319-49304-6

  • eBook Packages: Computer Science, Computer Science (R0)
