Syntactical Heuristics for the Open Data Quality Assessment and Their Applications

  • Conference paper
Business Information Systems Workshops (BIS 2018)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 339)

Abstract

Open Government Data initiatives are valuable for promoting transparency, accountability, and openness. The expectation is to increase participation by engaging citizens, non-profit organisations, and companies in reusing Open Data (OD). A potential barrier to the exploitation of OD and the engagement of the target audience is the low quality of available datasets [3, 14, 16]. Non-technical consumers are often unaware that data may have quality issues and take for granted that datasets can be used immediately without further manipulation. In reality, to reuse data, for instance to create visualisations, they first need to clean it, which requires time, resources, and proper skills. This reduces the chance of involving citizens.

This paper tackles the quality barrier of raw tabular datasets (i.e. CSV), a popular format (three stars in Tim Berners-Lee's five-star scheme) for Governmental Open Data. The objective is to increase awareness and provide support in data cleaning operations, both to Public Administrations (PAs), so that they produce better-quality Open Data, and to non-technical data consumers, so that they can reuse datasets. DataChecker is an open source, modular JavaScript library, shared with the community and available on GitHub, that takes a tabular dataset as input and generates a machine-readable report based on data type inference (a data profiling technique). Based on this report, the Social Platform for Open Data (SPOD) provides quality cleaning suggestions to both PAs and end-users.
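To make the idea of a type-inference-based quality report more concrete, the following is a minimal sketch in Python. It is not DataChecker's API or its actual heuristics: the type names, the regular expressions, the report fields, the function names `infer_cell_type` and `profile_csv`, and the sample rows are all assumptions made up for illustration. The sketch classifies each cell syntactically, takes the most frequent type as the column type, and reports empty and non-conforming cells in a machine-readable form.

```python
import csv
import io
import json
import re
from collections import Counter

# Hypothetical syntactic patterns, checked in order; a real profiler
# would likely recognise more types and locale-specific formats.
PATTERNS = [
    ("integer", re.compile(r"^[+-]?\d+$")),
    ("number", re.compile(r"^[+-]?\d+[.,]\d+$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
]


def infer_cell_type(value):
    """Classify a single cell by its syntax alone."""
    text = value.strip()
    if text == "":
        return "empty"
    for name, pattern in PATTERNS:
        if pattern.match(text):
            return name
    return "text"


def profile_csv(csv_text):
    """Return one report entry per column of a CSV with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    counts = [Counter() for _ in header]
    for row in reader:
        if not row:  # skip completely blank lines
            continue
        for index, counter in enumerate(counts):
            # Rows shorter than the header are treated as having empty cells.
            cell = row[index] if index < len(row) else ""
            counter[infer_cell_type(cell)] += 1

    report = []
    for name, counter in zip(header, counts):
        typed = {t: n for t, n in counter.items() if t != "empty"}
        # The column type is the most frequent non-empty cell type.
        dominant = max(typed, key=typed.get) if typed else "empty"
        report.append({
            "column": name,
            "inferred_type": dominant,
            "empty_cells": counter["empty"],
            "mismatched_cells": sum(typed.values()) - typed.get(dominant, 0),
        })
    return report


# Made-up sample rows, used only to exercise the sketch.
SAMPLE = """city,population,updated
Salerno,133970,2018-05-01
Napoli,n/a,2018-05-02
Avellino,54353,
"""

report = profile_csv(SAMPLE)
print(json.dumps(report, indent=2))
```

In this sample, the `population` column would be inferred as `integer` with one mismatched cell (`n/a`), and the `updated` column as `date` with one empty cell. A platform could turn entries like these into cleaning suggestions, for example by pointing the user to the cells that do not match the inferred column type.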

Notes

  1. Messytables documentation: https://messytables.readthedocs.io/en/latest.

  2. DataChecker open source library, available on GitHub at https://github.com/donpir/JSDataChecker.

References

  1. Ambrosino, M.A., et al.: Protection and preservation of campania cultural heritage engaging local communities via the use of open data. In: Proceedings of the 19th International Conference on Digital Government Research. ACM (2018). https://doi.org/10.1145/3209281.3209347

  2. Andriessen, J., et al.: Increasing public value through co-creation of open knowledge. In: 2017 Fourth International Conference on eDemocracy & eGovernment (ICEDEG), pp. 47–54. IEEE (2017)

  3. Beno, M., Figl, K., Umbrich, J., Polleres, A.: Open data hopes and fears: determining the barriers of open data. In: 2017 Conference for E-Democracy and Open Government (CeDEM), pp. 69–81. IEEE (2017)

  4. Berners-Lee, T.: Linked data - design issues. http://www.w3.org/DesignIssues/LinkedData.html. Accessed 03 May 2018

  5. Castro, D., Korte, T.: Open data in the G8: a review of progress on the open data charter (2015). Accessed 23 May 2018

  6. European Commission: Open data maturity in Europe 2017 (2017). https://www.europeandataportal.eu/sites/default/files/edp_landscaping_insight_report_n3_2017.pdf

  7. European Commission: Open data portal (2017). https://www.europeandataportal.eu/data/it/dataset

  8. European Commission: Re-using open data (2017). https://www.europeandataportal.eu/sites/default/files/re-using_open_data.pdf

  9. Cordasco, G., et al.: Engaging citizens with a social platform for open data. In: Proceedings of the 18th Annual International Conference on Digital Government Research, pp. 242–249. ACM (2017)

  10. Dawes, S.S., Helbig, N.: Information strategies for open government: challenges and prospects for deriving public value from government transparency. In: Wimmer, M.A., Chappelet, J.-L., Janssen, M., Scholl, H.J. (eds.) EGOV 2010. LNCS, vol. 6228, pp. 50–60. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14799-9_5

  11. De Donato, R., et al.: Agile production of high quality open data. In: Proceedings of the 19th Annual International Conference on Digital Government Research: Governance in the Data Age, p. 84. ACM (2018)

  12. De Donato, R., et al.: Datalet-ecosystem provider (deep): scalable architecture for reusable, portable and user-friendly visualizations of open data. In: 2017 Conference for E-Democracy and Open Government (CeDEM), pp. 92–101. IEEE (2017)

  13. Döhmen, T., Mühleisen, H., Boncz, P.: Multi-hypothesis CSV parsing. In: Proceedings of the 29th International Conference on Scientific and Statistical Database Management, p. 16. ACM (2017)

  14. European Data Portal: Open data goldbook for data manager and data holders. https://www.europeandataportal.eu/sites/default/files/goldbook.pdf. Accessed 23 May 2018

  15. Fish, A., Gargiulo, C., Malandrino, D., Pirozzi, D., Scarano, V.: Visual exploration system in an industrial context. IEEE Trans. Industr. Inf. 12(2), 567–575 (2016)

  16. World Wide Web Foundation: Open data barometer, 4th edn. Global Report, May 2017. http://opendatabarometer.org/doc/4thEdition/ODB-4thEdition-GlobalReport.pdf

  17. Helbig, N., Cresswell, A.M., Burke, G.B., Luna-Reyes, L.: The dynamics of opening government data. Center for Technology in Government (2012). http://www.ctg.albany.edu/publications/reports/opendata. Accessed 23 May 2018

  18. Open Knowledge International: Open data handbook. http://opendatahandbook.org/glossary/. Accessed 05 May 2018

  19. Maydanchik, A.: Data Quality Assessment. Technics Publications, Denville (2007)

  20. Naumann, F.: Data profiling revisited. ACM SIGMOD Rec. 42(4), 40–49 (2014)

  21. Open Data Charter: Open data charter web site. https://opendatacharter.net. Accessed 23 May 2018

  22. Open Knowledge International: Open definition (2018). https://opendefinition.org/od/2.1/en/. Accessed 05 May 2018

  23. Pirozzi, D., Scarano, V.: Support citizens in visualising open data. In: 20th International Conference on Information Visualisation (IV), pp. 271–276. IEEE (2016)

Acknowledgements

The research leading to the results presented in this paper was conducted in the project ROUTE-TO-PA (www.routetopa.eu), which received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 645860. We gratefully acknowledge discussions with the project participants, who stimulated our work. The authors would like to thank the anonymous reviewers for their interesting and valuable feedback.

Corresponding author

Correspondence to Donato Pirozzi.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Pirozzi, D., Scarano, V. (2019). Syntactical Heuristics for the Open Data Quality Assessment and Their Applications. In: Abramowicz, W., Paschke, A. (eds) Business Information Systems Workshops. BIS 2018. Lecture Notes in Business Information Processing, vol 339. Springer, Cham. https://doi.org/10.1007/978-3-030-04849-5_51

  • DOI: https://doi.org/10.1007/978-3-030-04849-5_51

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04848-8

  • Online ISBN: 978-3-030-04849-5

  • eBook Packages: Computer Science, Computer Science (R0)
