Abstract
Open Government Data initiatives are valuable for promoting transparency, accountability, and openness. They are expected to increase participation by engaging citizens, non-profit organisations, and companies in reusing Open Data (OD). A potential barrier to the exploitation of OD and to the engagement of its target audience is the low quality of the available datasets [3, 14, 16]. Non-technical consumers are often unaware that data may have quality issues, and take for granted that datasets can be used immediately without any further manipulation. In reality, to reuse data, for instance to create visualisations, they first need to clean it, which requires time, resources, and proper skills. This reduces the chance of involving citizens.
This paper tackles the quality barrier of raw tabular datasets (i.e. CSV), a popular format for Governmental Open Data (three stars in Tim Berners-Lee's five-star scheme). The objective is to increase awareness and to support data cleaning operations, both for Public Administrations (PAs), so that they produce better quality Open Data, and for non-technical data consumers, so that they can reuse datasets. DataChecker is an open source, modular JavaScript library, shared with the community and available on GitHub, that takes a tabular dataset as input and generates a machine-readable report based on data type inference (a data profiling technique). Based on this report, the Social Platform for Open Data (SPOD) provides quality cleaning suggestions to both PAs and end users.
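To make the idea of a type-inference report more concrete, the following is a minimal sketch under stated assumptions. It does not reproduce the DataChecker API: the function names (`inferValueType`, `profileTable`), the set of recognised types, and the shape of the report are all hypothetical and chosen only for illustration. Each cell is classified with simple syntactic checks, the most frequent non-empty type is taken as the column type, and cells that disagree with it are listed as candidates for cleaning.

```javascript
// Illustrative sketch only: names and report shape are assumptions,
// not the actual DataChecker API.

// Ordered syntactic checks; the first matching pattern wins.
const TYPE_TESTS = [
  ["integer", (v) => /^[+-]?\d+$/.test(v)],
  ["number", (v) => /^[+-]?\d+[.,]\d+$/.test(v)],
  ["date", (v) => /^\d{4}-\d{2}-\d{2}$/.test(v)],
];

// Classify a single cell value by its syntax.
function inferValueType(value) {
  const v = String(value === undefined ? "" : value).trim();
  if (v === "") return "empty";
  for (const [name, test] of TYPE_TESTS) {
    if (test(v)) return name;
  }
  return "text";
}

// Build a machine-readable report with one entry per column.
function profileTable(header, rows) {
  return header.map((name, col) => {
    const counts = {};
    const types = rows.map((row) => inferValueType(row[col]));
    for (const t of types) counts[t] = (counts[t] || 0) + 1;

    // The most frequent non-empty type is taken as the column type.
    const ranked = Object.entries(counts)
      .filter(([t]) => t !== "empty")
      .sort((a, b) => b[1] - a[1]);
    const inferredType = ranked.length > 0 ? ranked[0][0] : "empty";

    // Cells whose type differs from the column type (including empty
    // cells) are reported as candidates for cleaning.
    const suspects = [];
    types.forEach((t, i) => {
      if (t !== inferredType) suspects.push({ row: i, value: rows[i][col], type: t });
    });

    return { column: name, inferredType, counts, suspects };
  });
}

// Small made-up dataset used only to exercise the sketch.
const header = ["city", "population", "updated"];
const rows = [
  ["Salerno", "133970", "2018-05-01"],
  ["Napoli", "n/a", "2018-05-02"],
  ["Avellino", "54561", ""],
];

const report = profileTable(header, rows);
console.log(JSON.stringify(report, null, 2));
```

In this sketch the `population` column is inferred as `integer` and the `n/a` cell is flagged, which is the kind of hint a platform could turn into a cleaning suggestion. A real profiler would need to handle many more formats (locale-specific numbers, other date layouts, geographic coordinates) than these three patterns.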
Notes
- 1. Messytables documentation https://messytables.readthedocs.io/en/latest.
- 2. DataChecker open source library available on GitHub at https://github.com/donpir/JSDataChecker.
References
Ambrosino, M.A., et al.: Protection and preservation of Campania cultural heritage engaging local communities via the use of open data. In: Proceedings of the 19th International Conference on Digital Government Research. ACM (2018). https://doi.org/10.1145/3209281.3209347
Andriessen, J., et al.: Increasing public value through co-creation of open knowledge. In: 2017 Fourth International Conference on eDemocracy & eGovernment (ICEDEG), pp. 47–54. IEEE (2017)
Beno, M., Figl, K., Umbrich, J., Polleres, A.: Open data hopes and fears: determining the barriers of open data. In: 2017 Conference for E-Democracy and Open Government (CeDEM), pp. 69–81. IEEE (2017)
Berners-Lee, T.: Linked data - design issues. http://www.w3.org/DesignIssues/LinkedData.html. Accessed 03 May 2018
Castro, D., Korte, T.: Open data in the G8: a review of progress on the open data charter (2015). Accessed 23 May 2018
European Commission: Open data maturity in Europe 2017 (2017). https://www.europeandataportal.eu/sites/default/files/edp_landscaping_insight_report_n3_2017.pdf
European Commission: Open data portal (2017). https://www.europeandataportal.eu/data/it/dataset
European Commission: Re-using open data (2017). https://www.europeandataportal.eu/sites/default/files/re-using_open_data.pdf
Cordasco, G., et al.: Engaging citizens with a social platform for open data. In: Proceedings of the 18th Annual International Conference on Digital Government Research, pp. 242–249. ACM (2017)
Dawes, S.S., Helbig, N.: Information strategies for open government: challenges and prospects for deriving public value from government transparency. In: Wimmer, M.A., Chappelet, J.-L., Janssen, M., Scholl, H.J. (eds.) EGOV 2010. LNCS, vol. 6228, pp. 50–60. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14799-9_5
De Donato, R., et al.: Agile production of high quality open data. In: Proceedings of the 19th Annual International Conference on Digital Government Research: Governance in the Data Age, p. 84. ACM (2018)
De Donato, R., et al.: Datalet-ecosystem provider (deep): scalable architecture for reusable, portable and user-friendly visualizations of open data. In: 2017 Conference for E-Democracy and Open Government (CeDEM), pp. 92–101. IEEE (2017)
Döhmen, T., Mühleisen, H., Boncz, P.: Multi-hypothesis CSV parsing. In: Proceedings of the 29th International Conference on Scientific and Statistical Database Management, p. 16. ACM (2017)
European Data Portal: Open data goldbook for data manager and data holders. https://www.europeandataportal.eu/sites/default/files/goldbook.pdf. Accessed 23 May 2018
Fish, A., Gargiulo, C., Malandrino, D., Pirozzi, D., Scarano, V.: Visual exploration system in an industrial context. IEEE Trans. Industr. Inf. 12(2), 567–575 (2016)
World Wide Web Foundation: Open data barometer, 4th edn. Global Report, May 2017. http://opendatabarometer.org/doc/4thEdition/ODB-4thEdition-GlobalReport.pdf
Helbig, N., Cresswell, A.M., Burke, G.B., Luna-Reyes, L.: The dynamics of opening government data. Center for Technology in Government (2012). http://www.ctg.albany.edu/publications/reports/opendata. Accessed 23 May 2018
Open Knowledge International: Open data handbook. http://opendatahandbook.org/glossary/. Accessed 05 May 2018
Maydanchik, A.: Data Quality Assessment. Technics Publications, Denville (2007)
Naumann, F.: Data profiling revisited. ACM SIGMOD Rec. 42(4), 40–49 (2014)
Open Data Charter: Open data charter web site. https://opendatacharter.net. Accessed 23 May 2018
Open Knowledge International: Open definition (2018). https://opendefinition.org/od/2.1/en/. Accessed 05 May 2018
Pirozzi, D., Scarano, V.: Support citizens in visualising open data. In: 20th International Conference on Information Visualisation (IV), pp. 271–276. IEEE (2016)
Acknowledgements
The research leading to the results presented in this paper was conducted in the project ROUTE-TO-PA (www.routetopa.eu), which received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 645860. We gratefully acknowledge discussions with the project participants, who stimulated our work. The authors would like to thank the anonymous reviewers for their interesting and valuable feedback.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Pirozzi, D., Scarano, V. (2019). Syntactical Heuristics for the Open Data Quality Assessment and Their Applications. In: Abramowicz, W., Paschke, A. (eds) Business Information Systems Workshops. BIS 2018. Lecture Notes in Business Information Processing, vol 339. Springer, Cham. https://doi.org/10.1007/978-3-030-04849-5_51
DOI: https://doi.org/10.1007/978-3-030-04849-5_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04848-8
Online ISBN: 978-3-030-04849-5
eBook Packages: Computer Science, Computer Science (R0)