Abstract
This paper proposes an approach for using visual data profiling in tabular data cleaning and transformation. Visual data profiling is the statistical assessment of datasets to identify and visualize potential quality issues. The approach was implemented in a software prototype and empirically validated in a usability study that examined how useful visual data profiling is to data scientists and how easy they find it to use. The study involved 24 users in a comparative usability test and 4 expert reviewers in cognitive walkthroughs. The results show that users find visual data profiling capabilities useful and easy to use when cleaning and transforming data.
Acknowledgements
The work in this paper is partly supported by the EC-funded projects proDataMarket (Grant number: 644497), euBusinessGraph (Grant number: 732003), and EW-Shopp (Grant number: 732590).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
von Zernichow, B.M., Roman, D. (2017). Usability of Visual Data Profiling in Data Cleaning and Transformation. In: Panetto, H., et al. On the Move to Meaningful Internet Systems. OTM 2017 Conferences. OTM 2017. Lecture Notes in Computer Science, vol 10574. Springer, Cham. https://doi.org/10.1007/978-3-319-69459-7_32
DOI: https://doi.org/10.1007/978-3-319-69459-7_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69458-0
Online ISBN: 978-3-319-69459-7
eBook Packages: Computer Science (R0)