Abstract
As the line from Samuel Taylor Coleridge's The Rime of the Ancient Mariner suggests, abundance alone is not enough: the degree to which data are useful is largely determined by an analyst's ability to wrangle them. In spite of advances in technologies for working with data, analysts still spend an inordinate amount of time obtaining data, diagnosing data quality issues, and pre-processing data into a usable form. Research has shown that this portion of the data analysis process is the most tedious and time-consuming, often taking 50–80% of an analyst's time (cf. Wickham 2014; Dasu and Johnson 2003). Despite these challenges, data wrangling remains a fundamental building block that enables visualization and statistical modeling. Only through data wrangling can we make data useful. Consequently, the ability to perform data wrangling tasks effectively and efficiently is fundamental to becoming an expert data analyst in any domain.
Water, water, everywhere, nor any drop to drink
Samuel Taylor Coleridge
Bibliography
Dasu, T., & Johnson, T. (2003). Exploratory Data Mining and Data Cleaning (Vol. 479). John Wiley & Sons.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., et al. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey.
Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10).
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Boehmke, B.C. (2016). The Role of Data Wrangling. In: Data Wrangling with R. Use R!. Springer, Cham. https://doi.org/10.1007/978-3-319-45599-0_1
DOI: https://doi.org/10.1007/978-3-319-45599-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45598-3
Online ISBN: 978-3-319-45599-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)