Introduction to Data Management Using data.table
We briefly introduced the data.table package earlier. This package is the heart of this chapter, which covers the basics of accessing, editing, and manipulating data under the broad term data management. Although not glamorous, data management is a critical first step before data visualization or analysis. Furthermore, data management may account for the majority of the time spent on a particular analysis project. For example, running a linear model in R can take one line of code once the data is clean and in the format that the lm() function expects. Data management can be challenging because raw data comes in all types, shapes, and formats; missing data is common; and you may also have to combine or merge separate data sources. In this chapter, we introduce both mechanical and philosophical techniques for approaching data management. All packages used in this chapter are already in our checkpoint.R file, so you need only source that file to get started.
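To make the point about lm() concrete, here is a minimal sketch. The data set, its column names (id, weight, mpg), and its values are hypothetical and invented purely for illustration; they do not come from any data used later in the chapter. The sketch assumes the data.table package is installed. Notice that the cleanup steps take more code than the model itself.

```r
library(data.table)

# Hypothetical raw data (invented for illustration): weight was read in
# as text, and both weight and mpg contain a missing value
raw <- data.table(
  id     = 1:6,
  weight = c("2.6", "2.9", "3.2", NA, "3.4", "3.6"),
  mpg    = c(21.0, 22.8, 21.4, 18.7, NA, 18.1)
)

# Data management: convert weight from character to numeric in place,
# then keep only the rows with no missing values
raw[, weight := as.numeric(weight)]
clean <- na.omit(raw)

# Analysis: once the data is clean, fitting the linear model is one line
fit <- lm(mpg ~ weight, data = clean)
coef(fit)
```

Real projects usually involve far more preparation than this, but the proportions are typical: most of the code deals with getting the data into shape, and the model call stays short.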