Comments on: Data science, big data and statistics
We praise Professors Galeano and Peña for this paper and for sharing their view on the impact of Big Data on Statistics and the emerging field of Data Science. They draw attention to seven main points which are very interesting and relevant, and present two interesting applications. We will focus our discussion on two of these topics, which are most related to our own expertise: heterogeneous data (including data quality and robustness) and automatic model selection.
Modern big data are often the result of administrative/operational data collection, as is the case for the two examples given by the authors. In a recent discussion paper, Professor David Hand (2018a, b) points out that this type of data also exhibits quality issues. Hence, there is a need for robust methods to automatically analyze such data. As pointed out by the authors, the standard concept of statistical efficiency loses its meaning because the data form the whole population. On the other hand,...
Mathematics Subject Classification: 62H99
Funding was provided by Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada.
- Christidis A, Lakshmanan LVS, Smucler E, Zamar R (2017) Ensembles of regularized linear models. arXiv:1712.03561
- Hand DJ (2018b) Hand writing: administrative data. IMS Bull 47(6):8–9
- Wang Y, Van Aelst S (2017) Robust variable screening for regression using factor profiling. Stat Anal Data Min (to appear). arXiv:1711.09586