On the Trade-Off Between Number of Examples and Precision of Supervision in Regression
We investigate regression problems in which one can additionally control the conditional variance of the output given the input by varying the computational time dedicated to supervising each example. Given an upper bound on the total computational time, we optimize the trade-off between the number of examples and their precision by formulating and solving a suitable optimization problem, based on a large-sample approximation of the output of the ordinary least squares algorithm. Considering a specific functional form for the precision, we prove that there are cases in which “many but bad” examples yield a smaller generalization error than “few but good” ones, but also that the converse can occur, depending on the “returns to scale” of the precision with respect to the computational time assigned to each example. Hence, the results of this study highlight that increasing the size of the dataset is not always beneficial, if one can instead collect a smaller number of more reliable examples.
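The trade-off above can be illustrated with a small Monte Carlo sketch. The abstract does not give the precision's functional form, so the power law σ²(t) ∝ 1/t^α used below, along with the budget `T`, constant `c`, and the function names, are illustrative assumptions, not the paper's actual model. Under a fixed budget `T`, spending `t` per example leaves `n = T/t` examples with noise variance `c/t**α`; the OLS excess risk then scales roughly like `σ²(t)·d/n`, so `α < 1` (decreasing returns) favors many cheap examples and `α > 1` favors few precise ones:

```python
import numpy as np

def ols_excess_risk(n_examples, noise_var, dim=5, n_test=2000, seed=0):
    """Monte Carlo estimate of the OLS excess test MSE
    (error of the fitted predictor relative to the true one)."""
    rng = np.random.default_rng(seed)
    w_true = rng.standard_normal(dim)
    X = rng.standard_normal((n_examples, dim))
    y = X @ w_true + rng.normal(scale=np.sqrt(noise_var), size=n_examples)
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    X_test = rng.standard_normal((n_test, dim))
    return float(np.mean((X_test @ (w_hat - w_true)) ** 2))

# Hypothetical budget and precision model: sigma^2(t) = c / t**alpha.
T, c = 10_000.0, 1.0

def error_for(t_per_example, alpha, seed=0):
    n = int(T / t_per_example)            # budget fixes n once t is chosen
    return ols_excess_risk(n, c / t_per_example**alpha, seed=seed)

# alpha < 1 ("decreasing returns"): many cheap examples win.
many_bad = error_for(t_per_example=1.0, alpha=0.5)
few_good = error_for(t_per_example=100.0, alpha=0.5)
print(many_bad < few_good)
```

With `alpha=2.0` instead, the comparison reverses: the few, heavily supervised examples attain the smaller excess risk, matching the "converse" case in the abstract.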
Keywords: Ordinary least squares · Large-sample approximation · Variance control