Abstract
The databases available for model construction can be vast. Some customer databases contain tens of millions of records (observations) and thousands of predictor variables. Even with modern computing facilities, it may not be practical to use all of the available data. There will also be cases that are unsuitable for model construction, and these need to be identified and dealt with. In addition, the data used for model construction should be as similar as possible to the data that will exist when the completed model is put into service. This usually means that the sample used to construct the model should be as recent as possible, to mitigate the effect of changes in patterns of behaviour that accumulate over time. For these reasons, models are commonly constructed using a subset (a sample) of the available data rather than the full population.
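As a rough illustration of the three concerns raised in this abstract (excluding unsuitable cases, favouring recent data, and working with a sample rather than the full population), here is a minimal Python sketch under stated assumptions. The record layout, the `excluded` flag, the `observed` date field, the recency cutoff and the function name `select_sample` are all hypothetical. Simple random sampling is only one possible choice and is not necessarily the method the chapter recommends.

```python
import random
from datetime import date


def select_sample(records, cutoff, sample_size, seed=0):
    """Hypothetical sketch: drop unsuitable cases, keep recent records,
    then draw a simple random sample without replacement."""
    # Assumption: each record is a dict with an 'excluded' flag marking
    # cases unsuitable for model construction, and an 'observed' date.
    eligible = [
        r for r in records
        if not r["excluded"] and r["observed"] >= cutoff  # recency filter
    ]
    # If there are no more eligible records than requested, use them all.
    if len(eligible) <= sample_size:
        return eligible
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    return rng.sample(eligible, sample_size)


# Usage with a small synthetic population (illustrative data only).
population = [
    {"id": i, "excluded": i % 5 == 0, "observed": date(2009, 1 + i % 12, 1)}
    for i in range(1000)
]
sample = select_sample(population, cutoff=date(2009, 7, 1), sample_size=50, seed=1)
print(len(sample))
```

Filtering before sampling means the sample is drawn only from the recent, suitable cases. In practice, the choice of cutoff date and exclusion rules would depend on the modelling problem.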
Copyright information
© 2010 Steven Finlay
About this chapter
Cite this chapter
Finlay, S. (2010). Sample Selection. In: Credit Scoring, Response Modelling and Insurance Rating. Palgrave Macmillan, London. https://doi.org/10.1057/9780230298989_3
DOI: https://doi.org/10.1057/9780230298989_3
Publisher Name: Palgrave Macmillan, London
Print ISBN: 978-1-349-36689-7
Online ISBN: 978-0-230-29898-9
eBook Packages: Palgrave Economics & Finance Collection, Economics and Finance (R0)