Determining the Validity of Clustering for Data Fusion
For many direct marketing activities, organisations find that their customer databases do not contain enough information. Additional databases, such as socio-economic databases constructed from census and survey data, can be purchased to supplement them. One difficulty in fusing separate databases, however, is that the information is based on two different samples, and a unique individual can rarely be identified in both. Usually a set of variables common to both samples is used to determine the similarities between customers in the two samples, and various methods have been proposed for predicting the information missing from one sample from the information contained in the other. While some complicated methods have been proposed for data fusion, in this paper we investigate the validity of a simple clustering approach. Using a set of variables common to both samples, clusters are generated with the k-means algorithm. The likely values of the missing variables are then inferred from the average values within the relevant cluster. An out-of-sample test set is used to demonstrate the accuracy of the fused variable predictions.
Keywords: Common Variable, Data Fusion, Mean Absolute Percentage Error, Absolute Percentage Error, Donor Sample
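The clustering approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`kmeans`, `fuse`, `mape`), the toy data, the farthest-point initialisation, and the choice to fit the clusters on the donor sample only are all assumptions made for illustration. The sketch clusters donor records on the common variables, predicts a recipient's missing variable as the mean of that variable in its nearest cluster, and scores the predictions on held-out values with the mean absolute percentage error.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns k centroids.

    Assumption: deterministic farthest-point initialisation, chosen only
    so that the example is reproducible.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def fuse(donor_common, donor_target, recipient_common, k):
    """Predict the missing variable for each recipient record as the
    donor average within the recipient's nearest cluster."""
    centroids = kmeans(donor_common, k)
    sums, counts = [0.0] * k, [0] * k
    for x, y in zip(donor_common, donor_target):
        c = nearest(x, centroids)
        sums[c] += y
        counts[c] += 1
    overall = sum(donor_target) / len(donor_target)  # fallback for empty clusters
    means = [s / n if n else overall for s, n in zip(sums, counts)]
    return [means[nearest(x, centroids)] for x in recipient_common]


def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data: two common variables and one variable to be fused.
donor_common = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]]
donor_target = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]
recipient_common = [[1.1, 1.0], [5.1, 5.0]]
holdout_actual = [11.0, 50.0]  # true values, withheld from fusion for testing

preds = fuse(donor_common, donor_target, recipient_common, k=2)
error = mape(holdout_actual, preds)
```

On this toy data the two recipients receive the cluster averages 11.0 and 51.0. In practice the common variables would normally be scaled before clustering, and the number of clusters would have to be chosen empirically; neither step is specified in the abstract.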