Using Data Fusion to Enrich Customer Databases with Survey Data for Database Marketing

  • Peter van der Putten
  • Joost N. Kok
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 258)


Many data mining papers start by claiming that the exponential growth in the amount of data provides great opportunities for data mining. Reality can be different, though. In real-world applications, the number of sources over which this data is fragmented can grow at an even faster rate, creating barriers to the widespread application of data mining and leading to missed business opportunities. Let us illustrate this paradox with a motivating example from database marketing.


Data Mining, Credit Card, Data Fusion, External Evaluation, Data Mining Process





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Peter van der Putten (1)
  • Joost N. Kok (1)
  1. LIACS, Leiden University, Leiden, The Netherlands
