Classification and Fusion

  • Hans-Joachim Lenz
  • Mattis Neiling
Part of the International Centre for Mechanical Sciences book series (CISM, volume 431)


We consider data fusion in the absence of object identifiers. Two simple examples are the fusion of partially overlapping customer address files extracted from autonomous sites, and an administrative-record census. The first example belongs to customer relationship management (CRM), while the second is a substitute for a regular census. This kind of data fusion raises problems of (schema) integration, resolution of semantic conflicts, and object identification when global identifiers are not available locally and heterogeneous, autonomous local databases must be accessed. The problem is made harder by errors such as input or loading errors, misspellings, missing values, and, of course, duplicate entries. We develop a unified framework for this kind of data fusion. We cover the feature selection problem and embed the data fusion problem into a supervised classification problem. For each pair of records we must decide whether a definite decision on matching is possible and, if so, whether the two records refer to the same unit (customer, citizen, etc.) or not. Candidate classifiers include likelihood ratio tests (record linkage), classification trees, nonlinear classifiers, and support vector machines. We illustrate our approach with a running example.
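The pair-classification idea, including the option of *not* deciding, can be illustrated with a likelihood-ratio test in the style of Fellegi–Sunter record linkage, one of the candidates named above. The following is a minimal sketch under stated assumptions, not the authors' method: the field names, the per-field probabilities `M_PROBS` (agreement given a true match) and `U_PROBS` (agreement given a non-match), and the thresholds `upper`/`lower` are hypothetical illustrations. In practice they would be estimated from data. Fields are assumed conditionally independent, and a missing value on either side simply contributes nothing to the score.

```python
import math

# Hypothetical probabilities, for illustration only (not taken from the chapter):
# m = P(field agrees | both records refer to the same unit)
# u = P(field agrees | records refer to different units)
M_PROBS = {"surname": 0.95, "first_name": 0.90, "birth_year": 0.97, "zip_code": 0.85}
U_PROBS = {"surname": 0.01, "first_name": 0.05, "birth_year": 0.02, "zip_code": 0.10}


def normalize(value):
    """Crude cleaning: trim whitespace and ignore case; None marks a missing value."""
    return None if value is None else str(value).strip().lower()


def comparison_vector(rec_a, rec_b):
    """Agreement pattern per field: True/False, or None if either value is missing."""
    gamma = {}
    for field in M_PROBS:
        a, b = normalize(rec_a.get(field)), normalize(rec_b.get(field))
        gamma[field] = None if a is None or b is None else (a == b)
    return gamma


def log_likelihood_ratio(gamma):
    """log2 of P(gamma | match) / P(gamma | non-match), assuming independent fields."""
    total = 0.0
    for field, agrees in gamma.items():
        if agrees is None:  # missing value: no evidence either way
            continue
        m, u = M_PROBS[field], U_PROBS[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def classify_pair(rec_a, rec_b, upper=8.0, lower=0.0):
    """Three-way decision: 'link', 'non-link', or 'possible link' (no definite decision)."""
    score = log_likelihood_ratio(comparison_vector(rec_a, rec_b))
    if score >= upper:
        return "link", score
    if score <= lower:
        return "non-link", score
    return "possible link", score


if __name__ == "__main__":
    a = {"surname": "Meier", "first_name": "Anna", "birth_year": 1970, "zip_code": "14195"}
    b = {"surname": "meier ", "first_name": "Anna", "birth_year": 1970, "zip_code": None}
    c = {"surname": "Schulz", "first_name": "Anna", "birth_year": 1985, "zip_code": "10115"}
    d = {"surname": "Meier", "first_name": None, "birth_year": 1971, "zip_code": "14195"}
    for other in (b, c, d):
        label, score = classify_pair(a, other)
        print(f"{label:14s} score={score:6.2f}")
```

With these illustrative numbers, `a`/`b` is linked despite a case difference and a missing ZIP code, `a`/`c` is rejected, and `a`/`d` falls between the thresholds and would be left for manual review. The "possible link" band is what allows the classifier to withhold a definite decision. Any of the other classifiers mentioned above could replace `log_likelihood_ratio` while operating on the same comparison vectors.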


Keywords (machine-generated, not supplied by the authors): Association Rule; Database System; Object Identification; Data Fusion; Customer Relationship Management





Copyright information

© Springer-Verlag Wien 2001

Authors and Affiliations

  • Hans-Joachim Lenz (1)
  • Mattis Neiling (1)
  1. Department of Economics, Institute for Information Systems, Free University of Berlin, Germany
