A Flexible Fuzzy Expert System for Fuzzy Duplicate Elimination in Data Cleaning

  • Hamid Haidarian Shahri
  • Ahmad Abdollahzadeh Barforush
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3180)


Data cleaning deals with the detection and removal of errors and inconsistencies in data gathered from distributed sources. This process is essential for drawing correct conclusions from data in decision support systems. Eliminating fuzzy duplicate records is a fundamental part of the data cleaning process. The vagueness and uncertainty involved in detecting fuzzy duplicates make it a natural niche for applying fuzzy reasoning. Although uncertainty algebras like fuzzy logic are well known, their applicability to the problem of duplicate elimination has so far remained unexplored and unclear. In this paper, a novel and flexible fuzzy expert system for detecting and eliminating fuzzy duplicates during data cleaning is devised, which circumvents the repetitive and inconvenient task of hard-coding. Some of the crucial advantages of this approach, when used in various information systems, are its flexibility, ease of use, extensibility, fast development time and efficient run time.
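As a rough illustration of how fuzzy reasoning might be applied to duplicate detection, the following minimal sketch scores a pair of records from two field similarities. It is not the system described in the paper: the function names, the two linguistic terms ("low" and "high"), the four rules, and the use of Python's `difflib` as a similarity measure are all assumptions made for this example. The rules use `min` for AND and are combined with a zero-order Sugeno-style weighted average rather than any method confirmed by the source.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (a stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def clamp(x: float) -> float:
    """Restrict a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def low(x: float) -> float:
    """Membership in the hypothetical linguistic term 'low'."""
    return 1.0 - clamp(x)


def high(x: float) -> float:
    """Membership in the hypothetical linguistic term 'high'."""
    return clamp(x)


def duplicate_score(name_sim: float, addr_sim: float) -> float:
    """Combine two field similarities into a duplicate likelihood in [0, 1]."""
    # Each rule is (firing strength, output level); AND is modelled with min.
    rules = [
        (min(high(name_sim), high(addr_sim)), 1.0),  # both high -> likely duplicate
        (min(high(name_sim), low(addr_sim)), 0.5),   # mixed evidence
        (min(low(name_sim), high(addr_sim)), 0.5),   # mixed evidence
        (min(low(name_sim), low(addr_sim)), 0.0),    # both low -> not a duplicate
    ]
    total = sum(strength for strength, _ in rules)
    if total == 0.0:  # defensive guard; cannot occur with these memberships
        return 0.0
    # Weighted average of the output levels (assumed defuzzification step).
    return sum(strength * level for strength, level in rules) / total
```

In such a setup, a pair of records would be flagged as a fuzzy duplicate when `duplicate_score(similarity(name1, name2), similarity(addr1, addr2))` exceeds a user-chosen threshold. Changing the rules or membership functions changes the behaviour without rewriting the matching logic, which is the kind of flexibility the abstract contrasts with hard-coding.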


Keywords: Membership Function, Fuzzy Rule, Linguistic Term, Fuzzy Subset, Inference Process
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Hamid Haidarian Shahri (1)
  • Ahmad Abdollahzadeh Barforush (1)
  1. Faculty of Computer Engineering and Information Technology, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
