NokeaRM: Employing Non-key Attributes in Record Matching

  • Qiang Yang
  • Zhixu Li
  • Jun Jiang
  • Pengpeng Zhao
  • Guanfeng Liu
  • An Liu
  • Jia Zhu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9098)

Abstract

Record Matching (RM) aims to find pairs of instances that refer to the same entity across relational tables. Existing RM methods rely mainly on key attribute values and overlook the potential of non-key attribute values. As a result, two instances that refer to the same entity but lack similar key attribute values are unlikely to be linked. Such instances may nevertheless share important non-key attribute values that help identify the relationship between them. Building on this intuition, we propose to employ non-key attributes in RM. Specifically, we propose a rule-based algorithm built on a tree-like structure, which not only handles noisy and missing values but also greatly improves efficiency by identifying matched instances, or filtering out unmatched ones, as early as possible. Experimental results on several data sets demonstrate that our method outperforms existing RM methods, achieving higher precision and recall. In addition, the proposed techniques greatly improve the efficiency of a baseline.
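
The general idea of checking key attributes first and then falling back to non-key attributes, with early exits, can be illustrated with a minimal sketch. This is not the NokeaRM algorithm itself, whose rules and tree structure are not given in the abstract. The function names, the thresholds, the rule format, and the use of Python's difflib as a similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

MATCH, NON_MATCH, UNDECIDED = "match", "non-match", "undecided"


def similarity(a, b):
    """Return a string similarity in [0, 1], or None if either value is missing."""
    if not a or not b:
        return None
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_records(r1, r2, key_attr, rules, key_threshold=0.85):
    """Hypothetical decision cascade over one key and several non-key attributes.

    `rules` is an ordered list of (attribute, accept, reject) tuples.
    Each rule acts like a tree node with three branches: accept the pair,
    reject the pair, or move on to the next rule.
    """
    # Step 1: the conventional check on the key attribute.
    key_sim = similarity(r1.get(key_attr), r2.get(key_attr))
    if key_sim is not None and key_sim >= key_threshold:
        return MATCH

    # Step 2: fall back to non-key attributes, stopping as early as possible.
    for attr, accept, reject in rules:
        sim = similarity(r1.get(attr), r2.get(attr))
        if sim is None:
            continue  # missing value: this rule cannot decide, try the next one
        if sim >= accept:
            return MATCH  # early match
        if sim <= reject:
            return NON_MATCH  # early filter
    return UNDECIDED


# Illustrative records (made-up values, not from the paper's data sets).
a = {"name": "IBM", "phone": "914-499-1900", "zip": "10504"}
b = {"name": "International Business Machines", "phone": "914-499-1900"}
c = {"name": "Acme Corp", "zip": "92867"}
rules = [("phone", 0.95, 0.3), ("zip", 0.95, 0.3)]

print(match_records(a, b, "name", rules))
print(match_records(a, c, "name", rules))
```

In this sketch the pair (a, b) has dissimilar names, so a key-only method would miss it, while the shared phone number lets the first non-key rule accept it. The order of the rules determines how early a decision is reached, which is one plausible reading of why a tree-like arrangement helps efficiency.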

Keywords

Record matching, Non-key attribute, Algorithm

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Qiang Yang (1)
  • Zhixu Li (1, corresponding author)
  • Jun Jiang (1)
  • Pengpeng Zhao (1)
  • Guanfeng Liu (1)
  • An Liu (1)
  • Jia Zhu (2)

  1. School of Computer Science and Technology, Soochow University, Suzhou, China
  2. School of Computer Science, South China Normal University, Guangzhou, China
