
Feature-Based Entity Matching: The FBEM Model, Implementation, Evaluation

  • Heiko Stoermer
  • Nataliya Rassadko
  • Nachiket Vaidya
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6051)

Abstract

Entity matching (also called entity resolution) is at the heart of many integration tasks in modern information systems. As with any core functionality, high-quality results are vital to ensure that higher-level tasks perform as desired. In this paper we introduce the FBEM algorithm and illustrate its usefulness for general-purpose use cases. We analyze its result quality in a range of experiments on heterogeneous data sources and show that the approach yields good results for entities of different types, such as persons, organizations, or publications, while placing minimal requirements on input data formats and requiring no training.
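The abstract does not state FBEM's actual scoring function. As a rough illustration of the general idea of schema-free, training-free, feature-based matching, the Python sketch below treats each entity as a plain dictionary of feature names to string values. It scores a pair by pairing up features whose names are similar and averaging the normalized Levenshtein similarity of their values. The functions `levenshtein`, `string_sim`, `directional_score` and `entity_similarity`, as well as the 0.8 name threshold, are hypothetical choices made for this example. This is not the FBEM model from the paper.

```python
from typing import Dict

# Hypothetical entity representation: feature name -> feature value.
Entity = Dict[str, str]


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def string_sim(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]: 1 - distance / length of the longer string."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def directional_score(e1: Entity, e2: Entity, name_threshold: float = 0.8) -> float:
    """Average, over e1's features, of the best value similarity among e2's
    features whose names are similar enough to count as the same attribute."""
    if not e1:
        return 0.0
    total = 0.0
    for n1, v1 in e1.items():
        best = 0.0
        for n2, v2 in e2.items():
            if string_sim(n1, n2) >= name_threshold:
                best = max(best, string_sim(v1, v2))
        total += best
    return total / len(e1)


def entity_similarity(e1: Entity, e2: Entity) -> float:
    """Symmetric score in [0, 1]; needs no training data and no fixed schema."""
    return (directional_score(e1, e2) + directional_score(e2, e1)) / 2


if __name__ == "__main__":
    a = {"name": "Heiko Stoermer", "affiliation": "University of Trento"}
    b = {"Name": "H. Stoermer", "affiliation": "Univ. of Trento"}
    print(round(entity_similarity(a, b), 3))
```

Because features are compared by name and value at match time, two sources with differently cased or slightly different attribute names can still be compared without a mapping step. A real matcher would also need a decision threshold on the final score, which this sketch leaves to the caller.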

Keywords

Entity resolution · Record linkage · Information integration


Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Heiko Stoermer (1)
  • Nataliya Rassadko (1)
  • Nachiket Vaidya (1)
  1. Dept. of Information and Communication Tech., University of Trento, Trento, Italy
