Multilingual Entity Matching

  • Ilgiz Mustafin
  • Marius-Cristian Frunza
  • JooYoung Lee
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 926)

Abstract

The aim of this paper is to explore methods of multilingual entity matching. Name matching is currently the main technique used for entity resolution. When dealing with entities whose features are recorded in different languages and in different alphabets, the basic approaches have serious limitations. The basic name matching approach, which uses string comparison metrics, is enriched with phonetic rules and with relational information. The results show that the approach using transliteration enhanced by phonetic matching provides the best performance.
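
As a rough illustration of the kind of pipeline the abstract describes (transliteration first, then string comparison, with phonetic matching as an enhancement), a minimal Python sketch might look as follows. The transliteration table, the 0.8 threshold, the choice of Soundex as the phonetic code, and all function names are assumptions made for this example; they are not the rules or metrics used in the paper.

```python
# Illustrative sketch only: table, threshold, and names are assumptions
# for this example and are not taken from the paper.

# Simplified Cyrillic-to-Latin table (assumed; real romanization
# schemes differ in detail).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}

# Standard Soundex digit groups.
SOUNDEX_DIGITS = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in group:
        SOUNDEX_DIGITS[letter] = digit


def transliterate(name: str) -> str:
    """Lowercase the name and map Cyrillic letters to Latin ones."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in name.lower())


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1], where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def soundex(name: str) -> str:
    """Four-character Soundex code of the ASCII letters in the name."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    last = SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != last:
            code += digit
        if ch not in "hw":  # h and w do not separate repeated codes
            last = digit
    return (code + "000")[:4]


def is_match(name_a: str, name_b: str, threshold: float = 0.8) -> bool:
    """Match if the transliterated names are close as strings or share a phonetic code."""
    a, b = transliterate(name_a), transliterate(name_b)
    if similarity(a, b) >= threshold:
        return True
    code_a, code_b = soundex(a), soundex(b)
    return code_a != "" and code_a == code_b


print(is_match("Мустафин", "Mustafin"))
print(is_match("Иванов", "Ivanoff"))
print(is_match("Иванов", "Petrov"))
```

In this sketch the phonetic code acts only as a fallback: a pair such as "Иванов" and "Ivanoff" falls below the string-similarity threshold after transliteration but still shares a Soundex code, which is the sort of case where phonetic matching can help.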

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Ilgiz Mustafin (1), corresponding author
  • Marius-Cristian Frunza (2, 3)
  • JooYoung Lee (1)

  1. Innopolis University, Innopolis, Russia
  2. Schwarzthal Kapital, Neuilly sur Seine, France
  3. LABEX ReFi, Paris, France
