Abstract
This paper explores methods for multilingual entity matching. Name matching is currently the main technique used for entity resolution. When entities have features recorded in different languages and in different alphabets, the basic approaches have serious limitations. The basic name-matching approach, which relies on string comparison metrics, is enriched with phonetic rules and with relational information. The results show that transliteration enhanced by phonetic matching provides the best performance.
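To make the combination described above concrete, the following is a minimal sketch of name matching that transliterates, compares strings, and then falls back to a phonetic code. It is not the authors' implementation: the abstract does not specify the transliteration rules, the phonetic algorithm, or any thresholds. The Cyrillic-to-Latin table, the choice of classic Soundex, the normalised Levenshtein similarity, the 0.85 threshold, and the function names are all assumptions made for illustration.

```python
# Illustrative sketch only. The mapping table, the use of Soundex, the
# threshold, and the "either signal is enough" rule are assumptions,
# not the pipeline evaluated in the paper.

# Hypothetical, simplified Cyrillic-to-Latin table (lowercase letters only).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ж": "zh",
    "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f",
    "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch", "ы": "y",
    "э": "e", "ю": "yu", "я": "ya", "ь": "", "ъ": "",
}

# Classic Soundex digit groups.
SOUNDEX_CODES = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def transliterate(name: str) -> str:
    """Lowercase the name and map known Cyrillic letters to Latin."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in name.lower())


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein distance normalised to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def soundex(name: str) -> str:
    """Classic four-character Soundex code; empty string if no ASCII letters."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same digit
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # vowels reset prev, so a repeated digit is kept
    return (code + "000")[:4]


def names_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Match if the transliterated strings are close or sound alike."""
    la, lb = transliterate(a), transliterate(b)
    if similarity(la, lb) >= threshold:
        return True
    code_a, code_b = soundex(la), soundex(lb)
    return code_a != "" and code_a == code_b


if __name__ == "__main__":
    print(transliterate("Михаил"))
    print(names_match("Михаил", "Michael"))
    print(names_match("Михаил", "Robert"))
```

In this sketch, "Михаил" and "Michael" fall below the string-similarity threshold after transliteration but share a Soundex code, which is the kind of case where a phonetic step can recover a match that string comparison alone would miss. A single-language phonetic code such as Soundex is a stand-in here; a multilingual setting may call for a different phonetic scheme.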
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Mustafin, I., Frunza, MC., Lee, J. (2020). Multilingual Entity Matching. In: Barolli, L., Takizawa, M., Xhafa, F., Enokido, T. (eds) Advanced Information Networking and Applications. AINA 2019. Advances in Intelligent Systems and Computing, vol 926. Springer, Cham. https://doi.org/10.1007/978-3-030-15032-7_68
DOI: https://doi.org/10.1007/978-3-030-15032-7_68
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15031-0
Online ISBN: 978-3-030-15032-7
eBook Packages: Intelligent Technologies and Robotics (R0)