Abstract
In this paper, we describe a metasearch tool resulting from experiments in aggregating the results of different name matching algorithms on a knowledge-intensive multicultural name matching task. Three retrieval engines that match romanized names were tested on a noisy and predominantly Arabic dataset. One is based on a generic string matching algorithm; another is designed specifically for Arabic names; and the third makes use of culturally-specific matching strategies for multiple cultures. We show that even a relatively naïve method for aggregating results significantly increased effectiveness over each of the individual algorithms, resulting in nearly tripling the F-score of the worst-performing algorithm included in the aggregate, and in a 6-point improvement in F-score over the single best-performing algorithm included.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Voorhees, E.M., Harman, D.K.: Overview of the Eighth Text REtrieval Conference (TREC-8). In: Voorhees, E.M., Harman, D.K. (eds.) The Eighth Text REtrieval Conference (TREC-8). U.S. Government Printing Office, Washington (2000)
Voorhees, E.M.: The philosophy of information retrieval evaluation. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds.) CLEF 2001. LNCS, vol. 2406, pp. 355–370. Springer, Heidelberg (2002)
Jaro, M.A.: Advances in Record-linkage Methodology a Applied to Matching the 1985 Census of Tampa, Florida. Journal of the American Statistical Association 89, 414–420 (1989)
Winkler, W.E.: String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage. In: Proceedings of the Section on Survey Research Methods, pp. 354–359. American Statistical Association (1990)
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
Aslam, J.A., Montague, M.: Models for Metasearch. In: Proceedings of the 24th Annual International ACM SIGIR conference on Research and Development in Information Retrieval, pp. 276–284. ACM Press, New York (2001)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Miller, K.J., Arehart, M. (2009). Result Aggregation for Knowledge-Intensive Multicultural Name Matching. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science(), vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_35
Download citation
DOI: https://doi.org/10.1007/978-3-642-04235-5_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04234-8
Online ISBN: 978-3-642-04235-5
eBook Packages: Computer ScienceComputer Science (R0)