Abstract
Entity matching is a problem common to many data management processes. When the entities are represented by RDF individuals, some properties hold variable-length lists of attribute values, which raises the problem of comparing multi-valued attributes, e.g. comparing lists of author names to decide whether two publications match. This kind of matching is more complex than comparing fixed-length records but less complex than comparing XML documents. Instead of comparing a single string formed by concatenating the values, each value of one vector should be compared against all values of the other vector. We propose a set of heuristics for aligning and comparing multi-valued attributes and evaluate them in the context of bibliographic databases. Our first results show that it is possible to reduce the number of comparisons and to provide an aggregated similarity metric that outperforms the average similarity of cross-product comparisons.
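To make the contrast between the cross-product average and an alignment-based aggregate concrete, the following is a minimal sketch, not the heuristics proposed in the paper (the abstract does not specify them). It assumes a greedy one-to-one alignment and uses `difflib.SequenceMatcher` as a stand-in string comparator; the function names `sim`, `cross_product_average` and `greedy_alignment_similarity` are hypothetical. The sketch still scores every pair, so it illustrates only the aggregation step, not the reduction in the number of comparisons.

```python
from difflib import SequenceMatcher
from itertools import product


def sim(a: str, b: str) -> float:
    """Similarity in [0, 1]. SequenceMatcher is a stand-in comparator (assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cross_product_average(xs, ys):
    """Baseline: mean similarity over every pair (x, y) of the two value lists."""
    if not xs or not ys:
        return 0.0
    scores = [sim(x, y) for x, y in product(xs, ys)]
    return sum(scores) / len(scores)


def greedy_alignment_similarity(xs, ys):
    """Hypothetical aggregate: align values one-to-one, best pairs first,
    then average the aligned scores over the length of the longer list,
    so that unmatched values lower the result."""
    if not xs or not ys:
        return 0.0
    # Score all pairs, then visit them from most to least similar.
    pairs = sorted(
        ((sim(x, y), i, j) for i, x in enumerate(xs) for j, y in enumerate(ys)),
        reverse=True,
    )
    used_x, used_y = set(), set()
    total = 0.0
    for score, i, j in pairs:
        if i in used_x or j in used_y:
            continue  # each value takes part in at most one aligned pair
        used_x.add(i)
        used_y.add(j)
        total += score
    return total / max(len(xs), len(ys))


# Same two authors listed in a different order.
authors_a = ["Mazzucchi-Augel, P.N.", "Ceballos, H.G."]
authors_b = ["Ceballos, H.G.", "Mazzucchi-Augel, P.N."]

baseline = cross_product_average(authors_a, authors_b)
aligned = greedy_alignment_similarity(authors_a, authors_b)
```

In this example the aligned score is not diluted by the two mismatched pairs, whereas the cross-product average is. An optimal assignment solver could replace the greedy pass, at higher cost.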
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Mazzucchi-Augel, P.N., Ceballos, H.G. (2014). An Alignment Comparator for Entity Resolution with Multi-valued Attributes. In: Gelbukh, A., Espinoza, F.C., Galicia-Haro, S.N. (eds) Nature-Inspired Computation and Machine Learning. MICAI 2014. Lecture Notes in Computer Science, vol 8857. Springer, Cham. https://doi.org/10.1007/978-3-319-13650-9_25
DOI: https://doi.org/10.1007/978-3-319-13650-9_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13649-3
Online ISBN: 978-3-319-13650-9
eBook Packages: Computer Science (R0)