Abstract
Some attributes of a database relation may evolve over time, i.e., they change their values at different instants of time. For example, the affiliation attribute of an author relation in a bibliographic database, which maintains the publication details of various authors, may change its value. When a database contains records of this nature and the number of records grows large, it becomes challenging to identify which records belong to which entity because no proper key is available. In such a situation, the other attributes of the records and the temporal information associated with them may help determine whether records belong to the same entity or to different ones. In the proposed work, the records are initially clustered on the email-id attribute, and the clusters are then refined using other temporal and non-temporal attributes. The refinement process involves similarity checks against other records and clusters. A comparative analysis with two existing systems, DBLP and ArnetMiner, shows that the proposed technique can produce better results in many cases.
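The two-stage idea described in the abstract (cluster on email-id, then refine with temporal and non-temporal attributes) can be illustrated with a minimal sketch. This is not the authors' algorithm: the record fields (`name`, `email`, `affiliation`, `year`), the matching rule, and the thresholds `name_threshold` and `max_gap_years` are all assumptions chosen only for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def cluster_by_email(records):
    """Stage 1: group records that share the same (case-insensitive) email-id."""
    clusters = defaultdict(list)
    for i, rec in enumerate(records):
        # Records without an email start out as singleton clusters.
        key = rec["email"].lower() if rec.get("email") else f"_no_email_{i}"
        clusters[key].append(rec)
    return list(clusters.values())


def name_similarity(a, b):
    """String similarity in [0, 1] between two author names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def clusters_match(c1, c2, name_threshold=0.85, max_gap_years=2):
    """Hypothetical refinement rule (an assumption, not from the paper):
    two clusters match if some pair of records has similar names and the
    same affiliation observed within max_gap_years of each other."""
    for r1 in c1:
        for r2 in c2:
            if name_similarity(r1["name"], r2["name"]) < name_threshold:
                continue
            same_affiliation = r1["affiliation"].lower() == r2["affiliation"].lower()
            close_in_time = abs(r1["year"] - r2["year"]) <= max_gap_years
            if same_affiliation and close_in_time:
                return True
    return False


def refine(clusters):
    """Stage 2: greedily merge each cluster into the first matching one."""
    merged = []
    for cluster in clusters:
        for target in merged:
            if clusters_match(target, cluster):
                target.extend(cluster)
                break
        else:
            merged.append(list(cluster))
    return merged
```

The greedy single pass keeps the sketch short; it does not revisit earlier merges, so a fuller implementation would likely iterate until no further clusters can be combined.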
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mishra, S., Mondal, S., Saha, S. (2013). Entity Matching Technique for Bibliographic Database. In: Decker, H., Lhotská, L., Link, S., Basl, J., Tjoa, A.M. (eds) Database and Expert Systems Applications. DEXA 2013. Lecture Notes in Computer Science, vol 8056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40173-2_5
DOI: https://doi.org/10.1007/978-3-642-40173-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40172-5
Online ISBN: 978-3-642-40173-2
eBook Packages: Computer Science (R0)