Entity Matching Technique for Bibliographic Database

  • Sumit Mishra
  • Samrat Mondal
  • Sriparna Saha
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8056)

Abstract

Some attributes of a database relation may evolve over time, i.e., their values change at different instants. For example, in a bibliographic database that maintains the publication details of various authors, the affiliation attribute of an author relation may change its value. When a database contains records of this nature and the number of records grows large, it becomes challenging to identify which records belong to which entity, because no proper key is available. In such a situation, the other attributes of the records and the temporal information associated with them may help determine whether two records belong to the same entity or to different ones. In the proposed work, the records are initially clustered on the email-id attribute, and the clusters are then refined using other temporal and non-temporal attributes. The refinement process involves similarity checks against other records and clusters. A comparative analysis with two existing systems, DBLP and ArnetMiner, shows that the proposed technique can produce better results in many cases.
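The two-stage idea described in the abstract (cluster on email-id, then refine by similarity checks on other attributes) can be illustrated with a minimal sketch. This is not the authors' algorithm: the record fields, the `cluster_similarity` scoring rule, the 0.5 bonus, the one-year window, and the merge threshold are all assumptions made for illustration, and the sample records are invented.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_email(records):
    """Stage 1: group records that share an email id.

    Records without an email start out as singleton clusters.
    """
    by_email = defaultdict(list)
    singletons = []
    for rec in records:
        if rec.get("email"):
            by_email[rec["email"].lower()].append(rec)
        else:
            singletons.append([rec])
    return list(by_email.values()) + singletons


def cluster_similarity(c1, c2):
    """Hypothetical score: co-author overlap (non-temporal) plus a bonus
    when both clusters show the same affiliation within one year (temporal).
    """
    co1 = set().union(*(r["coauthors"] for r in c1))
    co2 = set().union(*(r["coauthors"] for r in c2))
    score = jaccard(co1, co2)
    for r1 in c1:
        for r2 in c2:
            same_place = r1["affiliation"] == r2["affiliation"]
            close_in_time = abs(r1["year"] - r2["year"]) <= 1
            if same_place and close_in_time:
                return min(1.0, score + 0.5)  # assumed bonus weight
    return score


def refine(clusters, threshold=0.5):
    """Stage 2: repeatedly merge the first pair of clusters whose
    similarity reaches the (assumed) threshold, until no pair qualifies.
    """
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if cluster_similarity(clusters[i], clusters[j]) >= threshold:
                    clusters[i].extend(clusters[j])
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters


# Invented sample records, for illustration only.
records = [
    {"id": 1, "email": "a@x.edu", "affiliation": "Univ A", "year": 2012,
     "coauthors": {"P. Rao"}},
    {"id": 2, "email": "a@x.edu", "affiliation": "Univ A", "year": 2013,
     "coauthors": {"Q. Sen"}},
    {"id": 3, "email": "b@y.org", "affiliation": "Univ A", "year": 2013,
     "coauthors": {"P. Rao", "Q. Sen"}},
    {"id": 4, "email": None, "affiliation": "Univ B", "year": 2005,
     "coauthors": {"J. Doe"}},
]

entities = refine(cluster_by_email(records))
```

In this sketch, records 1 and 2 are grouped by their shared email, record 3 is merged into that cluster during refinement because of its co-author overlap and matching affiliation, and record 4 remains a separate entity. A greedy pairwise merge is used only because it is short to write; the paper's actual refinement procedure may differ.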

Keywords

Bibliographic Database; Entity Matching; Temporal Data; Similarity Check

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sumit Mishra (1)
  • Samrat Mondal (1)
  • Sriparna Saha (1)
  1. Department of Computer Science and Engineering, Indian Institute of Technology Patna, India