Consolidation of References to Persons in Bibliographic Databases

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5362)

Abstract

Entity resolution is the process of determining whether, in a specific context, two or more references correspond to the same entity. In this work, we address this problem for references to persons as they are found in bibliographic data, specifically when consolidating multiple datasets. Our solution follows the extraction, transformation and loading (ETL) process typical of data warehouses. It computes the similarities of the attribute values of the references and employs a decision tree to decide when the references match. We describe the characteristics of these references within bibliographic datasets, and how we explored those characteristics by developing new similarity metrics to improve the quality of the consolidation process. We evaluated our work in an experiment with data from four national libraries. The results show that the proposed similarity metrics contribute significantly to the consolidation process.
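
The matching step described in the abstract (attribute similarities fed to a decision tree) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `PersonRef` fields, the use of normalised Levenshtein distance as the name metric, the hand-written rules in `is_match` (standing in for a tree that would normally be learned from labelled pairs), and the thresholds 0.8 and 0.95 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonRef:
    """A reference to a person; the fields are illustrative assumptions."""
    name: str
    birth_year: Optional[int] = None


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical after lowercasing."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def year_agreement(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """True/False when both years are known, None when either is missing."""
    if a is None or b is None:
        return None
    return a == b


def is_match(x: PersonRef, y: PersonRef) -> bool:
    """Hand-written stand-in for a learned decision tree (thresholds are made up)."""
    sim = name_similarity(x.name, y.name)
    years = year_agreement(x.birth_year, y.birth_year)
    if years is False:
        return False       # conflicting birth years: treat as different persons
    if years is True:
        return sim >= 0.8  # dates agree, so tolerate small spelling variation
    return sim >= 0.95     # no dates available: require near-identical names
```

Calling `is_match(PersonRef("Queiroz, Jose", 1845), PersonRef("Queiros, Jose", 1845))` follows the "dates agree" branch, so a one-letter spelling difference is tolerated. Without birth years the same pair falls under the stricter name threshold. In a real pipeline the branches and thresholds would come from training data rather than being fixed by hand.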

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Freire, N., Borbinha, J., Martins, B. (2008). Consolidation of References to Persons in Bibliographic Databases. In: Buchanan, G., Masoodian, M., Cunningham, S.J. (eds) Digital Libraries: Universal and Ubiquitous Access to Information. ICADL 2008. Lecture Notes in Computer Science, vol 5362. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89533-6_26

  • DOI: https://doi.org/10.1007/978-3-540-89533-6_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89532-9

  • Online ISBN: 978-3-540-89533-6

  • eBook Packages: Computer Science, Computer Science (R0)
