A Graph Partitioning Approach to Entity Disambiguation Using Uncertain Information

  • Conference paper
Advances in Natural Language Processing (GoTAL 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5221)

Abstract

This paper presents a method for Entity Disambiguation in Information Extraction from different sources on the web. Once entities and the relations between them have been extracted, it is necessary to determine which of them refer to the same real-world entity. We model the task as a graph partitioning problem in order to combine the available information more accurately than a pairwise classifier does. Moreover, our method handles uncertain information, which turns out to be quite helpful. Two algorithms are trained and compared, one probabilistic and the other deterministic. Both are tuned using genetic algorithms to find the best weights for the set of constraints. Experiments show that graph-based modeling using uncertain information yields better results.
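As a rough illustration of the graph-partitioning view described in the abstract, the sketch below treats mentions as nodes, scores each pair with a weighted sum of constraints, and groups mentions greedily. It reproduces neither of the paper's two algorithms: the greedy procedure, the two constraint functions, the example mentions, and the hand-set weights are all assumptions made for this example (the paper tunes its weights with genetic algorithms). A constraint may return 0 to abstain, which is one simple way to represent uncertain evidence.

```python
def edge_weight(a, b, constraints, weights):
    """Weighted sum of constraint votes for the pair (a, b)."""
    return sum(w * c(a, b) for c, w in zip(constraints, weights))


def partition(mentions, constraints, weights):
    """Greedy partitioning (illustrative only, order-dependent).

    Each mention joins the existing cluster with the highest total
    edge weight towards it, or opens a new cluster when no total
    is positive.
    """
    clusters = []
    for m in mentions:
        best, best_score = None, 0.0
        for cluster in clusters:
            score = sum(edge_weight(m, other, constraints, weights)
                        for other in cluster)
            if score > best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append([m])
        else:
            best.append(m)
    return clusters


# Hypothetical constraints: +1 supports "same entity", -1 opposes, 0 abstains.
def same_surname(a, b):
    return 1.0 if a.split()[-1] == b.split()[-1] else -1.0


def first_name_agreement(a, b):
    fa, fb = a.split()[0], b.split()[0]
    if fa.endswith(".") or fb.endswith("."):
        return 0.0  # an initial is uncertain evidence, so abstain
    return 1.0 if fa == fb else -1.0


mentions = ["John Smith", "J. Smith", "Jane Smith", "Mary Jones"]
weights = [1.0, 2.5]  # hand-set for the example, not learned
clusters = partition(mentions, [same_surname, first_name_agreement], weights)
print(clusters)
```

With these assumed weights, "Jane Smith" is attracted to "J. Smith" when the pair is scored in isolation, but the conflict with "John Smith" in the same group outweighs that attraction, so she ends up in her own cluster. This is the kind of group-level evidence a purely pairwise decision would miss.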

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sapena, E., Padró, L., Turmo, J. (2008). A Graph Partitioning Approach to Entity Disambiguation Using Uncertain Information. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_41

  • DOI: https://doi.org/10.1007/978-3-540-85287-2_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85286-5

  • Online ISBN: 978-3-540-85287-2

  • eBook Packages: Computer Science, Computer Science (R0)
