Restoring: A Greedy Heuristic Approach Based on Neighborhood for Correlation Clustering

  • Ning Wang
  • Jie Li
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8346)

Abstract

Correlation Clustering has received considerable attention in the machine learning literature because it does not require the number of clusters to be specified in advance. Many approximation algorithms for Correlation Clustering have been proposed with worst-case theoretical guarantees, but with little experimental evaluation. These methods consider only the direct associations between vertices and perform poorly on real datasets. In this paper, we propose a neighborhood-based method called Restoring, in which we argue that the neighborhood around two connected vertices is important and that two vertices belonging to the same cluster should have the same neighborhood. Our algorithm iteratively chooses two connected vertices and restores their neighborhood. We also define the cost of keeping or removing a non-common neighbor and identify a restoring order based on neighborhood similarity. Experiments conducted on five sub-datasets of Cora show that our method performs better than existing well-known methods in both result quality and objective value.
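To make the two quantities mentioned in the abstract concrete, the following is a minimal sketch, not the Restoring algorithm itself. It assumes the standard unweighted "minimize disagreements" objective for Correlation Clustering, where every vertex pair without a positive edge counts as negative, and it uses Jaccard overlap of closed neighborhoods as a stand-in for neighborhood similarity; the abstract does not give the paper's actual definition. All function names and the toy graph are hypothetical.

```python
from itertools import combinations


def disagreements(nodes, positive_edges, labels):
    """Count vertex pairs on which a clustering disagrees with the edge signs.

    Assumption: pairs not listed in positive_edges are treated as negative.
    """
    pos = {frozenset(e) for e in positive_edges}
    cost = 0
    for u, v in combinations(nodes, 2):
        same_cluster = labels[u] == labels[v]
        is_positive = frozenset((u, v)) in pos
        # A positive edge cut by the clustering, or a negative pair kept together.
        if same_cluster != is_positive:
            cost += 1
    return cost


def closed_neighborhood(node, positive_edges):
    """Return the node together with its positively connected neighbors."""
    nbrs = {node}
    for u, v in positive_edges:
        if u == node:
            nbrs.add(v)
        elif v == node:
            nbrs.add(u)
    return nbrs


def neighborhood_similarity(u, v, positive_edges):
    """Jaccard overlap of closed neighborhoods (an assumed similarity measure)."""
    nu = closed_neighborhood(u, positive_edges)
    nv = closed_neighborhood(v, positive_edges)
    return len(nu & nv) / len(nu | nv)


# Hypothetical toy graph: a triangle a-b-c with a pendant vertex d attached to c.
nodes = ["a", "b", "c", "d"]
positive_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

split = {"a": 0, "b": 0, "c": 0, "d": 1}   # d in its own cluster
merged = {"a": 0, "b": 0, "c": 0, "d": 0}  # everything in one cluster

cost_split = disagreements(nodes, positive_edges, split)
cost_merged = disagreements(nodes, positive_edges, merged)
sim_ab = neighborhood_similarity("a", "b", positive_edges)
sim_cd = neighborhood_similarity("c", "d", positive_edges)
```

In this toy graph, a and b share an identical closed neighborhood while c and d do not, which matches the intuition in the abstract that vertices in the same cluster should have the same neighborhood. The clustering that separates d also has the lower disagreement count.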

Keywords

Correlation Clustering · Neighborhood Similarity · Entity Resolution

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ning Wang (1)
  • Jie Li (1)
  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
