
Correlation Clustering Revisited: The “True” Cost of Error Minimization Problems

  • Conference paper
Automata, Languages and Programming (ICALP 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5555)


Abstract

Correlation Clustering was defined by Bansal, Blum, and Chawla as the problem of clustering a set of elements based on a possibly inconsistent binary similarity function between element pairs. Their setting is agnostic in the sense that a ground-truth clustering is not assumed to exist, and the cost of a solution is computed against the input similarity function. This problem has been studied both in theory and in practice, and it has subsequently been proven to be APX-hard.
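To make the agnostic objective concrete, here is a minimal Python sketch (not taken from the paper) that counts disagreements: pairs placed in the same cluster although the input marks them dissimilar, plus pairs split apart although the input marks them similar. The names `agnostic_cost`, `sim`, and `EDGES` and the three-element input are hypothetical. On this inconsistent "triangle" no clustering reaches cost 0, which illustrates what it means for the cost to be measured against the input rather than against a ground truth.

```python
from itertools import combinations


def agnostic_cost(labels, similar):
    """Count element pairs on which a clustering disagrees with the input.

    labels:  dict mapping each element to a cluster id
    similar: function (u, v) -> bool; may be inconsistent (non-transitive)
    """
    cost = 0
    for u, v in combinations(list(labels), 2):
        together = labels[u] == labels[v]
        # Disagreement: co-clustered but dissimilar, or split but similar.
        if together != similar(u, v):
            cost += 1
    return cost


# Hypothetical toy input: a~b and b~c, yet a and c are dissimilar.
EDGES = {frozenset("ab"), frozenset("bc")}


def sim(u, v):
    return frozenset((u, v)) in EDGES


print(agnostic_cost({"a": 0, "b": 0, "c": 0}, sim))  # 1: pair (a, c)
print(agnostic_cost({"a": 0, "b": 0, "c": 1}, sim))  # 1: pair (b, c)
print(agnostic_cost({"a": 0, "b": 1, "c": 2}, sim))  # 2: pairs (a, b), (b, c)
```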

In this work we assume that there does exist an unknown correct clustering of the data. In this setting, we argue that it is more reasonable to measure the output clustering’s accuracy against the unknown underlying true clustering. We present two main results. The first is a novel method for continuously morphing a general (non-metric) function into a pseudometric; this technique may be useful for other metric embedding and clustering problems. The second is a simple algorithm for randomly rounding a pseudometric into a clustering. Combining the two, we obtain a certificate for the possibility of achieving an approximation factor strictly less than 2 for our problem. This approximation factor could not have been achieved by considering the agnostic version of the problem unless P = NP.
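The "true" cost can be sketched the same way. Below, `true_cost` counts pairs on which an output clustering disagrees with a ground-truth clustering. `pivot_round` is a generic pivot-style rounding of a pseudometric into a clustering, included only as an illustration: it is not the rounding algorithm from this paper, and its `threshold` parameter, the `truth` clustering, and the toy pseudometric `dist` are all hypothetical.

```python
import random
from itertools import combinations


def true_cost(output, truth):
    """Pairs on which the output clustering disagrees with the ground truth."""
    return sum(
        (output[u] == output[v]) != (truth[u] == truth[v])
        for u, v in combinations(list(truth), 2)
    )


def pivot_round(elements, dist, threshold=0.5, seed=None):
    """Generic pivot-style rounding of a pseudometric (illustration only).

    Repeatedly pick a random unclustered pivot and group it with every
    unclustered element within `threshold` of it. `threshold` is a
    hypothetical parameter, not a value taken from the paper.
    """
    rng = random.Random(seed)
    remaining = list(elements)
    labels = {}
    cluster_id = 0
    while remaining:
        pivot = rng.choice(remaining)
        for x in remaining:
            if x == pivot or dist(pivot, x) <= threshold:
                labels[x] = cluster_id
        remaining = [x for x in remaining if x not in labels]
        cluster_id += 1
    return labels


# Hypothetical ground truth and a toy pseudometric consistent with it:
# distance 0.1 inside a true cluster, 0.9 across clusters.
truth = {"a": 0, "b": 0, "c": 1, "d": 1}


def dist(u, v):
    if u == v:
        return 0.0
    return 0.1 if truth[u] == truth[v] else 0.9


out = pivot_round(list(truth), dist, seed=0)
print(true_cost(out, truth))  # 0
print(true_cost({"a": 0, "b": 0, "c": 0, "d": 0}, truth))  # 4
```

Here every pivot captures exactly its own true cluster, so the seed does not matter; on noisier pseudometrics the result depends on the random pivot order.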

Work done while the second author was visiting Google Research as a summer intern.



References

  1. Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Machine Learning Journal (Special Issue on Theoretical Advances in Data Clustering) 56(1–3), 89–113 (2004); Extended abstract appeared in FOCS 2002, pp. 238–247

  2. Ailon, N., Charikar, M., Newman, A.: Aggregating inconsistent information: ranking and clustering. In: Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC), pp. 684–693 (2005)

  3. Bonizzoni, P., Della Vedova, G., Dondi, R., Jiang, T.: On the approximation of correlation clustering and consensus clustering. Journal of Computer and System Sciences 74(5), 671–696 (2008)

  4. Emanuel, D., Fiat, A.: Correlation clustering – minimizing disagreements on arbitrary weighted graphs. In: Di Battista, G., Zwick, U. (eds.) ESA 2003. LNCS, vol. 2832, pp. 208–220. Springer, Heidelberg (2003)

  5. Charikar, M., Guruswami, V., Wirth, A.: Clustering with qualitative information. In: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS), Boston, pp. 524–533 (2003)

  6. Giotis, I., Guruswami, V.: Correlation clustering with a fixed number of clusters. In: SODA 2006: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1167–1176. ACM Press, New York (2006)

  7. Ailon, N., Charikar, M.: Fitting tree metrics: Hierarchical clustering and phylogeny. In: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (2005)

  8. Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation. In: Proceedings of the 21st International Conference on Data Engineering (ICDE), Tokyo (to appear, 2005)

  9. Filkov, V., Skiena, S.: Integrating microarray data by consensus clustering. In: Proceedings of the International Conference on Tools with Artificial Intelligence (ICTAI), Sacramento, pp. 418–425 (2003)

  10. Strehl, A.: Relationship-based clustering and cluster ensembles for high-dimensional data mining. PhD Dissertation, University of Texas at Austin (May 2002)

  11. McSherry, F.: Spectral partitioning of random graphs. In: FOCS 2001: Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, Washington, p. 529 (2001)

  12. Aslam, J., Leblanc, A., Stein, C.: A new approach to clustering. In: 4th International Workshop on Algorithm Engineering (2000)

  13. Ailon, N., Mohri, M.: Efficient reduction of ranking to classification. In: The 21st Annual Conference on Learning Theory (COLT), Helsinki, Finland (to appear, 2008)

  14. Balcan, M.-F., Blum, A., Gupta, A.: Approximate clustering without the approximation. In: SODA 2009, New York (2009)

  15. Ailon, N., Liberty, E.: Correlation clustering revisited: The “true” cost of error minimization problems. Yale University Technical Report 1214 (2008)

  16. Ailon, N.: Aggregation of partial rankings, p-ratings and top-m lists. In: SODA (2007)

  17. Ailon, N., Liberty, E.: Mathematica program (2008), http://www.cs.yale.edu/homes/el327/public/prove32/



Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ailon, N., Liberty, E. (2009). Correlation Clustering Revisited: The “True” Cost of Error Minimization Problems. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds) Automata, Languages and Programming. ICALP 2009. Lecture Notes in Computer Science, vol 5555. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02927-1_4


  • DOI: https://doi.org/10.1007/978-3-642-02927-1_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02926-4

  • Online ISBN: 978-3-642-02927-1

  • eBook Packages: Computer Science, Computer Science (R0)
