Correlation Clustering Revisited: The “True” Cost of Error Minimization Problems

Ailon, Nir; Liberty, Edo

doi:10.1007/978-3-642-02927-1_4


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5555)

Included in the following conference series:

International Colloquium on Automata, Languages, and Programming

Abstract

Correlation Clustering was defined by Bansal, Blum, and Chawla as the problem of clustering a set of elements based on a possibly inconsistent binary similarity function between element pairs. Their setting is agnostic in the sense that a ground truth clustering is not assumed to exist, and the cost of a solution is computed against the input similarity function. This problem has been studied in theory and in practice and was subsequently proven to be APX-hard.
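To make the agnostic cost concrete, the following is a minimal sketch that counts the element pairs on which a clustering contradicts the input similarity function. The function name `disagreement_cost`, the toy elements, and the representation of the similarity function as a set of similar pairs are all assumptions made for illustration; they do not come from the paper.

```python
from itertools import combinations


def disagreement_cost(elements, similar, cluster_of):
    """Count pairs on which the clustering contradicts the similarity function.

    A pair is a disagreement if it is marked similar but split across
    clusters, or marked dissimilar but placed in the same cluster.
    """
    cost = 0
    for u, v in combinations(elements, 2):
        together = cluster_of[u] == cluster_of[v]
        if together != similar(u, v):
            cost += 1
    return cost


# Hypothetical, inconsistent input: a~b and b~c, but a and c are dissimilar.
elements = ["a", "b", "c"]
similar_pairs = {frozenset(("a", "b")), frozenset(("b", "c"))}


def similar(u, v):
    return frozenset((u, v)) in similar_pairs


one_cluster = {"a": 0, "b": 0, "c": 0}
singletons = {"a": 0, "b": 1, "c": 2}

cost_one = disagreement_cost(elements, similar, one_cluster)
cost_single = disagreement_cost(elements, similar, singletons)
```

Because the input is inconsistent, no clustering of this toy instance reaches cost zero, which is the sense in which the cost is measured against the similarity function rather than against a ground truth.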

In this work we assume that there does exist an unknown correct clustering of the data. In this setting, we argue that it is more reasonable to measure the output clustering’s accuracy against the unknown underlying true clustering. We present two main results. The first is a novel method for continuously morphing a general (non-metric) function into a pseudometric. This technique may be useful for other metric embedding and clustering problems. The second is a simple algorithm for randomly rounding a pseudometric into a clustering. Combining the two, we obtain a certificate for the possibility of achieving an approximation factor strictly less than 2 for our problem. Such an approximation factor could not be achieved for the agnostic version of the problem unless P = NP.
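The abstract does not spell out the rounding algorithm, so the sketch below is only an illustration of what rounding a pseudometric into a clustering can look like. It assumes a generic pivot-style scheme: pick a random pivot, let every remaining element join the pivot's cluster with probability 1 - d(pivot, v), and repeat on the rest. This scheme, the names `pivot_round` and `distance_to_truth`, and the toy data are assumptions, not necessarily the authors' method. The second function measures the output against a known ground truth clustering, which is the cost notion argued for above.

```python
import random
from itertools import combinations


def pivot_round(elements, d, rng):
    """Round a pseudometric d (values in [0, 1]) into a clustering.

    Assumed scheme, not taken from the paper: a random pivot absorbs each
    remaining element v independently with probability 1 - d(pivot, v).
    """
    remaining = list(elements)
    clusters = []
    while remaining:
        pivot = remaining.pop(rng.randrange(len(remaining)))
        cluster, rest = [pivot], []
        for v in remaining:
            if rng.random() < 1.0 - d(pivot, v):
                cluster.append(v)
            else:
                rest.append(v)
        clusters.append(cluster)
        remaining = rest
    return clusters


def distance_to_truth(elements, clusters, truth):
    """Count pairs on which the output and the true clustering disagree."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    return sum(
        (label[u] == label[v]) != (truth[u] == truth[v])
        for u, v in combinations(elements, 2)
    )


# Hypothetical ground truth: two clusters of three elements each.
elements = list(range(6))
truth = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}


def d(u, v):
    # A 0/1 pseudometric that encodes the ground truth exactly.
    return 0.0 if truth[u] == truth[v] else 1.0


clusters = pivot_round(elements, d, random.Random(0))
error = distance_to_truth(elements, clusters, truth)
```

When the pseudometric takes only the values 0 and 1, every join decision is forced, so the rounding recovers the encoded clustering regardless of the random seed. Fractional distances are where the randomness matters.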

Work done while the second author was visiting Google Research as a summer intern.

References

Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Machine Learning Journal (Special Issue on Theoretical Advances in Data Clustering) 56(1–3), 89–113 (2004); Extended abstract appeared in FOCS 2002, pp. 238–247
Ailon, N., Charikar, M., Newman, A.: Aggregating inconsistent information: ranking and clustering. In: Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC), pp. 684–693 (2005)
Bonizzoni, P., Della Vedova, G., Dondi, R., Jiang, T.: On the approximation of correlation clustering and consensus clustering. Journal of Computer and System Sciences 74(5), 671–696 (2008)
Emanuel, D., Fiat, A.: Correlation clustering – minimizing disagreements on arbitrary weighted graphs. In: Di Battista, G., Zwick, U. (eds.) ESA 2003. LNCS, vol. 2832, pp. 208–220. Springer, Heidelberg (2003)
Charikar, M., Guruswami, V., Wirth, A.: Clustering with qualitative information. In: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS), Boston, pp. 524–533 (2003)
Giotis, I., Guruswami, V.: Correlation clustering with a fixed number of clusters. In: SODA 2006: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, pp. 1167–1176. ACM Press, New York (2006)
Ailon, N., Charikar, M.: Fitting tree metrics: Hierarchical clustering and phylogeny. In: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, FOCS (2005)
Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation. In: Proceedings of the 21st International Conference on Data Engineering (ICDE), Tokyo (to appear, 2005)
Filkov, V., Skiena, S.: Integrating microarray data by consensus clustering. In: Proceedings of International Conference on Tools with Artificial Intelligence (ICTAI), Sacramento, pp. 418–425 (2003)
Strehl, A.: Relationship-based clustering and cluster ensembles for high-dimensional data mining. PhD Dissertation, University of Texas at Austin (May 2002)
McSherry, F.: Spectral partitioning of random graphs. In: FOCS 2001: Proceedings of the 42nd IEEE symposium on Foundations of Computer Science, Washington, p. 529 (2001)
Aslam, J., Leblanc, A., Stein, C.: A new approach to clustering. In: 4th International Workshop on Algorithm Engineering (2000)
Ailon, N., Mohri, M.: Efficient reduction of ranking to classification. In: The 21st Annual Conference on Learning Theory (COLT), Helsinki, Finland (to appear, 2008)
Balcan, M.-F., Blum, A., Gupta, A.: Approximate clustering without the approximation. In: SODA 2009, New York (2009)
Ailon, N., Liberty, E.: Correlation clustering revisited: The “true” cost of error minimization problems. Yale University Technical Report 1214 (2008)
Ailon, N.: Aggregation of partial rankings, p-ratings and top-m lists. In: SODA (2007)
Ailon, N., Liberty, E.: Mathematica program (2008), http://www.cs.yale.edu/homes/el327/public/prove32/

Author information

Authors and Affiliations

Google Research, New York, NY, USA
Nir Ailon
Yale University, New Haven, CT, USA
Edo Liberty

Editor information

Editors and Affiliations

Department of Computer Science, University of Freiburg, Georges Köhler Allee 79, 79110, Freiburg, Germany
Susanne Albers
Department of Computer and Systems Sciences, Sapienza University of Rome, Via Ariosto 25, 00184, Roma, Italy
Alberto Marchetti-Spaccamela
Google R&D Center, School of Computer Science, Tel Aviv University, 69978, Tel Aviv, Israel
Yossi Matias
University of Patras and CTI, N. Kazantzaki Street 1, 26504 Rion, Patras, Greece
Sotiris Nikoletseas
RWTH Aachen, Lehrstuhl Informatik 7, Ahornstraße 55, 52056, Aachen, Germany
Wolfgang Thomas

Cite this paper

Ailon, N., Liberty, E. (2009). Correlation Clustering Revisited: The “True” Cost of Error Minimization Problems. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds) Automata, Languages and Programming. ICALP 2009. Lecture Notes in Computer Science, vol 5555. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02927-1_4

DOI: https://doi.org/10.1007/978-3-642-02927-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02926-4
Online ISBN: 978-3-642-02927-1
eBook Packages: Computer Science, Computer Science (R0)