Abstract
Correlation Clustering was defined by Bansal, Blum, and Chawla as the problem of clustering a set of elements based on a possibly inconsistent binary similarity function between element pairs. Their setting is agnostic in the sense that a ground truth clustering is not assumed to exist, and the cost of a solution is computed against the input similarity function. This problem has been studied in theory and in practice and was subsequently proven to be APX-hard.
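To make the agnostic cost concrete, the following is a minimal sketch, not code from the paper. It assumes the cost is the number of element pairs on which a clustering contradicts the similarity function; the names `disagreements`, `similar`, and the three-element input are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def disagreements(labels, similar):
    """Count pairs on which a clustering contradicts the similarity function.

    labels:  dict mapping element -> cluster id (hypothetical representation)
    similar: symmetric function (u, v) -> bool, the input similarity
    """
    cost = 0
    for u, v in combinations(sorted(labels), 2):
        together = labels[u] == labels[v]
        # A pair is a disagreement if it is clustered together but marked
        # dissimilar, or separated but marked similar.
        if together != similar(u, v):
            cost += 1
    return cost

# An inconsistent input: a~b and b~c, but a and c are dissimilar.
similar_pairs = {frozenset("ab"), frozenset("bc")}

def similar(u, v):
    return frozenset((u, v)) in similar_pairs

one_cluster = {"a": 0, "b": 0, "c": 0}
split = {"a": 0, "b": 0, "c": 1}
singletons = {"a": 0, "b": 1, "c": 2}

costs = [disagreements(c, similar) for c in (one_cluster, split, singletons)]
```

On this input no clustering agrees with every pair, which is what "possibly inconsistent" means here: the best achievable agnostic cost is nonzero.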
In this work we assume that there does exist an unknown correct clustering of the data. In this setting, we argue that it is more reasonable to measure the output clustering’s accuracy against the unknown underlying true clustering. We present two main results. The first is a novel method for continuously morphing a general (non-metric) function into a pseudometric. This technique may be useful for other metric embedding and clustering problems. The second is a simple algorithm for randomly rounding a pseudometric into a clustering. Combining the two, we obtain a certificate for the possibility of achieving an approximation factor strictly less than 2 for our problem. This approximation factor could not have been achieved by considering the agnostic version of the problem unless P = NP.
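The two notions used above, accuracy against a true clustering and a pseudometric, can be illustrated with a short sketch. This is not the paper's method: it assumes accuracy is measured by counting the pairs on which two clusterings differ, and it only checks the pseudometric axioms rather than performing the morphing or the rounding. All function names are hypothetical.

```python
from itertools import combinations, permutations

def clustering_distance(labels_a, labels_b):
    """Number of pairs placed together by one clustering and apart by the other."""
    return sum(
        (labels_a[u] == labels_a[v]) != (labels_b[u] == labels_b[v])
        for u, v in combinations(sorted(labels_a), 2)
    )

def is_pseudometric(d, points, tol=1e-9):
    """Check d(x, x) = 0, non-negativity, symmetry, and the triangle inequality."""
    if any(abs(d(x, x)) > tol for x in points):
        return False
    for x, y in combinations(points, 2):
        if d(x, y) < -tol or abs(d(x, y) - d(y, x)) > tol:
            return False
    return all(
        d(x, z) <= d(x, y) + d(y, z) + tol
        for x, y, z in permutations(points, 3)
    )

points = ["a", "b", "c"]
truth = {"a": 0, "b": 0, "c": 1}    # assumed (normally unknown) true clustering
output = {"a": 0, "b": 0, "c": 0}   # some algorithm's output

def clustering_metric(u, v):
    # Any clustering induces a 0/1 pseudometric: 0 inside a cluster, 1 across.
    return 0.0 if truth[u] == truth[v] else 1.0

def inconsistent(u, v):
    # A general 0/1 dissimilarity: a~b and b~c, but a and c are dissimilar.
    return 1.0 if {u, v} == {"a", "c"} else 0.0

true_cost = clustering_distance(output, truth)
```

Here `clustering_metric` passes the check while `inconsistent` fails the triangle inequality, since d(a, c) = 1 exceeds d(a, b) + d(b, c) = 0. That gap between a general function and a pseudometric is the one the first result addresses.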
Work done while the second author was visiting Google Research as a summer intern.
References
Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Machine Learning Journal (Special Issue on Theoretical Advances in Data Clustering) 56(1–3), 89–113 (2004); Extended abstract appeared in FOCS 2002, pp. 238–247
Ailon, N., Charikar, M., Newman, A.: Aggregating inconsistent information: ranking and clustering. In: Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC), pp. 684–693 (2005)
Bonizzoni, P., Della Vedova, G., Dondi, R., Jiang, T.: On the approximation of correlation clustering and consensus clustering. Journal of Computer and System Sciences 74(5), 671–696 (2008)
Emanuel, D., Fiat, A.: Correlation clustering – minimizing disagreements on arbitrary weighted graphs. In: Di Battista, G., Zwick, U. (eds.) ESA 2003. LNCS, vol. 2832, pp. 208–220. Springer, Heidelberg (2003)
Charikar, M., Guruswami, V., Wirth, A.: Clustering with qualitative information. In: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS), Boston, pp. 524–533 (2003)
Giotis, I., Guruswami, V.: Correlation clustering with a fixed number of clusters. In: SODA 2006: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1167–1176. ACM Press, New York (2006)
Ailon, N., Charikar, M.: Fitting tree metrics: Hierarchical clustering and phylogeny. In: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, FOCS (2005)
Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation. In: Proceedings of the 21st International Conference on Data Engineering (ICDE), Tokyo (to appear, 2005)
Filkov, V., Skiena, S.: Integrating microarray data by consensus clustering. In: Proceedings of International Conference on Tools with Artificial Intelligence (ICTAI), Sacramento, pp. 418–425 (2003)
Strehl, A.: Relationship-based clustering and cluster ensembles for high-dimensional data mining. PhD Dissertation, University of Texas at Austin (May 2002)
McSherry, F.: Spectral partitioning of random graphs. In: FOCS 2001: Proceedings of the 42nd IEEE symposium on Foundations of Computer Science, Washington, p. 529 (2001)
Aslam, J., Leblanc, A., Stein, C.: A new approach to clustering. In: 4th International Workshop on Algorithm Engineering (2000)
Ailon, N., Mohri, M.: Efficient reduction of ranking to classification. In: The 21st Annual Conference on Learning Theory (COLT), Helsinki, Finland (to appear, 2008)
Balcan, M.-F., Blum, A., Gupta, A.: Approximate clustering without the approximation. In: SODA 2009, New York (2009)
Ailon, N., Liberty, E.: Correlation clustering revisited: The “true” cost of error minimization problems. Yale University Technical Report 1214 (2008)
Ailon, N.: Aggregation of partial rankings, p-ratings and top-m lists. In: SODA (2007)
Ailon, N., Liberty, E.: Mathematica program (2008), http://www.cs.yale.edu/homes/el327/public/prove32/
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ailon, N., Liberty, E. (2009). Correlation Clustering Revisited: The “True” Cost of Error Minimization Problems. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds) Automata, Languages and Programming. ICALP 2009. Lecture Notes in Computer Science, vol 5555. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02927-1_4
DOI: https://doi.org/10.1007/978-3-642-02927-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02926-4
Online ISBN: 978-3-642-02927-1
eBook Packages: Computer Science (R0)