Approximate Correlation Clustering Using Same-Cluster Queries

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10807)

Abstract

Ashtiani et al. (NIPS 2016) introduced a semi-supervised framework for clustering (SSAC) in which a learner is allowed to make same-cluster queries. More specifically, in their model there is a query oracle that answers queries of the form “given any two vertices, do they belong to the same optimal cluster?”. In many clustering contexts, this kind of oracle query is feasible. Ashtiani et al. showed the usefulness of such a query framework by giving a polynomial time algorithm for the k-means clustering problem where the input dataset satisfies some separation condition. Ailon et al. extended the above work to the approximation setting by giving an efficient \((1+\varepsilon )\)-approximation algorithm for k-means for any small \(\varepsilon > 0\) and any dataset within the SSAC framework. In this work, we extend this line of study to the correlation clustering problem. Correlation clustering is a graph clustering problem where pairwise similarity (or dissimilarity) information is given for every pair of vertices, and the objective is to partition the vertices into clusters so as to minimise the disagreement (or maximise the agreement) with the pairwise information given as input. These problems are popularly known as the \(\mathsf {MinDisAgree}\) and \(\mathsf {MaxAgree}\) problems, and \(\mathsf {MinDisAgree}[k]\) and \(\mathsf {MaxAgree}[k]\) are the versions of these problems where the number of optimal clusters is at most k. There exist Polynomial Time Approximation Schemes (PTAS) for \(\mathsf {MinDisAgree}[k]\) and \(\mathsf {MaxAgree}[k]\) where the approximation guarantee is \((1+\varepsilon )\) for any small \(\varepsilon \) and the running time is polynomial in the input parameters but exponential in k and \(1/\varepsilon \). We get a significant running time improvement within the SSAC framework at the cost of making a small number of same-cluster queries.
We obtain a \((1+ \varepsilon )\)-approximation algorithm for any small \(\varepsilon \) with running time that is polynomial in the input parameters and also in k and \(1/\varepsilon \). We also give non-trivial upper and lower bounds on the number of same-cluster queries, the lower bound being based on the Exponential Time Hypothesis (ETH). Note that the existence of an efficient algorithm for \(\mathsf {MinDisAgree}[k]\) in the SSAC setting exhibits the power of same-cluster queries, since such a polynomial time algorithm (polynomial even in k and \(1/\varepsilon \)) is not possible in the classical (non-query) setting due to our conditional lower bounds. Our conditional lower bound is particularly interesting as it not only establishes a lower bound on the number of same-cluster queries in the SSAC framework but also establishes a conditional lower bound on the running time of any \((1+\varepsilon )\)-approximation algorithm for \(\mathsf {MinDisAgree}[k]\).
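To make the two central notions of the abstract concrete, the following is a minimal Python sketch of the \(\mathsf {MinDisAgree}\) objective and of a same-cluster query oracle. It is an illustration only and not the algorithm of the paper: the names `disagreements`, `SameClusterOracle` and `recover_clustering`, the edge representation, and the five-vertex instance are all assumptions made for this example. The recovery routine is a naive baseline that compares each vertex with one representative per cluster found so far, so it may use up to n·k queries, whereas the point of the paper is that a small number of queries suffices for a \((1+\varepsilon )\)-approximation.

```python
from itertools import combinations


def disagreements(labels, positive_edges):
    """Count the pairs on which a clustering disagrees with the input.

    labels[v] is the cluster of vertex v. A pair is 'similar' if it is in
    positive_edges (a set of frozensets) and 'dissimilar' otherwise. A pair
    disagrees if it is similar but separated, or dissimilar but co-clustered.
    """
    count = 0
    for u, v in combinations(range(len(labels)), 2):
        same_cluster = labels[u] == labels[v]
        similar = frozenset((u, v)) in positive_edges
        if same_cluster != similar:
            count += 1
    return count


class SameClusterOracle:
    """Hypothetical oracle: answers queries against a fixed target clustering."""

    def __init__(self, target_labels):
        self._labels = target_labels
        self.queries = 0  # number of queries asked so far

    def same_cluster(self, u, v):
        self.queries += 1
        return self._labels[u] == self._labels[v]


def recover_clustering(n, oracle):
    """Naive baseline (not the paper's method): at most n*k queries."""
    representatives = []  # one known vertex per discovered cluster
    labels = [None] * n
    for v in range(n):
        for idx, rep in enumerate(representatives):
            if oracle.same_cluster(v, rep):
                labels[v] = idx
                break
        else:
            # v matched no known cluster, so it starts a new one
            labels[v] = len(representatives)
            representatives.append(v)
    return labels


# Assumed toy instance: 0-1, 1-2 and 3-4 are similar, all other pairs dissimilar.
positive_edges = {frozenset(e) for e in [(0, 1), (1, 2), (3, 4)]}
target = [0, 0, 0, 1, 1]

oracle = SameClusterOracle(target)
found = recover_clustering(5, oracle)
print(found, disagreements(found, positive_edges), oracle.queries)
```

In this toy instance the pairs 0-1 and 1-2 are similar while 0-2 is dissimilar, so no clustering can agree with all three, and every clustering has at least one disagreement.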

References

  1. Ailon, N., Bhattacharya, A., Jaiswal, R., Kumar, A.: Approximate clustering with same-cluster queries (2017). CoRR, abs/1704.01862. To appear in ITCS 2018
  2. Angelidakis, H., Makarychev, K., Makarychev, Y.: Algorithms for stable and perturbation-resilient problems. In: STOC, pp. 438–451 (2017)
  3. Ashtiani, H., Kushagra, S., Ben-David, S.: Clustering with same-cluster queries. In: NIPS, pp. 3216–3224 (2016)
  4. Awasthi, P., Balcan, M.-F., Voevodski, K.: Local algorithms for interactive clustering. In: ICML, pp. 550–558 (2014)
  5. Balcan, M.-F., Blum, A.: Clustering with interactive feedback. In: Freund, Y., Györfi, L., Turán, G., Zeugmann, T. (eds.) ALT 2008. LNCS (LNAI), vol. 5254, pp. 316–328. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-87987-9_27
  6. Balcan, M.-F., Blum, A., Gupta, A.: Clustering under approximation stability. J. ACM (JACM) 60(2), 8 (2013)
  7. Balcan, M.-F., Liang, Y.: Clustering under perturbation resilience. SIAM J. Comput. 45(1), 102–155 (2016)
  8. Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Mach. Learn. 56(1–3), 89–113 (2004)
  9. Charikar, M., Guruswami, V., Wirth, A.: Clustering with qualitative information. J. Comput. Syst. Sci. 71(3), 360–383 (2005)
  10. Dinur, I.: The PCP theorem by gap amplification. J. ACM 54(3), 12 (2007)
  11. Fomin, F.V., Kratsch, S., Pilipczuk, M., Pilipczuk, M., Villanger, Y.: Tight bounds for parameterized complexity of cluster editing with a small number of clusters. J. Comput. Syst. Sci. 80(7), 1430–1447 (2014)
  12. Giotis, I., Guruswami, V.: Correlation clustering with a fixed number of clusters. In: SODA, pp. 1167–1176 (2006)
  13. Impagliazzo, R., Paturi, R.: On the complexity of k-SAT. J. Comput. Syst. Sci. 62(2), 367–375 (2001)
  14. Impagliazzo, R., Paturi, R., Zane, F.: Which problems have strongly exponential complexity? J. Comput. Syst. Sci. 63(4), 512–530 (2001)
  15. Makarychev, K., Makarychev, Y., Vijayaraghavan, A.: Correlation clustering with noisy partial information. In: COLT, pp. 1321–1342 (2015)
  16. Manurangsi, P.: Almost-polynomial ratio ETH-hardness of approximating densest \(k\)-subgraph. CoRR, abs/1611.05991 (2016)
  17. Mathieu, C., Schudy, W.: Correlation clustering with noisy input. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 712–728 (2010)
  18. Mazumdar, A., Saha, B.: Query complexity of clustering with side information. arXiv preprint arXiv:1706.07719 (2017)
  19. Voevodski, K., Balcan, M.-F., Röglin, H., Teng, S.-H., Xia, Y.: Efficient clustering with limited distance information. In: Conference on Uncertainty in Artificial Intelligence, pp. 632–640 (2010)

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Technion, Haifa, Israel
  2. Department of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India
