A Quantum Annealing-Based Approach to Extreme Clustering

  • Tim Jaschek
  • Marko Bucyk
  • Jaspreet S. Oberoi
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1130)


Clustering, or grouping, dataset elements based on similarity can be used not only to classify a dataset into a few categories, but also to approximate it by a relatively large number of representative elements. In the latter scenario, referred to as extreme clustering, datasets are enormous and the number of representative clusters is large. We have devised a distributed method that can efficiently solve extreme clustering problems using quantum annealing. We prove that this method yields optimal clustering assignments under a separability assumption, and show that the generated clustering assignments are comparable in quality to those generated by common clustering algorithms, yet can be obtained a full order of magnitude faster.


Keywords: Extreme clustering, Distributed computing, Quantum computing, Maximum weighted independent set, Unsupervised learning
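
The keyword list ties the method to the maximum weighted independent set (MWIS) problem, and the acknowledgements mention QUBO preprocessing. As a rough illustration only, and not the authors' formulation, the Python sketch below shows one standard way to write an MWIS instance as a quadratic unconstrained binary optimization (QUBO) problem, the input form that annealers accept. The toy graph, the weights, the penalty choice, and the helper names `mwis_qubo`, `energy`, and `brute_force_minimum` are all assumptions made for this example. A brute-force search stands in for the annealer.

```python
from itertools import product


def mwis_qubo(weights, edges, penalty=None):
    """Return a QUBO as {(i, j): coeff} whose minimizer is a maximum
    weighted independent set of the given graph (hypothetical helper)."""
    if penalty is None:
        # Assumption: a penalty above the largest weight makes selecting
        # both endpoints of an edge strictly worse than dropping one.
        penalty = max(weights) + 1.0
    # Diagonal terms reward selecting a vertex by its weight.
    qubo = {(i, i): -float(w) for i, w in enumerate(weights)}
    # Off-diagonal terms penalize selecting two adjacent vertices.
    for i, j in edges:
        key = (min(i, j), max(i, j))
        qubo[key] = qubo.get(key, 0.0) + penalty
    return qubo


def energy(qubo, x):
    """Evaluate the QUBO objective for a binary assignment x."""
    return sum(c * x[i] * x[j] for (i, j), c in qubo.items())


def brute_force_minimum(qubo, n):
    """Exhaustively minimize the QUBO; a stand-in for an annealer
    that is only feasible for very small n."""
    best = min(product((0, 1), repeat=n), key=lambda x: energy(qubo, x))
    return best, energy(qubo, best)


# Hypothetical toy instance: the path graph 0 - 1 - 2 - 3.
weights = [2.0, 3.0, 4.0, 2.0]
edges = [(0, 1), (1, 2), (2, 3)]

qubo = mwis_qubo(weights, edges)
assignment, value = brute_force_minimum(qubo, len(weights))
selected = [i for i, bit in enumerate(assignment) if bit]
print(selected, value)
```

In this toy instance the minimizer selects vertices 0 and 2, the heaviest set with no two adjacent vertices. The exhaustive search scales as 2^n, so it is suitable only for checking small formulations before handing them to a dedicated solver.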



We thank Saeid Allahdadian, Nick Condé, Daniel Crawford, and Austin Wallace for their contributions to an earlier version of the algorithm. We thank Maliheh Aramon, Pooja Pandey, and Brad Woods for helpful discussions on optimization theory. The implementation of the QUBO preprocessing techniques was conducted jointly with Brad Woods and Nick Condé. Inderpreet Singh contributed to the figure on image quantization. Victoria Wong assisted with graphical editing of individual figures. Partial funding for this work was provided by the Mitacs Accelerate internship initiative.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Tim Jaschek (1, 2), corresponding author
  • Marko Bucyk (1)
  • Jaspreet S. Oberoi (1, 3)
  1. 1QB Information Technologies (1QBit), Vancouver, Canada
  2. Department of Mathematics, University of British Columbia, Vancouver, Canada
  3. School of Engineering Science, Simon Fraser University, Burnaby, Canada
