# Kernel methods in Quantum Machine Learning


## Abstract

Quantum Machine Learning has established itself as one of the most promising applications of quantum computers and Noisy Intermediate Scale Quantum (NISQ) devices. In this paper, we review the latest developments regarding the usage of quantum computing for a particular class of machine learning algorithms known as kernel methods.

## Keywords

Quantum Machine Learning · Quantum computing · Kernel methods

## 1 Introduction

Approaches combining machine learning and quantum computing are commonly grouped into four classes, according to whether the data under study and the learning algorithm are classical (C) or quantum (Q). The Classical-Classical (CC) class refers to ordinary machine learning or to machine learning algorithms that are inspired by the formalism of quantum mechanics. Here the dataset represents some classical system and the algorithm can run on a classical computer (Dong et al. 2019; Canabarro et al. 2019; Amin et al. 2018; Crawford et al. 2016; Stoudenmire and Schwab 2016; Sergioli et al. 2018; Levine et al. 2018). In the Classical-Quantum (CQ) class, algorithms rely on the advantages of quantum computation in order to speed up classical ML methods. Data are assumed to be classical in this class as well (Aïmeur et al. 2013; Altaisky et al. 2016; Wiebe et al. 2015; Barry et al. 2014; Lu and Braunstein 2014; Heim et al. 2015; Bottarelli et al. 2018). Quantum-Classical (QC) refers to the use of classical ML methods to analyse quantum systems (Agresti et al. 2019; Huembeli et al. 2019; Gray et al. 2018; Benedetti et al. 2019; Di Pierro et al. 2018; O'Driscoll et al. 2019; Iten et al. 2018). Finally, in the Quantum-Quantum (QQ) class, both the learning algorithm and the system under study are fully quantum (Yu et al. 2019).

Some very promising results have been obtained in each of the four frameworks. In this paper, we have chosen to focus on the CQ class, with the aim of reviewing the main approaches that use quantum mechanics in order to obtain a computational advantage for a specific class of ML techniques called *kernel methods*. Our main motivation is to set a clear background for those who want to start investigating or carry out research in this field. A systematization of the current research in Quantum Machine Learning should include similar work in the other three sectors too, which we plan to accomplish in the future.

In the next section we introduce kernel methods, with particular attention to the Support Vector Machine (SVM) supervised learning model. Then, we discuss the two main approaches to quantizing these methods, divided into two sections. In Section 3 we collect those approaches aimed at formulating a quantum algorithm that implements a quantum version of the classical SVM. The second type of approach, discussed in Section 4, aims at exploiting the power of quantum computing to deal specifically with classically intractable kernels.

## 2 Kernel methods and SVM

Kernel methods (Theodoridis 2008) are classification algorithms that use a kernel function *K* in order to map data points, living in the input space *V*, to a higher dimensional feature space \(V^{\prime } \), where separability between classes of data becomes clearer. Kernel methods avoid the explicit calculation of the point coordinates in the new space by means of the so-called *kernel trick*, which allows us to work in the feature space \(V^{\prime } \) by simply computing the kernel of pairs of data points in the input space (Theodoridis 2008).

More formally, given a feature map \( \phi \) from the input space *V* to the enhanced feature space \(V^{\prime }\), a kernel \(K: V\times V\rightarrow \mathbb {R} \) is a function

$$K(\mathbf{x}_{i}, \mathbf{x}_{j}) = \phi(\mathbf{x}_{i})^{T} \phi(\mathbf{x}_{j}).$$

In order for *K* to be a valid kernel, for any finite set of points \(\mathbf{x}_{1}, \dots, \mathbf{x}_{n} \in V\) and any *n* real numbers \( (c_{1},{\dots } ,c_{n}) \) the following relation must hold:

$$\sum\limits_{i,j=1}^{n} c_{i}\, c_{j}\, K(\mathbf{x}_{i}, \mathbf{x}_{j}) \geq 0.$$

The advantage of the kernel trick is twofold: on the one hand, evaluating \(K(\mathbf{x}_{i}, \mathbf{x}_{j})\) is computationally cheaper than computing coordinates \(\phi(\mathbf{x})\) for each new point, and, on the other hand, we are never required to explicitly compute \(\phi(\mathbf{x}_{i})\) at any stage of the algorithm. The existence of a concrete mapping \( \phi: V\rightarrow V^{\prime } \) is guaranteed by *Mercer's theorem* (Mercer et al. 1909; Mohri et al. 2012), provided that the kernel function \(K(\mathbf{x}_{i}, \mathbf{x}_{j})\) gives rise to a kernel matrix obeying the Mercer condition.
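The positive semi-definiteness condition can be checked numerically on any finite sample of points by inspecting the eigenvalues of the Gram matrix. A minimal sketch in Python (using the Gaussian kernel as a stock example of a valid Mercer kernel):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), a classic Mercer kernel
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def is_mercer_on_sample(points, kernel):
    # Build the Gram matrix K_ij = kernel(x_i, x_j) ...
    n = len(points)
    K = np.array([[kernel(points[i], points[j]) for j in range(n)] for i in range(n)])
    # ... and check that all eigenvalues are (numerically) non-negative,
    # which is equivalent to sum_ij c_i c_j K_ij >= 0 for every real vector c.
    return bool(np.all(np.linalg.eigvalsh(K) >= -1e-8))

rng = np.random.default_rng(0)
sample = rng.normal(size=(20, 3))
print(is_mercer_on_sample(sample, gaussian_kernel))  # True for a valid kernel
```

Passing the check on a sample does not prove a function is a kernel, but failing it on any sample disproves it.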

Support Vector Machine (SVM) is the best known example of kernel method. This supervised binary classifier learns the optimal discriminative hyperplane, based on an input set of *M* labelled vectors \(\{ (\mathbf {x},y) | \mathbf {x} \in \mathbb {R}^{N}, y \in \left \{ -1,+1 \right \} \} \). This is achieved by maximizing the distance, i.e., the margin, between the decision hyperplane and the closest points, called support vectors (Cortes and Vapnik 1995).

In its simplest (hard margin) form, the SVM training problem reads

$$\min_{\mathbf{w},b} \frac{1}{2}\left \| \mathbf{w} \right \|^{2} \quad \text{subject to} \quad y_{i}(\mathbf{w}^{T}\mathbf{x}_{i} + b) \geq 1, \qquad i = 1{\dots} M,$$

where (**x**_{i}, *y*_{i}), with *i* = 1…*M* and \(y_{i} \in \left \{ -1,+1 \right \} \), is the pair of training vector and label, **w** is the vector which is normal to the discriminative hyperplane, and *b* is the offset of the hyperplane.

When classes are not linearly separable, one resorts to the *soft margin* SVM, where the best hyperplane is the one that reaches the optimal trade-off between two factors: the maximization of the margin and the restraint of the point deviation from the margin; the latter is expressed by means of slack variables *ξ*_{i} tuned by a hyper-parameter *C*. A soft margin SVM optimization problem is of the form

$$\min_{\mathbf{w},b,\xi} \frac{1}{2}\left \| \mathbf{w} \right \|^{2} + C \sum\limits_{i=1}^{M}\xi_{i} \quad \text{subject to} \quad y_{i}(\mathbf{w}^{T}\mathbf{x}_{i} + b) \geq 1 - \xi_{i}, \quad \xi_{i} \geq 0.$$

In the dual formulation, Lagrange multipliers *α*_{i} are introduced in order to include the constraints in the objective function, obtaining the formulation:

$$\max_{\boldsymbol{\alpha}} \sum\limits_{i=1}^{M}\alpha_{i} - \frac{1}{2}\sum\limits_{i,j=1}^{M}\alpha_{i}\alpha_{j}\, y_{i} y_{j}\, \mathbf{x}_{i}^{T}\mathbf{x}_{j} \quad \text{subject to} \quad 0 \leq \alpha_{i} \leq C, \quad \sum\limits_{i=1}^{M}\alpha_{i} y_{i} = 0.$$

It turns out that only a few of the *α*_{i}s are non-zero and that the corresponding **x**_{i} are the support vectors, which lie on the margin and determine the discriminant hyperplane.

The dual formulation also makes the kernel trick available: the inner product \(\mathbf{x}_{i}^{T}\mathbf{x}_{j}\) can be replaced by a kernel function \(K(\mathbf{x}_{i}, \mathbf{x}_{j}) \equiv \phi(\mathbf{x}_{i})^{T}\phi(\mathbf{x}_{j})\) satisfying the Mercer condition of positive semi-definiteness. The Lagrangian optimization problem for the soft margin SVM now becomes

$$\max_{\boldsymbol{\alpha}} \sum\limits_{i=1}^{M}\alpha_{i} - \frac{1}{2}\sum\limits_{i,j=1}^{M}\alpha_{i}\alpha_{j}\, y_{i} y_{j}\, K(\mathbf{x}_{i}, \mathbf{x}_{j}) \quad \text{subject to} \quad 0 \leq \alpha_{i} \leq C, \quad \sum\limits_{i=1}^{M}\alpha_{i} y_{i} = 0.$$

Note that the dual form of the SVM optimization problem is quadratic in the parameters *α*_{i} and can be efficiently solved with quadratic programming algorithms.
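As an illustration (not the quadratic-programming solvers used in practice), the dual can be maximized with a simple projected gradient ascent; the sketch below drops the bias term and the equality constraint for brevity, keeping only the box constraint:

```python
import numpy as np

def train_dual_svm(X, y, C=1.0, lr=0.01, steps=2000):
    # Maximize  sum_i a_i - 1/2 sum_ij a_i a_j y_i y_j K_ij
    # subject to 0 <= a_i <= C (bias / equality constraint omitted for brevity).
    K = X @ X.T                       # linear kernel matrix
    alpha = np.zeros(len(y))
    for _ in range(steps):
        grad = 1 - (K * np.outer(y, y)) @ alpha       # gradient of the dual
        alpha = np.clip(alpha + lr * grad, 0, C)      # ascent step + box projection
    return alpha

def decision(X_train, y_train, alpha, x):
    # f(x) = sum_i a_i y_i K(x_i, x)
    return np.sum(alpha * y_train * (X_train @ x))

# Toy linearly separable data: two clusters along the first axis
X = np.array([[2.0, 0.0], [2.5, 0.5], [-2.0, 0.0], [-2.5, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = train_dual_svm(X, y)
print([np.sign(decision(X, y, alpha, x)) for x in X])  # matches the labels y
```

The sign of the decision function recovers the training labels on this separable toy set; real solvers (e.g., SMO) additionally enforce \(\sum_i \alpha_i y_i = 0\) and recover *b*.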

A least-squares formulation of the SVM (LS-SVM) was introduced by Suykens and Vandewalle (1999). Here the inequality constraints are replaced by equality constraints of the form

$$y_{i}(\mathbf{w}^{T}\mathbf{x}_{i} + b) = 1 - e_{i},$$

where the *e*_{i} are error terms. In this way, the optimal parameters **α** and *b* that identify the decision hyperplane are found by solving a set of linear equations, instead of using quadratic programming:

$$F \begin{pmatrix} b \\ \boldsymbol{\alpha} \end{pmatrix} \equiv \begin{pmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & K + \gamma^{-1}\mathbb{1} \end{pmatrix} \begin{pmatrix} b \\ \boldsymbol{\alpha} \end{pmatrix} = \begin{pmatrix} 0 \\ \mathbf{y} \end{pmatrix},$$

where *F* is an (*M* + 1) × (*M* + 1) matrix, \(\mathbf{1}^{T} \equiv (1,1,\dots,1)^{T}\), *K* is the kernel matrix and *γ*^{− 1} is a trade-off parameter that plays a role similar to *C* in the soft margin SVM. Binary class labels are denoted by the vector \(\mathbf{y} \in \left \{ -1,+1 \right \}^{M}\).

Solving the quadratic programming problem or the least-squares SVM has complexity *O*(*M*^{3}) (Wittek 2014). A further bottleneck slowing down the computation is the kernel itself: for a polynomial kernel *K*(**x**_{i}, **x**_{j}) of the form \((\mathbf {x_{i}}^{T} \mathbf {x_{j}} +c)^{d}\), the best algorithm takes *O*(*M*^{2}*d*) operations, although in other cases the complexity could be much higher, e.g., for those kernels depending on a distance whose calculation is itself an NP problem.

## 3 Quantum SVM

The first quantum approach to SVM is due to Anguita et al. (2003). In their work, they consider a discretized version of the SVM, which also takes into account the generalization error of the classifier. This setting inhibits the use of well-known quadratic programming algorithms and optimization can turn into a problem in the NP complexity class.

The authors propose to represent the different configurations of the Lagrange multipliers *α*_{i} as quantum states \( \left | \alpha _{0} \alpha _{1} {\dots} \alpha _{M}\right \rangle \), and to use Grover's algorithm to perform an exhaustive search over the configuration space and find the maximum of the cost function. It is well known that this task can be accomplished by Grover's quantum algorithm with complexity \( O(\sqrt {2^{M}}) \), rather than the *O*(2^{M}) required by classical algorithms.
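The classical baseline that Grover's search quadratically speeds up can be sketched as a brute-force enumeration of the binary multiplier configurations (the cost function below is a hypothetical toy stand-in, not the discretized SVM objective of Anguita et al.):

```python
from itertools import product

def exhaustive_maximum(cost, M):
    # Enumerate all 2^M binary configurations (alpha_1, ..., alpha_M)
    # and keep the best: O(2^M) classical cost-function queries,
    # versus O(sqrt(2^M)) oracle queries with Grover's search.
    best_cfg, best_val = None, float("-inf")
    for cfg in product((0, 1), repeat=M):
        val = cost(cfg)
        if val > best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val

# Toy cost: reward alternating configurations
cost = lambda cfg: sum(1 for a, b in zip(cfg, cfg[1:]) if a != b)
print(exhaustive_maximum(cost, 4))  # -> ((0, 1, 0, 1), 3)
```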

A full quantum version of the SVM (QSVM) was later proposed by Rebentrost et al. (2014), who showed how the least-squares SVM can be implemented on a quantum computer. Training vectors **x** are represented by means of quantum states of the form

$$\left| \mathbf{x} \right\rangle = \frac{1}{\left \| \mathbf{x} \right \|} \sum\limits_{k=1}^{N} (\mathbf{x})_{k} \left| k \right\rangle,$$

i.e., the components of each vector **x** are encoded in the amplitudes of the quantum state. The authors claim that this whole set of *M* states could in principle be constructed by querying a Quantum Random Access Memory (QRAM), which uses *O*(*MN*) hardware resources but only \( O(\log MN) \) operations to access them (Giovannetti et al. 2008).

The preliminary step of the QSVM algorithm exploits the fact that dot products can be estimated faster using the QRAM and repeating the SWAP test algorithm on a quantum computer (Buhrman et al. 2001). More precisely, if the desired accuracy is *𝜖*, then the overall complexity of evaluating a single dot product \( \mathbf {x_{i}}^{T} \mathbf {x_{j}} \) is \( O(\epsilon ^{-1} \log N) \). Calculating the kernel matrix takes therefore \( O(M^{2} \epsilon ^{-1} \log N) \), instead of \( O(M^{2}N \log (1/\epsilon )) \) required in the classical case.
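The measurement statistics of the SWAP test can be mimicked classically: the ancilla reads 0 with probability \(p_{0} = (1 + |\langle \mathbf{x}|\mathbf{y}\rangle|^{2})/2\), so repeating the test estimates the squared overlap; plain sampling needs \(O(\epsilon^{-2})\) shots for accuracy *𝜖*. A sketch (numpy only, simulating the outcome statistics rather than the circuit):

```python
import numpy as np

def swap_test_estimate(x, y, shots=100_000, seed=42):
    # Amplitude-encoded states are unit vectors
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    p0 = 0.5 * (1 + np.dot(x, y) ** 2)          # Pr(ancilla = 0) in the SWAP test
    rng = np.random.default_rng(seed)
    outcomes = rng.random(shots) < p0            # simulate the ancilla measurements
    return 2 * outcomes.mean() - 1               # estimator of |<x|y>|^2

x = np.array([1.0, 0.0, 1.0, 0.0])
y = np.array([1.0, 1.0, 0.0, 0.0])
exact = (np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))) ** 2  # 0.25
print(abs(swap_test_estimate(x, y) - exact) < 0.01)  # True with high probability
```

The statistical error shrinks as \(1/\sqrt{\text{shots}}\); the better \(O(\epsilon^{-1})\) scaling quoted above relies on amplitude-estimation techniques rather than naive repetition.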

The training phase then amounts to solving the LS-SVM linear system by quantum means. The matrix *F* is first normalized so that ||*F*||≤ 1. Then the optimal parameters *b* and **α** are obtained by applying the efficient quantum matrix inversion (HHL) algorithm (Harrow et al. 2009). This algorithm requires the simulation of matrix exponentials \(e^{-i\hat {F} {\Delta } t }\), which can be performed in \( O(\log N) \) steps (Lloyd et al. 2014). Moreover, we can add an ancillary qubit, initially in state \( \left | 0\right \rangle \), and use the quantum phase estimation algorithm (Nielsen and Chuang 2011) to express the state \( \left | \mathbf {y} \right \rangle \) in the eigenbasis \( \left | e_{i}\right \rangle \) of \( \hat {F} \) and store approximations of the eigenvalues *λ*_{i} of \( \hat {F} \) in the ancilla qubit.

The process of classifying new data \( \left | \mathbf {x}\right \rangle \) with the trained state \( \left | b, \boldsymbol {\alpha } \right \rangle \) requires the implementation of a *query oracle*, followed by a conditional measurement whose success probability *P* can be estimated to accuracy *𝜖* in \(O(\frac {P(1-P)}{\epsilon ^{2}})\) iterations. The class label is decided depending on the value of *P*: if it is greater than \(\frac {1}{2}\), then \( \left | \mathbf {x}\right \rangle \) is labelled − 1; if it is less than \(\frac {1}{2}\), then the label of \( \left | \mathbf {x} \right \rangle \) is + 1.

The overall time complexity for both training and classification with the LS-SVM is then \( O(\log (NM)) \). Note that the algorithm also accommodates non-linear polynomial kernels of the form \((\mathbf {x_{i}}^{T} \mathbf {x_{j}})^{d}\): it suffices to map each training state \(\left | \mathbf {x}_{i}\right \rangle \) to a d-fold tensor product \(\left | \mathbf {x}_{i}\right \rangle \otimes {\dots } \otimes \left | \mathbf {x}_{i}\right \rangle \), whose inner products reproduce the polynomial kernel.

An experimental implementation of the QSVM has been shown in Li et al. (2015) and Patrick et al. (2018). Also, in Windridge et al. (2018), the authors propose a quantized version of Error Correcting Output Codes (ECOC), which extends the QSVM algorithm to the multi-class case and enables it to perform error correction on the label allocation.

## 4 Quantum computation of hard kernels

With the advent of NISQ devices, a second line of research has emerged, based on hybrid quantum-classical (variational) algorithms (Mcclean et al. 2016). Here a parametrized quantum circuit *U*(*𝜃*) depends on a set of parameters *𝜃* which are varied in order to minimize a given objective function (see Fig. 2). The quantum circuit is hence trained by a classical iterative optimization algorithm that at every step finds the best candidates *𝜃* starting from random (or pre-trained) initial values.
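The hybrid loop can be sketched with a classical stand-in for the quantum device: here a hypothetical single-parameter "circuit" whose measured cost is \(\langle Z \rangle = \cos \theta \), minimized by finite-difference gradient descent (a generic sketch, not any specific proposal from the literature):

```python
import numpy as np

def expectation(theta):
    # Stand-in for the quantum device: <Z> after RY(theta)|0> equals cos(theta).
    return np.cos(theta)

def train(theta0=0.1, lr=0.2, steps=200, eps=1e-4):
    theta = theta0                        # random (or pre-trained) initial value
    for _ in range(steps):
        # Classical optimizer step: finite-difference estimate of the gradient,
        # each evaluation of `expectation` standing in for a circuit run
        grad = (expectation(theta + eps) - expectation(theta - eps)) / (2 * eps)
        theta -= lr * grad
    return theta, expectation(theta)

theta, cost = train()
print(round(cost, 3))  # close to the minimum <Z> = -1, reached at theta = pi
```

On real hardware the gradient is typically obtained with parameter-shift rules rather than finite differences, since each evaluation is a noisy expectation value.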

Schuld and Killoran recently explored these concepts (Schuld and Killoran 2019), remarking on the strict relation between quantum states and feature maps. The authors explain that the key element in both quantum computing and kernel methods is to perform computations in a high-dimensional (possibly infinite) Hilbert space via an efficient manipulation of inputs.

In fact, it is possible to interpret the encoding of a classical input **x**_{i} into a quantum state |*ϕ*(**x**_{i})〉 as a feature map *ϕ* which maps classical vectors to the Hilbert space associated with a system of qubits. Two ways of exploiting this correspondence are described.

In the first approach, called *implicit*, a quantum device takes classical input and evaluates a kernel function as part of a hybrid classification model. This requires the use of a quantum circuit \(U_{\phi }(\mathbf {x})\) implementing the mapping

$$\mathbf{x} \mapsto \left| \phi(\mathbf{x}) \right\rangle = U_{\phi}(\mathbf{x}) \left| 0 \right\rangle,$$

from which the kernel is obtained as the inner product of two such feature states, estimated on the quantum device.

For quantum computing to be helpful, such a kernel should not be efficiently simulable by a classical computer. This poses the question of which feature map circuits *U*_{ϕ} lead to kernels that are powerful for classical learning models like SVM but are at the same time classically intractable. The authors suggest that a way to achieve this goal is to employ non-Gaussian elements (e.g., the cubic phase gate or photon number measurements) as part of the quantum circuit *U*_{ϕ}(*x*) implementing the mapping to the feature space.

The second approach, called *explicit*, uses a variational quantum circuit to directly learn a decision boundary in the quantum Hilbert space. In their example, the authors first translate the classical input **x** into a quantum squeezed state and then apply to |*ϕ*(**x**)〉 a parametrized continuous-variable circuit in which a block of gates *W*(*𝜃*) is repeated; the components of such a block include Gaussian gates parametrized by a complex number *z* and, finally, the quadratic and cubic phase gates. The probability of finding the output state |*n*_{1}, *n*_{2}〉 in the state |2,0〉 or |0,2〉 is interpreted as the probability that the classifier predicts class *y* = 0 or *y* = 1, respectively.

Along the same path, and simultaneously with Schuld and Killoran (2019), Havlicek et al. (2019) propose two classifiers that map classical data into a quantum feature Hilbert space in order to get a quantum advantage. Again, one SVM classifier is based on a variational circuit that generates a separating hyperplane in the quantum feature space, while the other classifier only estimates the kernel function on the quantum computer.

The two methods are tested on an artificial dataset **x** ∈ *T* ∪ *S* ≡ *Ω* ⊂ (0,2*π*]^{2}, where *T* and *S* are respectively the training and test sets. This classical input is first encoded as \(\phi _{S}(\mathbf {x}) \in \mathbb {R}\), where *ϕ*_{S}(**x**) = (*π* − *x*_{1})(*π* − *x*_{2}).

The quantum feature map on *n* qubits is generated by the unitary

$$\mathcal{U}_{\Phi(\mathbf{x})} = U_{\Phi(\mathbf{x})}\, H^{\otimes n}\, U_{\Phi(\mathbf{x})}\, H^{\otimes n},$$

where *H* is the Hadamard gate and \(U_{\Phi (\mathbf {x})} = \exp \left (i {\sum }_{S \subseteq [n]} \phi _{S}(\mathbf {x}) {\prod }_{k \in S} Z_{k} \right )\), with *Z*_{k} being the phase-shift gate acting on qubit *k*. The circuit acts on \(\left | {0}\right \rangle ^{\otimes n}\) as initial state and uses the classical data previously encoded in *ϕ*_{S}(**x**).
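For two qubits this feature-map kernel can be simulated directly with numpy: the generator is diagonal in the computational basis, so \(U_{\Phi(\mathbf{x})}\) is a simple phase matrix. A small sketch of the construction described above (with the encoding \(\phi_{\{1,2\}}(\mathbf{x}) = (\pi - x_{1})(\pi - x_{2})\)):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
H2 = np.kron(H, H)                               # Hadamard on both qubits
z = np.array([1.0, -1.0])                        # diagonal of the Pauli Z
z1, z2 = np.kron(z, [1.0, 1.0]), np.kron([1.0, 1.0], z)

def feature_state(x):
    # U_Phi(x) = exp(i [x1 Z1 + x2 Z2 + (pi - x1)(pi - x2) Z1 Z2]) is diagonal
    phase = x[0] * z1 + x[1] * z2 + (np.pi - x[0]) * (np.pi - x[1]) * z1 * z2
    U = np.diag(np.exp(1j * phase))
    psi = np.zeros(4, dtype=complex)
    psi[0] = 1.0                                 # initial state |00>
    return U @ H2 @ U @ H2 @ psi                 # (U_Phi H^{(x)2})^2 |00>

def quantum_kernel(x, y):
    # K(x, y) = |<Phi(x)|Phi(y)>|^2
    return abs(np.vdot(feature_state(x), feature_state(y))) ** 2

print(round(quantum_kernel([0.5, 1.2], [0.5, 1.2]), 6))  # 1.0 (normalized states)
```

The point of the construction is precisely that this brute-force simulation scales as \(2^{n}\), while the quantum device evaluates the same overlap natively.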

The exact classical evaluation of the inner product (i.e., the kernel) between two states obtained using the circuit *U*_{Φ}(**x**) is #P-hard, because it is associated with a Tutte partition function, which is hard to simulate classically (Goldberg and Guo 2017).

A different approach is taken in Di Pierro et al. (2017), where the same idea of using quantum computation to evaluate a kernel is discussed in the context of Topological Quantum Computation (TQC).

TQC represents a model of quantum computing polynomially equivalent to the circuit-based model, where, instead of using qubits and gates, the computation is performed by braiding two-dimensional quasiparticles called anyons (Pachos 2012). Moreover, it is well known that some computational problems, such as the approximation of the Jones polynomial, i.e., an invariant of links and knots, have a more straightforward implementation in TQC (Aharonov et al. 2006).

The idea is to encode classical inputs **x**, given in the form of binary strings, into braids, which in TQC are expressed by means of evolution operators **B**. This encoding is constructed by mapping the bit value 0 to the crossing operator *σ*_{i}, and the bit value 1 to the adjoint crossing operator \(\sigma _{i}^{\dagger }\). In this way, a binary string of length *n* is uniquely represented by a pairwise braiding of 2*n* strands, i.e., by a braid *B* ∈ *B*_{2n}. Applying the braiding operator **B**_{u} associated with the binary string *u* to the vacuum state of the anyonic quantum system \( \left | \psi \right \rangle \) defines an embedding *ϕ* into the Hilbert space \( {\mathscr{H}} \) of the anyonic configurations:

$$\phi(u) = \mathbf{B}_{u} \left| \psi \right\rangle.$$

The kernel is then evaluated as the overlap between two such states, \(K(u,v) = \langle \psi | \mathbf {B}_{u}^{\dagger } \mathbf {B}_{v} | \psi \rangle \), which can be computed in terms of the Kauffman bracket (Kauffman 1987): the result is expressed through the variable *A* associated with the so-called Hopf link, the quantity *d* = *A*^{2} + *A*^{− 2}, and *d*_{H}(*u*, *v*), the Hamming distance between the input strings *u* and *v*.
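The resulting Hamming-distance dependence can be imitated classically. The sketch below uses a generic decay ratio *r* per differing bit, a hypothetical stand-in for the anyonic overlap rather than the exact expression in Di Pierro et al. (2017); it is a valid kernel for 0 < *r* ≤ 1 because it factorizes over bit positions:

```python
def hamming_kernel(u, v, r=0.5):
    # K(u, v) = r ** d_H(u, v): each differing bit contributes a factor r,
    # each matching bit a factor 1, so K is a product of per-bit kernels
    # and hence positive semi-definite for 0 < r <= 1.
    assert len(u) == len(v)
    d_h = sum(a != b for a, b in zip(u, v))
    return r ** d_h

print(hamming_kernel("0110", "0101"))  # r^2 = 0.25 (two differing bits)
```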

Although this example does not provide a computationally hard kernel, the authors suggest that a more complex braid mapping of the input may naturally lead to a classically intractable kernel, since the calculation of the Kauffman polynomial belongs to the #P-hard class (Goldberg and Guo 2017).

## 5 Conclusion

Rundown of the main results and references

| Category | Method | Title |
|---|---|---|
| Quantum version of SVM | Grover algorithm | Quantum optimization for training support vector machines (Anguita et al. 2003) |
| Quantum version of SVM | HHL algorithm | Quantum support vector machine for big data classification (Rebentrost et al. 2014) |
| Experimental | NMR 4-qubit quantum processor | Experimental implementation of a quantum support vector machine (Li et al. 2015) |
| Experimental | IBM quantum experience | Quantum algorithm implementations for beginners (Patrick et al. 2018) |
| Quantum version of SVM and ECOC | HHL algorithm | Quantum error-correcting output codes (Windridge et al. 2018) |
| Kernel methods | Variational quantum circuit | Quantum machine learning in feature Hilbert spaces (Schuld and Killoran 2019) |
| Kernel methods | Variational quantum circuit | Supervised learning with quantum-enhanced feature spaces (Havlicek et al. 2019) |
| Kernel methods | Topological quantum computation | Hamming distance kernelisation via topological quantum computation (Di Pierro et al. 2017) |


## References

- Agresti I, et al. (2019) Pattern recognition techniques for boson sampling validation. Phys Rev X 9:14
- Aharonov D, Jones V, Landau Z (2006) A polynomial quantum algorithm for approximating the Jones polynomial. In: Proceedings of the 38th annual ACM symposium on theory of computing, pp 427–436
- Aïmeur, et al. (2013) Quantum speed-up for unsupervised learning. Mach Learn 90:261–287
- Altaisky MV, et al. (2016) Towards a feasible implementation of quantum neural networks using quantum dots. Appl Phys Lett 108:103108
- Amin MH, et al. (2018) Quantum Boltzmann machine. Phys Rev X 8:11
- Anguita D, et al. (2003) Quantum optimization for training support vector machines. Neural Netw 16:763–770
- Arunachalam S, de Wolf R (2017) A survey of quantum learning theory. arXiv:1701.06806
- Barry J, et al. (2014) Quantum partially observable Markov decision processes. Phys Rev A 90:032311
- Benedetti M, et al. (2019) Adversarial quantum circuit learning for pure state approximation. New J Phys 21:043023
- Biamonte J, et al. (2017) Quantum machine learning. Nature 549:195–202
- Bishop C (2016) Pattern recognition and machine learning. Springer, New York
- Bottarelli L, et al. (2018) Biclustering with a quantum annealer. Soft Comput 22:6247–6260
- Buhrman H, Cleve R, Watrous J, de Wolf R (2001) Quantum fingerprinting. Phys Rev Lett 87:4
- Canabarro A, Fernandes Fanchini F, Malvezzi AL, Pereira R, Chaves R (2019) Unveiling phase transitions with machine learning. arXiv:1904.01486
- Ciliberto C, et al. (2018) Quantum machine learning: a classical perspective. Proc R Soc A: Math Phys Eng Sci 474:20170551
- Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
- Crawford D, et al. (2016) Reinforcement learning using quantum Boltzmann machines. arXiv:1612.05695
- Di Pierro A, et al. (2017) Hamming distance kernelisation via topological quantum computation. In: Theory and practice of natural computing. Lect Notes Comput Sci 10687:269–280
- Di Pierro A, et al. (2018) Homological analysis of multi-qubit entanglement. Europhys Lett 123:30006
- Dong XY, Pollmann F, Zhang XF (2019) Machine learning of quantum phase transitions. Phys Rev B 99:121104
- Dunjko V, Briegel HJ (2018) Machine learning & artificial intelligence in the quantum domain: a review of recent progress. Rep Prog Phys 81:074001
- Dunjko V, et al. (2016) Quantum-enhanced machine learning. Phys Rev Lett 117:6
- Giovannetti V, Lloyd S, Maccone L (2008) Quantum random access memory. Phys Rev Lett 100:4
- Goldberg LA, Guo H (2017) The complexity of approximating complex-valued Ising and Tutte partition functions. Computational Complexity 26:765–833
- Gray J, et al. (2018) Machine-learning-assisted many-body entanglement measurement. Phys Rev Lett 121:6
- Harrow AW, Hassidim A, Lloyd S (2009) Quantum algorithm for linear systems of equations. Phys Rev Lett 103:4
- Havlicek V, Córcoles AD, et al. (2019) Supervised learning with quantum-enhanced feature spaces. Nature 567:209–212
- Heim B, et al. (2015) Quantum versus classical annealing of Ising spin glasses. Science 348:215–217
- Huembeli P, et al. (2019) Automated discovery of characteristic features of phase transitions in many-body localization. Phys Rev B 99:6
- Iten R, et al. (2018) Discovering physical concepts with neural networks. arXiv:1807.10300
- Kauffman LH (1987) State models and the Jones polynomial. Topology 26:395–407
- Levine Y, et al. (2018) Deep learning and quantum entanglement: fundamental connections with implications to network design. In: International conference on learning representations
- Li Z, Liu X, Xu N, Du J (2015) Experimental realization of a quantum support vector machine. Phys Rev Lett 114:5
- Lloyd S, Mohseni M, Rebentrost P (2014) Quantum principal component analysis. Nat Phys 10:631–633
- Lu S, Braunstein SL (2014) Quantum decision tree classifier. Quantum Inf Process 13:757–770
- Mcclean JR, Romero J, Babbush R, Aspuru-Guzik A (2016) The theory of variational hybrid quantum-classical algorithms. New J Phys 18:023023
- Mercer J (1909) Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London 209
- Mitchell T (1997) Machine learning. McGraw Hill, New York
- Mohri M, et al. (2012) Foundations of machine learning. MIT Press, Cambridge
- Nielsen MA, Chuang IL (2011) Quantum computation and quantum information. Cambridge University Press, New York
- O'Driscoll L, et al. (2019) A hybrid machine learning algorithm for designing quantum experiments. Quantum Mach Intell 1:1–11
- Pachos JK (2012) Introduction to topological quantum computation. Cambridge University Press, New York
- Patrick J. Coles, et al. (2018) Quantum algorithm implementations for beginners. arXiv:1804.03719
- Perdomo-Ortiz A, et al. (2018) Opportunities and challenges for quantum-assisted machine learning in near-term quantum computers. Quantum Sci Technol 3:030502
- Rebentrost P, Mohseni M, Lloyd S (2014) Quantum support vector machine for big data classification. Phys Rev Lett 113:5
- Schuld M, Killoran N (2019) Quantum machine learning in feature Hilbert spaces. Phys Rev Lett 122:6
- Schuld M, Petruccione F (2018) Supervised learning with quantum computers. Springer International Publishing, Berlin
- Schuld M, Sinayskiy I, Petruccione F (2015) An introduction to quantum machine learning. Contemp Phys 56(2):172–185
- Sergioli G, et al. (2018) A quantum-inspired version of the nearest mean classifier. Soft Comput 22:691–705
- Stoudenmire E, Schwab DJ (2016) Supervised learning with tensor networks. Advances in neural information processing systems (NIPS Proceedings) 29:4799–4807
- Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9:293–300
- Theodoridis S (2008) Pattern recognition. Elsevier Academic Press, Cambridge
- Wiebe N, et al. (2015) Quantum algorithms for nearest-neighbours methods for supervised and unsupervised learning. Quantum Info Comput 15:316–356
- Windridge D, Mengoni R, Nagarajan R (2018) Quantum error-correcting output codes. Int J Quantum Info 16:1840003
- Wittek P (2014) Quantum machine learning. Elsevier Academic Press, Cambridge
- Yu S, Albarrán-Arriagada F, Retamal JC, Wang YT, Liu W, Ke ZJ, Meng Y, Li ZP, Tang JS, Solano E, Lamata L, Li CF, Guo GC (2019) Adv Quantum Technol 2(7-8):1800074