
Kernel methods in Quantum Machine Learning

  • Riccardo Mengoni
  • Alessandra Di Pierro
Review Article

Abstract

Quantum Machine Learning has established itself as one of the most promising applications of quantum computers and Noisy Intermediate Scale Quantum (NISQ) devices. In this paper, we review the latest developments regarding the usage of quantum computing for a particular class of machine learning algorithms known as kernel methods.

Keywords

Quantum Machine Learning · Quantum computing · Kernel methods

1 Introduction

In the era of big data, Machine Learning (ML) provides a set of techniques to identify patterns among huge datasets “without being explicitly programmed to perform that task” (Bishop 2016; Mitchell 1997). In the last few years, building on the great success of ML, a new interdisciplinary research topic going under the name of Quantum Machine Learning (QML) has emerged (Schuld 2015; Wittek 2014; Biamonte et al. 2017; Ciliberto et al. 2018; Dunjko and Briegel 2018; Arunachalam and Wolf 2017; Perdomo-Ortiz et al. 2018; Schuld and Petruccione 2018). The aim of QML is to merge quantum computing and data mining techniques in different ways, in order to achieve improvements in both fields. As shown in Fig. 1, it is possible to distinguish four approaches to QML, depending on the nature of the dataset under study and on the computational device being used (Dunjko et al. 2016).
Fig. 1

The first letter in each box refers to whether the system under study is classical or quantum, while the second letter indicates whether a classical or quantum information processing device is used

The Classical-Classical (CC) class refers to ordinary machine learning or to machine learning algorithms that are inspired by the formalism of quantum mechanics. Here the dataset represents some classical system and the algorithm can run on a classical computer (Dong et al. 2019; Canabarro et al. 2019; Amin et al. 2018; Crawford et al. 2016; Stoudenmire and Schwab 2016; Sergioli et al. 2018; Levine et al. 2018). In the Classical-Quantum (CQ) class, algorithms rely on the advantages of quantum computation in order to speed up classical ML methods. Data are assumed to be classical in this class as well (Aïmeur et al. 2013; Mikhail et al. 2016; Wiebe et al. 2015; Barry et al. 2014; Lu and Braunstein 2014; Heim et al. 2015; Bottarelli et al. 2018). Quantum-Classical (QC) refers to the use of classical ML methods to analyse quantum systems (Agresti et al. 2019; Huembeli et al. 2019; Gray et al. 2018; Benedetti et al. 2019; Di Pierro et al. 2018; O’Driscoll et al. 2019; Iten et al. 2018). Finally, in the Quantum-Quantum (QQ) class, both the learning algorithm and the system under study are fully quantum (Yu et al. 2019).

Very promising results have been obtained in each of the four frameworks. In this paper, we have chosen to focus on the CQ class, with the aim of reviewing the main approaches that use quantum mechanics in order to obtain a computational advantage for a specific class of ML techniques called kernel methods. Our main motivation is to set a clear background for those who want to start investigations or carry out research in this field. A systematization of the current research in Quantum Machine Learning should include similar work in the other three sectors too, which we plan to accomplish in the future.

In the next section, we will introduce kernel methods, with particular attention to the Support Vector Machine (SVM) supervised learning model. Then, we will discuss the two main approaches to quantizing these methods. We have divided this discussion into two sections. In Section 3 we have collected the approaches aimed at formulating a quantum algorithm that implements a quantum version of the classical SVM. The second type of approach is discussed in Section 4 and aims at exploiting the power of quantum computing to deal specifically with classically intractable kernels.

2 Kernel methods and SVM

Kernel methods (Theodoridis 2008) are classification algorithms that use a kernel function K in order to map data points, living in the input space V, to a higher dimensional feature space \(V^{\prime } \), where separability between classes of data becomes clearer. Kernel methods avoid the explicit calculation of the point coordinates in the new space by means of the so-called kernel trick, which allows us to work in the feature space \(V^{\prime } \) by simply computing the kernel of pairs of data points in the input space (Theodoridis 2008).

Intuitively, the “trick” consists in the following. Let \(\mathbf {\phi }: V\rightarrow V^{\prime } \) be a map from the input space V to the enhanced feature space \(V^{\prime }\). Then a kernel \(K: V\times V\rightarrow \mathbb {R} \) is a function
$$ K(\mathbf{x}_{i}, \mathbf{x}_{j})\equiv \left\langle \mathbf{\phi}(\mathbf{x_{i}} ),\mathbf{\phi}(\mathbf{x_{j}} )\right\rangle, $$
representing the inner product \(\left \langle \cdot ,\cdot \right \rangle \) in \( V^{\prime } \), which must satisfy the Mercer condition (Mercer et al. 1909; Mohri et al. 2012) of positive semi-definiteness, i.e., for all choices of M real numbers \( (c_{1},{\dots } ,c_{M}) \) the following relation must hold
$$ \sum\limits_{i=1}^{M}\sum\limits_{j=1}^{M} K(\mathbf{x}_{i},\mathbf{x}_{j})c_{i}c_{j}\geq 0. $$
Clearly, calculating the kernel K(xi, xj) is computationally cheaper than computing the coordinates of each mapped point ϕ(x); moreover, ϕ(xi) is never required explicitly at any stage of the algorithm. The existence of a concrete mapping \( \mathbf {\phi }: V\rightarrow V^{\prime } \) is guaranteed by Mercer's theorem (Mercer et al. 1909; Mohri et al. 2012), provided that the kernel function K(xi, xj) gives rise to a kernel matrix obeying the Mercer condition.
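As a simple illustration of the kernel trick (our own toy example, not taken from the cited references), the following Python snippet checks that a degree-2 polynomial kernel evaluated in the input space coincides with an explicit inner product in a 6-dimensional feature space:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input: all monomials up to degree 2."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(xi, xj, c=1.0, d=2):
    """Polynomial kernel K(xi, xj) = (xi . xj + c)^d, computed in the input space."""
    return (np.dot(xi, xj) + c) ** d

xi, xj = np.array([0.3, -1.2]), np.array([2.0, 0.7])

# Both numbers coincide: <phi(xi), phi(xj)> = (xi . xj + 1)^2,
# yet the kernel never constructs the 6-dimensional feature vectors.
print(np.dot(phi(xi), phi(xj)), poly_kernel(xi, xj))
```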

The Support Vector Machine (SVM) is the best known example of a kernel method. This supervised binary classifier learns the optimal discriminative hyperplane from an input set of M labelled vectors \(\{ (\mathbf {x},y) | \mathbf {x} \in \mathbb {R}^{N}, y \in \left \{ -1,+1 \right \} \} \). This is achieved by maximizing the distance, i.e., the margin, between the decision hyperplane and the closest points, called support vectors (Cortes and Vapnik 1995).

The SVM optimization problem with hard-margin can be formulated as the problem to find
$$ \arg\min\limits_{(\mathbf{w},b)} \left\{\frac{1}{2} \|\mathbf{w}\|^{2} \right\}, \quad \text{subject to the constraint } \forall_{i}\; y_{i}(\mathbf{w} \cdot\mathbf{x}_{i} - b) \ge 1, $$
where (xi, yi), with i = 1…M and \(y_{i} \in \left \{ -1,+1 \right \} \), are the training vectors with their labels, w is the vector normal to the discriminative hyperplane, and b is the offset of the hyperplane.
An important extension of the SVM method described above is the so-called soft-margin SVM, where the best hyperplane is the one that reaches the optimal trade-off between two factors: the maximization of the margin and the containment of the deviation of points from the margin; the latter is expressed by means of slack variables ξi tuned by a hyper-parameter C. The soft-margin SVM optimization problem is of the form
$$ \arg\min\limits_{(\mathbf{w},b)} \left\{\frac{1}{2} \|\mathbf{w}\|^{2} + C \sum\limits_{i=1}^{M} \xi_{i} \right\}, $$
subject to the constraint
$$ \forall_{i} y_{i}(\mathbf{w} \cdot\mathbf{x_{i}} - b) \ge 1 - \xi_{i}, \xi_{i} \ge 0. $$
(1)
Usually it is convenient to switch to the dual form, where Lagrange multipliers αi are introduced in order to include the constraints in the objective function, obtaining the formulation:
$$ \arg\max\limits_{(\alpha_{i})} \sum\limits_{i=1}^{M} \alpha_{i} - \frac{1}{2}\sum\limits_{i, j} \alpha_{i} \alpha_{j} y_{i} y_{j} (\mathbf{x}_{i}^{T} \mathbf{x}_{j}), $$
with \(\mathbf {w} = \sum \limits _{i} \alpha _{i} y_{i} \mathbf {x}_{i}\), subject to \( \sum \limits _{i} \alpha _{i} y_{i} = 0\) and \(\forall _{i}\ \alpha _{i} \geq 0\). It is worth noticing that only a sparse subset of the αi is non-zero and that the corresponding xi are the support vectors, which lie on the margin and determine the discriminant hyperplane.
In this context, a non-linear classification boundary for the SVM is obtained by replacing the term \((\mathbf {x}_{i}^{T} \mathbf {x}_{j})\) in the objective function with a kernel function K(xi, xj) ≡ ϕ(xi)Tϕ(xj) satisfying the Mercer condition of positive semi-definiteness. The Lagrangian optimization problem for the soft-margin SVM now becomes
$$ \arg\max\limits_{(\alpha_{i})} \sum\limits_{i=1}^{M} \alpha_{i} - \frac{1}{2}\sum\limits_{i, j} \alpha_{i} \alpha_{j} y_{i} y_{j} K(\mathbf{x}_{i}, \mathbf{x}_{j}), $$
subject to \({\sum }_{i} \alpha _{i} y_{i} = 0\), with \(\forall _{i}\ \alpha _{i} \geq 0\).

Note that the dual form of the SVM optimization problem is quadratic in the parameters αi and can be efficiently solved with quadratic programming algorithms.
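As a concrete (and purely classical) sketch of the kernelized dual problem, one can hand a precomputed Gram matrix to an off-the-shelf quadratic-programming-based solver; here we use scikit-learn's SVC, with an RBF kernel chosen only for illustration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy two-class dataset: M training vectors in R^N with labels in {-1, +1}.
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)      # XOR-like, not linearly separable

def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

K_train = rbf_kernel(X, X)
clf = SVC(kernel="precomputed", C=1.0).fit(K_train, y)   # solves the dual QP

X_test = rng.normal(size=(5, 2))
K_test = rbf_kernel(X_test, X)            # kernel between test and training points
print(clf.predict(K_test))                # predicted labels in {-1, +1}
print("support vectors:", clf.support_.size, "of", len(X))
```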

An alternative version of SVM that has a central role in the quantum formulation of the problem is the least-squares support vector machines (LS-SVM) (Suykens and Vandewalle 1999). Here, the constraint defined in Eq. 1 is replaced by the equality constraint
$$ \forall_{i} y_{i}(\mathbf{w} \cdot \mathbf{\phi}(\mathbf{x_{i} })- b) = 1 - e_{i}, $$
where ei are error terms. In this way, the optimal parameters α and b that identify the decision hyperplane are found by solving a set of linear equations, instead of using quadratic programming.
The LS-SVM problem can hence be formulated as
$$ F\left( \begin{array}{c} b \\ \boldsymbol{\alpha} \end{array} \right) = \left( \begin{array}{cc} 0 & \mathbf{1}^{T} \\ \mathbf{1} & K + \gamma^{-1} I \end{array} \right) \left( \begin{array}{c} b \\ \boldsymbol{\alpha} \end{array} \right) = \left( \begin{array}{c} 0 \\ \mathbf{y} \end{array} \right), $$
(2)
where F is an (M + 1) × (M + 1) matrix, 1 ≡ (1,1,…,1)T, K is the kernel matrix, and γ− 1 is a trade-off parameter that plays a role similar to that of C in the soft-margin SVM. Binary class labels are collected in the vector \(\mathbf {y} \in \{ -1,+1 \}^{M}\).
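A minimal numpy sketch of solving the LS-SVM linear system of Eq. 2 directly (our illustration; the kernel matrix K and the labels y are assumed to be given):

```python
import numpy as np

def lssvm_train(K, y, gamma=10.0):
    """Solve the (M+1)x(M+1) LS-SVM linear system of Eq. 2 for (b, alpha)."""
    M = len(y)
    F = np.zeros((M + 1, M + 1))
    F[0, 1:] = 1.0                      # top row    (0, 1^T)
    F[1:, 0] = 1.0                      # left column, then K + gamma^-1 I
    F[1:, 1:] = K + np.eye(M) / gamma
    rhs = np.concatenate(([0.0], np.asarray(y, dtype=float)))
    sol = np.linalg.solve(F, rhs)
    return sol[0], sol[1:]              # b, alpha

def lssvm_predict(K_test, alpha, b):
    """Decision sign(sum_i alpha_i K(x, x_i) + b) for each test point."""
    return np.sign(K_test @ alpha + b)

# Usage: pass any positive semi-definite M x M kernel matrix K and labels y in {-1, +1};
# K_test is the kernel between test points and the M training points.
```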

Solving the quadratic programming problem or the least-squares SVM has complexity O(M3) (Wittek 2014). A bottleneck slowing down the computation is determined by the kernel: for a polynomial kernel K(xi, xj) of the form \((\mathbf {x_{i}}^{T} \mathbf {x_{j}} +c)^{d}\), the best algorithm takes O(M2d), although in other cases the complexity could be much higher, e.g., for kernels depending on a distance whose calculation is itself an NP-hard problem.

3 Quantum SVM

The first quantum approach to SVM is due to Anguita et al. (2003). In their work, they consider a discretized version of the SVM, which also takes into account the generalization error of the classifier. This setting inhibits the use of well-known quadratic programming algorithms and optimization can turn into a problem in the NP complexity class.

The authors propose to represent the different configurations of the Lagrange multipliers αi as quantum states \( \left | \alpha _{0} \alpha _{1} {\ldots } \alpha _{M}\right \rangle \) and then use Grover's algorithm to perform an exhaustive search over the configuration space and find the maximum of the cost function. It is well known that this task can be accomplished by Grover's quantum search with complexity \( O(\sqrt {2^{M}}) \), rather than the O(2M) required by a classical exhaustive search.

A different approach was proposed by Rebentrost, Mohseni and Lloyd (Rebentrost et al. 2014), who presented a completely new quantum algorithm that implements the SVM on a circuit-based quantum computer. This formulation has become very popular in the last few years and is often referred to as the Quantum SVM (QSVM) algorithm. In order to understand the QSVM, it is necessary to clarify that classical input training vectors x are represented by means of quantum states of the form
$$ | \mathbf{x} \rangle= \frac{1}{\left| \mathbf{x}\right| } \sum\limits_{k=1}^{N} (\mathbf{x})_{k} \left| k\right\rangle, $$
where the components of the vector x are encoded in the amplitudes of the quantum state. The authors claim that this whole set of M states could in principle be constructed by querying a Quantum Random Access Memory (QRAM), which uses O(MN) hardware resources but only \( O(\log MN) \) operations to access them (Giovannetti et al. 2008).

The preliminary step of the QSVM algorithm exploits the fact that dot products can be estimated faster by using the QRAM and repeating the SWAP test algorithm on a quantum computer (Buhrman et al. 2001). More precisely, if the desired accuracy is 𝜖, then the overall complexity of evaluating a single dot product \( \mathbf {x_{i}}^{T} \mathbf {x_{j}} \) is \( O(\epsilon ^{-1} \log N) \). Calculating the kernel matrix therefore takes \( O(M^{2} \epsilon ^{-1} \log N) \), instead of the \( O(M^{2}N \log (1/\epsilon )) \) required in the classical case.
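The statistics behind this estimate can be mimicked classically. The following toy sketch (ours; no QRAM or circuit-level details are modelled) uses the fact that the SWAP-test ancilla is measured in |0⟩ with probability (1 + |⟨a|b⟩|²)/2, so repeated runs yield an estimate of the squared overlap of the two encoded vectors:

```python
import numpy as np

rng = np.random.default_rng(1)

def swap_test_overlap(a, b, shots=10_000):
    """Estimate |<a|b>|^2 from simulated SWAP-test statistics.

    The ancilla of a SWAP test is measured in |0> with probability
    p0 = (1 + |<a|b>|^2) / 2; sampling `shots` runs and inverting this
    relation gives an estimate of the squared overlap.
    """
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    p0 = 0.5 * (1.0 + abs(np.vdot(a, b)) ** 2)
    zeros = rng.binomial(shots, p0)            # simulated measurement outcomes
    return 2.0 * zeros / shots - 1.0

x_i = np.array([1.0, 2.0, 0.0, -1.0])
x_j = np.array([0.5, 1.0, 3.0, 1.0])
exact = abs(np.vdot(x_i, x_j) / (np.linalg.norm(x_i) * np.linalg.norm(x_j))) ** 2
print(exact, swap_test_overlap(x_i, x_j))     # the estimate converges as shots grow
```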

The main idea of the QSVM algorithm is to use the LS-SVM formulation of Eq. 2 and rewrite it in terms of quantum states as
$$ \hat{F}\left|b ,\boldsymbol{\alpha} \right\rangle =\left| \mathbf{y} \right\rangle, $$
where \( \hat {F}=F/\text {tr}(F)\), with \(\|\hat{F}\|\leq 1\). Then the optimal parameters b and α are obtained by applying the efficient quantum matrix inversion algorithm (Harrow et al. 2009). This algorithm requires the simulation of matrix exponentials \(e^{-i\hat {F} {\Delta } t }\), which can be performed in \( O(\log N) \) steps (Lloyd et al. 2014). Moreover, we can add an ancillary register, initially in state \( \left | 0\right \rangle \), and use the quantum phase estimation algorithm (Nielsen and Chuang 2011) to express the state \( \left | \mathbf {y} \right \rangle \) in the eigenbasis \( \left | e_{i}\right \rangle \) of \( \hat {F} \) and store approximations of the eigenvalues λi of \( \hat {F} \) in the ancilla:
$$ | \mathbf{y} \rangle | 0 \rangle \to \sum\limits_{i=1}^{M+1} \langle e_{i} | \mathbf{y} \rangle | e_{i} \rangle | \lambda_{i} \rangle. $$
We now apply an inversion of the eigenvalues with a controlled rotation and uncompute the eigenvalue register to obtain
$$ \sum\limits_{i=1}^{M+1} \frac{ \langle e_{i} | \mathbf{y} \rangle }{ \lambda_{i} } | e_{i} \rangle =\hat{F}^{-1} | \mathbf{y} \rangle =\left|b ,\boldsymbol{\alpha} \right\rangle. $$
In the training set basis, the solution state for the LS-SVM is
$$ \left|b ,\boldsymbol{\alpha} \right\rangle = \frac{1}{\sqrt{b^{2}+{\sum}_{k=1}^{M} {\alpha_{k}^{2}}}} \left( b | 0 \rangle +\sum\limits_{k=1}^{M} \alpha_{k} | k \rangle \right). $$

The process of classifying new data \( \left | \mathbf {x}\right \rangle \) with the trained \( |b, \boldsymbol {\alpha } \rangle \) requires the implementation of the query oracle

$$ | \tilde{u} \rangle = \frac{1}{\left( b^{2}+{\sum}_{k=1}^{M} {\alpha_{k}^{2}} | \mathbf{x_{k}} |^{2} \right)^{\frac{1}{2}} }\left( b | 0 \rangle | 0 \rangle + \sum\limits_{k=1}^{M} | \mathbf{x_{k}} | \alpha_{k} | k \rangle | \mathbf{x_{k}} \rangle \right) $$
(3)
and also the query state
$$ | \tilde{x} \rangle = \frac{1}{\sqrt{M | \mathbf{x} |^{2} +1} } \left( | 0 \rangle | 0 \rangle + \sum\limits_{k=1}^{M} | \mathbf{x} | | k \rangle | \mathbf{x} \rangle \right). $$
(4)
The classification is obtained by computing the inner product \( \left \langle \tilde {x} |\tilde {u} \right \rangle \) via a swap test (Buhrman et al. 2001). This means that, with the help of an ancillary qubit, the state \(\left |\psi \right \rangle =\frac {1}{\sqrt {2}} (| 0 \rangle _{a} | \tilde {u} \rangle + | 1 \rangle _{a} | \tilde {x} \rangle )\) is constructed and then measured in the state \(\left |\phi \right \rangle =\frac {1}{\sqrt {2}} (| 0 \rangle _{a} - | 1 \rangle _{a})\), with success probability \( P= \left | \left \langle \psi |\phi \right \rangle \right |^{2}=\frac {1}{2}\left (1-\left \langle \tilde {x} |\tilde {u} \right \rangle \right ) \). The probability P can be estimated to accuracy 𝜖 with \(O(\frac {P(1-P)}{\epsilon ^{2}})\) repetitions. The class label is decided depending on the value of P: if P is greater than \(\frac {1}{2}\), then \( \left | \mathbf {x}\right \rangle \) is labelled − 1; if it is less than \(\frac {1}{2}\), the label of \( \left | \mathbf {x} \right \rangle \) is + 1.

The overall time complexity for both training and classification with the LS-SVM is O(\( \log (NM) \)).

In the QSVM algorithm, kernelization can be achieved by acting on the training vector basis, i.e., by mapping each |xi〉 to a d-fold tensor product
$$ | \phi(\mathbf{x}_{i}) \rangle =| \mathbf{x}_{i} \rangle_{1} \otimes| \mathbf{x}_{i} \rangle_{2} \otimes ... \otimes| \mathbf{x}_{i} \rangle_{d}. $$
This allows us to obtain polynomial kernels of the form
$$ K(\left\langle \mathbf{x}_{i} | \mathbf{x}_{j} \right\rangle )\equiv \left\langle \phi(\mathbf{x}_{i}) | \phi(\mathbf{x}_{j})\right\rangle = \left\langle \mathbf{x}_{i} | \mathbf{x}_{j} \right\rangle^{d} $$
that can be computed in \( O(d \epsilon ^{-1} \log N) \). Note that in the QSVM the kernel evaluation is performed directly in the high-dimensional quantum feature space, while in the classical SVM the kernel trick is used precisely to avoid such an expensive calculation. However, this is not a problem in the quantum case, thanks to the exponential quantum speed-up obtained in the evaluation of inner products.
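A quick numerical check of this construction (our illustration): the d-fold tensor product of amplitude-encoded states reproduces the polynomial kernel ⟨xi|xj⟩^d:

```python
import numpy as np
from functools import reduce

def amplitude_encode(x):
    """Normalize a classical vector so it can serve as a vector of quantum amplitudes."""
    return x / np.linalg.norm(x)

def tensor_feature(x, d):
    """d-fold tensor product |x> ⊗ ... ⊗ |x>, a vector of dimension N^d."""
    ket = amplitude_encode(x)
    return reduce(np.kron, [ket] * d)

x_i = np.array([0.8, -0.6, 1.0])
x_j = np.array([0.2, 0.9, -0.4])
d = 3

lhs = np.dot(tensor_feature(x_i, d), tensor_feature(x_j, d))
rhs = np.dot(amplitude_encode(x_i), amplitude_encode(x_j)) ** d
print(np.isclose(lhs, rhs))   # True: <phi(x_i)|phi(x_j)> = <x_i|x_j>^d
```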

An experimental implementation of the QSVM has been shown in Li et al. (2015) and Patrick et al. (2018). Also, in Windridge et al. (2018), the authors propose a quantized version of Error-Correcting Output Codes (ECOC), which extends the QSVM algorithm to the multi-class case and enables it to perform error correction on the label allocation.

4 Quantum computation of hard kernels

In this section we review the main proposals whose core idea is the computation of classically hard kernels via a quantum device. In this context, we can recognize two common threads. On the one hand, a hybrid classical-quantum learning model takes classical input and evaluates a kernel function on a quantum device, while classification is performed in the standard classical manner (e.g., employing an SVM algorithm). In the second approach, a kernel-based variational quantum circuit is trained directly to classify the input data. More specifically, a variational quantum circuit (Mcclean et al. 2016) is a hybrid quantum-classical algorithm employing a quantum circuit U(𝜃) that depends on a set of parameters 𝜃, which are varied in order to minimize a given objective function (see Fig. 2). The quantum circuit is hence trained by a classical iterative optimization algorithm that at every step finds the best candidate 𝜃, starting from random (or pre-trained) initial values.
Fig. 2

Schematisation of a variational quantum circuit
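To make the hybrid loop of Fig. 2 concrete, here is a deliberately tiny sketch of ours (a single-qubit circuit U(𝜃) = RY(𝜃), simulated exactly with numpy, not a circuit from the reviewed papers), trained by a classical gradient-descent loop via the parameter-shift rule:

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation RY(theta)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def expectation_z(theta):
    """<psi(theta)| Z |psi(theta)> with |psi(theta)> = RY(theta)|0>."""
    psi = ry(theta) @ np.array([1.0, 0.0])
    z = np.diag([1.0, -1.0])
    return float(psi @ z @ psi)

def objective(theta):
    """Toy cost: drive the qubit towards <Z> = -1 (the |1> state)."""
    return expectation_z(theta) + 1.0

theta = 0.1                          # random / pre-trained initial value
for step in range(100):
    # Parameter-shift rule: d<Z>/dtheta = (<Z>(theta+pi/2) - <Z>(theta-pi/2)) / 2
    grad = 0.5 * (objective(theta + np.pi / 2) - objective(theta - np.pi / 2))
    theta -= 0.3 * grad              # classical gradient-descent update
print(theta, objective(theta))       # theta -> pi, cost -> 0
```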

Schuld and Killoran recently explored these concepts (Schuld and Killoran 2019), remarking on the close relation between quantum states and feature maps. The authors explain that the key element in both quantum computing and kernel methods is to perform computations in a high-dimensional (possibly infinite-dimensional) Hilbert space via an efficient manipulation of inputs.

In fact, it is possible to interpret the encoding of a classical input x into a quantum state |ϕ(x)〉 as a feature map ϕ which maps classical vectors to the Hilbert space associated with a system of qubits. As mentioned above, two ways of exploiting this parallelism are described.

In the first approach, called implicit by the authors, a quantum device takes classical input and evaluates a kernel function as part of a hybrid classification model. This requires the use of a quantum circuit Uϕ(x) implementing the mapping
$$ {\phi}: \mathbf{x} \to |\phi(\mathbf{x})\rangle= U_{\phi}(\mathbf{x})| 00 {\ldots} 0 \rangle $$
and which is able to produce a kernel
$$ K(\mathbf{x}_{i} , \mathbf{x}_{j} )= \left\langle 00 {\ldots} 0 \right| U_{\phi}^{\dagger}(\mathbf{x}_{i}) U_{\phi}(\mathbf{x}_{j}) \left| 00 {\ldots} 0 \right\rangle. $$

In order for quantum computing to be helpful, such a kernel should not be efficiently simulable by a classical computer. The question is therefore posed of which type of feature map circuits Uϕ leads to kernels that are powerful for classical learning models like SVM but, at the same time, classically intractable. The authors suggest that a way to achieve such a goal is to employ non-Gaussian elements (e.g., cubic phase gates or photon number measurements) as part of the quantum circuit Uϕ(x) implementing the mapping to the feature space.
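To illustrate the structure of the implicit approach, the snippet below (ours) evaluates the kernel ⟨0…0|Uϕ†(xi)Uϕ(xj)|0…0⟩ for a deliberately easy-to-simulate single-qubit feature map, an RY rotation by the input value; such a map of course carries no quantum advantage and is meant only to show how the computation is organized:

```python
import numpy as np

def u_phi(x):
    """Toy single-qubit feature-map circuit U_phi(x): an RY rotation by angle x."""
    c, s = np.cos(x / 2), np.sin(x / 2)
    return np.array([[c, -s], [s, c]])

def quantum_kernel(xi, xj):
    """K(xi, xj) = <0| U_phi(xi)^dagger U_phi(xj) |0>."""
    ket0 = np.array([1.0, 0.0])
    return ket0 @ u_phi(xi).conj().T @ u_phi(xj) @ ket0

# Gram matrix over a few 1-D inputs; for this map K(xi, xj) = cos((xj - xi) / 2).
xs = np.array([0.0, 0.5, 1.0, 2.0])
K = np.array([[quantum_kernel(a, b) for b in xs] for a in xs])
print(np.round(K, 3))
```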

The second approach, addressed in the paper as explicit, uses a variational quantum circuit to directly learn a decision boundary in the quantum Hilbert space. In their example, the authors first translate the classical input into a quantum squeezed state
$$ \mathbf{x} \to | \phi(\mathbf{x})\rangle = \frac{1}{\sqrt{\cosh(c)}} \sum\limits_{n=0}^{\infty} \frac{\sqrt{(2n)!}}{ 2^{n} n!} \left(-e^{i\mathbf{x}} \tanh (c)\right)^{n}| 2n\rangle , $$
and then apply to |ϕ(x)〉 a parametrized continuous-variable circuit, where W(𝜃) is a repetition of layers built from the following gates: the beam splitter
$$ BS(\theta_1,\theta_2) = e^{\theta_1 \left(e^{i\theta_2} \hat{a}_1^{\dagger}\hat{a}_2 -e^{-i\theta_2} \hat{a}_1\hat{a}^{\dagger}_2\right)}, $$
with \(\theta _1,\theta _2 \in \mathbb {R}\) and \(\hat {a}, \hat {a}^{\dagger }\) the annihilation and creation operators; the displacement gate
$$ D(z) = e^{\sqrt{2}i(\text{Im}(z) \hat{x} - \text{Re}(z)\hat{p})}, $$
with complex displacement z and finally the quadratic and cubic phase gates
$$ P(u) = e^{i\frac{u}{2} \hat{x}^2} \text{ and } V(u) = e^{i\frac{u}{3} \hat{x}^3}. $$
The probability of measuring the two-mode Fock state |n1, n2〉 in |2,0〉 or |0,2〉 is interpreted as the probability that the classifier predicts class y = 0 or y = 1, respectively:
$$ p(|{2, 0}\rangle)= p(y=0) \quad \text{and} \quad p(|{0,2}\rangle)= p(y=1). $$
The authors trained such a model on the ‘moons’ dataset using stochastic gradient descent and showed that the training loss converges to zero after about 200 iterations.

Along the same path, and simultaneously with Schuld and Killoran (2019), Havlicek et al. (2019) propose two classifiers that map classical data into a quantum feature Hilbert space in order to obtain a quantum advantage. Again, one classifier is based on a variational circuit that generates a separating hyperplane in the quantum feature space, while the other only estimates the kernel function on the quantum computer.

The two methods are tested on an artificial dataset \(\mathbf{x} \in T \cup S \subset {\Omega} \subset (0,2\pi]^{2}\), where T and S are the training and test sets, respectively. The classical input enters the model through real coefficients \(\phi _{S}(\mathbf {x}) \in \mathbb {R}\), e.g., the pairwise coefficient ϕ{1,2}(x) = (π − x1)(π − x2).

On the basis that, in order to obtain an advantage over classical approaches, feature maps need to be based on a circuit that is hard to simulate with classical means, the authors propose a feature map on n qubits generated by the unitary
$$ \mathcal{U}_{\Phi}(\mathbf{x}) = U_{\Phi(\mathbf{x})} H^{\otimes n} U_{\Phi(\mathbf{x})} H^{\otimes n} $$
where H is the Hadamard gate and
$$ U_{\Phi(\mathbf{x})} = \exp\left( i \sum\limits_{S \subseteq [n]} \phi_{S}(\mathbf{x}) \prod\limits_{k \in S} Z_k\right), $$
with Zk denoting the Pauli-Z operator acting on qubit k and the sum running over subsets S of the n qubits. Such a circuit acts on \(\left | {0}\right \rangle ^{n}\) as the initial state and uses the classical data previously encoded in the coefficients ϕS(x).

The exact classical evaluation of the inner product (i.e., the kernel) between two states obtained using the circuit UΦ(x) is #P-hard, because it is associated with a Tutte partition function, which is hard to simulate classically (Goldberg and Guo 2017).
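As a structural sketch (ours), the feature map and the resulting kernel can be written down for n = 2 qubits and simulated exactly with numpy; the kernel is taken here as the squared overlap |⟨Φ(x)|Φ(x′)⟩|², which is what measurement statistics estimate on hardware, and the pairwise coefficient ϕ{1,2}(x) = (π − x1)(π − x2) is one possible data encoding. Such a small simulable instance obviously does not exhibit the hardness discussed above:

```python
import numpy as np

I2 = np.eye(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Z = np.diag([1.0, -1.0])

def u_phi(x):
    """Diagonal layer U_{Phi(x)} = exp(i sum_S phi_S(x) prod_{k in S} Z_k) for n = 2 qubits."""
    x1, x2 = x
    phi12 = (np.pi - x1) * (np.pi - x2)          # pairwise coefficient (illustrative choice)
    gen = x1 * np.kron(Z, I2) + x2 * np.kron(I2, Z) + phi12 * np.kron(Z, Z)
    return np.diag(np.exp(1j * np.diag(gen)))    # the generator is diagonal, so exp acts entrywise

def feature_state(x):
    """|Phi(x)> = U_{Phi(x)} (H⊗H) U_{Phi(x)} (H⊗H) |00>."""
    HH = np.kron(H, H)
    ket00 = np.zeros(4)
    ket00[0] = 1.0
    return u_phi(x) @ HH @ u_phi(x) @ HH @ ket00

def kernel(x, xp):
    """K(x, x') = |<Phi(x)|Phi(x')>|^2, the quantity estimated from measurement statistics."""
    return abs(np.vdot(feature_state(x), feature_state(xp))) ** 2

print(kernel((0.3, 1.2), (0.3, 1.2)))   # 1.0 for identical inputs
print(kernel((0.3, 1.2), (2.5, 4.0)))   # some value in [0, 1]
```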

A different approach is taken in Di Pierro et al. (2017), where the same idea of using quantum computation to evaluate a kernel is discussed in the context of Topological Quantum Computation (TQC).

TQC is a model of quantum computation, polynomially equivalent to the circuit-based one, where instead of using qubits and gates the computation is performed by braiding two-dimensional quasi-particles called anyons (Pachos 2012). Moreover, it is well known that some computational problems, such as the approximation of the Jones polynomial, i.e., an invariant of links and knots, have a more straightforward implementation in TQC (Aharonov et al. 2006).

The approach proposed in Di Pierro et al. (2017) is based on an encoding of classical input data x, given in the form of binary strings, into braids, which in TQC are expressed by means of evolution operators B. This encoding is constructed by mapping the bit value 0 to the crossing operator σi and the bit value 1 to the adjoint crossing operator \(\sigma _{i}^{\dagger }\).
Hence, a given binary string of length n is uniquely represented by a pairwise braiding of 2n strands, i.e., by a braid \(\mathbf{B} \in B_{2n}\).
Therefore, applying the braiding Bu associated with the binary string u to the vacuum state of the anyonic quantum system \( \left | \psi \right \rangle \) defines an embedding ϕ into the Hilbert space \( {\mathscr{H}} \) of the anyonic configurations:
$$ \phi: u \to \textbf{B}_u\left| \psi \right\rangle $$
The authors finally show that the scalar product of anyonic quantum states obtained with such a mapping generates a kernel that depends on the Hamming distance between the input strings as follows:
$$ \begin{array}{@{}rcl@{}} K(u,v) &\equiv& \left\langle \psi \right| \textbf{B}_u^{\dagger}\textbf{B}_v \left| \psi \right\rangle =\left( \frac{\left\langle {\textbf{Hopf}} \right\rangle}{d}\right)^{d_H(u,v)}\\ &=&\left( \frac{A^4+A^{-4}}{A^2+A^{-2}}\right)^{d_H(u,v)} \end{array} $$
where \(\left \langle {\textbf {Hopf}}\right \rangle \) indicates the Kauffman polynomial (Kauffman 1987), in the variable A, associated with the so-called Hopf link, d = A2 + A− 2, and dH(u, v) is the Hamming distance between the input strings u and v.
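For illustration, the resulting Gram matrix is easy to evaluate classically once a value of A is fixed; below we pick A = e^{iπ/10}, a root of unity of the kind used in anyonic models (this specific value is our choice for the example, not prescribed by the cited paper):

```python
import numpy as np

A = np.exp(1j * np.pi / 10)                       # illustrative root-of-unity choice
base = ((A**4 + A**-4) / (A**2 + A**-2)).real     # the ratio is real, ~0.382 for this A

def hamming_distance(u, v):
    """Hamming distance between two equal-length binary strings."""
    return sum(a != b for a, b in zip(u, v))

def tqc_kernel(u, v):
    """K(u, v) = (<Hopf>/d)^{d_H(u, v)} = ((A^4 + A^-4) / (A^2 + A^-2))^{d_H(u, v)}."""
    return base ** hamming_distance(u, v)

strings = ["0000", "0101", "1111"]
K = np.array([[tqc_kernel(u, v) for v in strings] for u in strings])
print(np.round(K, 4))          # 1 on the diagonal, decaying with Hamming distance
```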

Although this example does not provide a computationally hard kernel, the authors suggest that a more complex braid mapping of the input may naturally lead to a classically intractable kernel, since the calculation of the Kauffman polynomial belongs to the #P-hard class (Goldberg and Guo 2017).

5 Conclusion

In this paper, we have reviewed the main approaches to the design of algorithms for kernel methods in ML which exploit the power of quantum computing to achieve a computational advantage with respect to classical approaches. We divided the literature on this problem into two main categories. On the one side, there are attempts to formulate quantum versions of the support vector machine running on a gate-model quantum computer. On the other side, we grouped the approaches whose core idea relies on the use of quantum computing techniques to deal with classically intractable kernels. In Table 1, we give a schematic description of the various results that we have discussed, together with the articles in which they appear.
Table 1

Rundown of the main results and references

Category | Method | Title
Quantum version of SVM | Grover algorithm | Quantum optimization for training support vector machines (Anguita et al. 2003)
Quantum version of SVM | HHL algorithm | Quantum support vector machine for big data classification (Rebentrost et al. 2014)
Experimental | NMR 4-qubit quantum processor | Experimental implementation of a quantum support vector machine (Li et al. 2015)
Experimental | IBM quantum experience | Quantum algorithm implementations for beginners (Patrick et al. 2018)
Quantum version of SVM and ECOC | HHL algorithm | Quantum error-correcting output codes (Windridge et al. 2018)
Kernel methods | Variational quantum circuit | Quantum machine learning in feature Hilbert spaces (Schuld and Killoran 2019)
Kernel methods | Variational quantum circuit | Supervised learning with quantum-enhanced feature spaces (Havlicek et al. 2019)
Kernel methods | Topological quantum computation | Hamming distance kernelisation via topological quantum computation (Di Pierro et al. 2017)


References

  1. Agresti I, et al. (2019) Pattern recognition techniques for boson sampling validation. Phys Rev X 9:14
  2. Aharonov D, Jones V, Landau Z (2006) A polynomial quantum algorithm for approximating the Jones polynomial. In: Proceedings of the 38th annual ACM symposium on theory of computing, pp 427–436
  3. Aïmeur, et al. (2013) Quantum speed-up for unsupervised learning. Mach Learn 90:261–287
  4. Amin MH, et al. (2018) Quantum Boltzmann machine. Phys Rev X 8:11
  5. Anguita D, et al. (2003) Quantum optimization for training support vector machines. Neural Netw 16:763–770
  6. Arunachalam S, de Wolf R (2017) A survey of quantum learning theory. arXiv:1701.06806
  7. Barry J, et al. (2014) Quantum partially observable Markov decision processes. Phys Rev A 90:032311
  8. Benedetti M, et al. (2019) Adversarial quantum circuit learning for pure state approximation. New J Phys 21:043023
  9. Biamonte J, et al. (2017) Quantum machine learning. Nature 549:195–202
  10. Bishop C (2016) Pattern recognition and machine learning, vol 738. Springer, New York
  11. Bottarelli L, et al. (2018) Biclustering with a quantum annealer. Soft Comput 22:6247–6260
  12. Buhrman H, Cleve R, Watrous J, De Wolf R (2001) Quantum fingerprinting. Phys Rev Lett 87:4
  13. Canabarro A, Fernandes Fanchini F, Malvezzi AL, Pereira R, Chaves R (2019) Unveiling phase transitions with machine learning. arXiv:1904.01486
  14. Ciliberto C, et al. (2018) Quantum machine learning: a classical perspective. Proc R Soc A: Math Phys Eng Sci 474:20170551
  15. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
  16. Crawford D, et al. (2016) Reinforcement learning using quantum Boltzmann machines. arXiv:1612.05695
  17. Di Pierro A, et al. (2017) Distance kernelisation via topological quantum computation. In: Theory and practice of natural computing. Lect Notes Comput Sci 10687:269–280
  18. Di Pierro A, et al. (2018) Homological analysis of multi-qubit entanglement. Europhys Lett 123:30006
  19. Dong XY, Pollmann F, Zhang XF (2019) Machine learning of quantum phase transitions. Phys Rev B 99:121104
  20. Dunjko V, Briegel HJ (2018) Machine learning & artificial intelligence in the quantum domain: a review of recent progress. Rep Prog Phys 81:074001
  21. Dunjko V, et al. (2016) Quantum-enhanced machine learning. Phys Rev Lett 117:6
  22. Giovannetti V, Lloyd S, Maccone L (2008) Quantum random access memory. Phys Rev Lett 100:4
  23. Goldberg LA, Guo H (2017) The complexity of approximating complex-valued Ising and Tutte partition functions. Computational Complexity 26:765–833
  24. Gray J, et al. (2018) Machine-learning-assisted many-body entanglement measurement. Phys Rev Lett 121:6
  25. Harrow AW, Hassidim A, Lloyd S (2009) Quantum algorithm for linear systems of equations. Phys Rev Lett 103:4
  26. Havlicek V, Córcoles AD, et al. (2019) Supervised learning with quantum-enhanced feature spaces. Nature 567:209–212
  27. Heim B, et al. (2015) Quantum versus classical annealing of Ising spin glasses. Science 348:215–217
  28. Huembeli P, et al. (2019) Automated discovery of characteristic features of phase transitions in many-body localization. Phys Rev B 99:6
  29. Iten R, et al. (2018) Discovering physical concepts with neural networks. arXiv:1807.10300
  30. Kauffman LH (1987) State models and the Jones polynomial. Topology 26:395–407
  31. Levine Y, et al. (2018) Deep learning and quantum entanglement: fundamental connections with implications to network design. In: International conference on learning representations
  32. Li Z, Liu X, Xu N, Du J (2015) Experimental realization of a quantum support vector machine. Phys Rev Lett 114:5
  33. Lloyd S, Mohseni M, Rebentrost P (2014) Quantum principal component analysis. Nat Phys 10:631–633
  34. Lu S, Braunstein SL (2014) Quantum decision tree classifier. Quantum Inf Process 13:757–770
  35. Mcclean JR, Romero J, Babbush R, Aspuru-Guzik A (2016) The theory of variational hybrid quantum-classical algorithms. New J Phys 18:023023
  36. Mercer J, et al. (1909) Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London 209
  37. Mikhail V, et al. (2016) Towards a feasible implementation of quantum neural networks using quantum dots. Appl Phys Lett 108:103108
  38. Mitchell T (1997) Machine learning. McGraw Hill, New York
  39. Mohri M, et al. (2012) Foundations of machine learning, vol 432. MIT Press, Cambridge
  40. Nielsen MA, Chuang IL (2011) Quantum computation and quantum information. Cambridge University Press, New York
  41. O’Driscoll L, et al. (2019) A hybrid machine learning algorithm for designing quantum experiments. Quantum Mach Intell 1:1–11
  42. Pachos JK (2012) Introduction to topological quantum computation. Cambridge University Press, New York
  43. Patrick J, et al. (2018) Quantum algorithm implementations for beginners. arXiv:1804.03719
  44. Perdomo-Ortiz A, et al. (2018) Opportunities and challenges for quantum-assisted machine learning in near-term quantum computers. Quantum Sci Technol 3:030502
  45. Rebentrost P, Mohseni M, Lloyd S (2014) Quantum support vector machine for big data classification. Phys Rev Lett 113:5
  46. Schuld M, Killoran N (2019) Quantum machine learning in feature Hilbert spaces. Phys Rev Lett 122:6
  47. Schuld M, Petruccione F (2018) Supervised learning with quantum computers, vol 287. Springer International Publishing, Berlin
  48. Schuld M, Sinayskiy I, Petruccione F (2015) An introduction to quantum machine learning. Contemp Phys 56(2):172–185
  49. Sergioli G, et al. (2018) A quantum-inspired version of the nearest mean classifier. Soft Comput 22:691–705
  50. Stoudenmire E, Schwab DJ (2016) Supervised learning with tensor networks. Advances in Neural Information Processing Systems (NIPS Proceedings) 29:4799–4807
  51. Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9:293–300
  52. Theodoridis S (2008) Pattern recognition, vol 984. Elsevier Academic Press, Cambridge
  53. Wiebe N, et al. (2015) Quantum algorithms for nearest-neighbours methods for supervised and unsupervised learning. Quantum Info Comput 15:316–356
  54. Windridge D, Mengoni R, Nagarajan R (2018) Quantum error-correcting output codes. Int J Quantum Info 16:1840003
  55. Wittek P (2014) Quantum machine learning, vol 176. Elsevier Academic Press, Cambridge
  56. Yu S, Albarrán-Arriagada F, Retamal JC, Wang YT, Liu W, Ke ZJ, Meng Y, Li ZP, Tang JS, Solano E, Lamata L, Li CF, Guo GC (2019) Adv Quantum Technol 2(7-8):1800074

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Informatics, University of Verona, Verona, Italy
