In order to run the quantum machine learning algorithms presented in this book, we often assumed that a universal, large-scale, error-corrected quantum computer is available. Universal means that the computer can implement any unitary operation on the quantum system it is based on, and therefore any quantum algorithm we can think of. Large-scale refers to the fact that we have a reasonably high number of qubits (or alternative elementary quantum systems) at hand. Error-corrected means that the outcomes of the algorithm are exactly described by the theoretical equations of quantum theory; in other words, the computer makes no errors in the computation besides those stemming from numerical instability.

But as we discussed in the introduction, quantum computing is an emerging technology, and the first generation of quantum computers, which have been called noisy intermediate-term devices [1], does not fulfil these conditions. Firstly, intermediate-term devices are not necessarily universal. Sometimes they do not aim at universality, for example in the case of quantum annealers that are specialised to solve one specific problem. But even quantum computers that are in principle designed as universal devices may not have fully connected architectures, or may implement only a subset of gates reliably. Secondly, we usually have a rather small number of qubits available, which is why intermediate-term devices are also called small-scale devices. Thirdly, early-generation quantum computers are noisy. Not only do the gates have limited fidelity or precision, the devices also lack mechanisms to detect and correct errors that decohere the qubits and disturb the calculation. Hence, we can only apply a small number of gates before the result of the computation is too noisy to be useful.

While non-universality poses different challenges for each technology, to which solutions will become increasingly available, the small scale and noisiness of most early technologies impose general limitations on the ‘size’ of the quantum algorithms they can implement: we only have a small number of qubits (i.e., a low circuit width) and have to keep the number of gates (the circuit depth) small enough to contain error propagation. As a rule of thumb, the quantum community currently speaks of intermediate-term algorithms if we use of the order of 100 qubits and of the order of 1,000 gates. An important question is therefore which approaches to quantum machine learning are actually feasible on noisy intermediate-term devices. In other words, are there quantum machine learning algorithms for real-life problems that use only about 100 qubits, have a circuit depth of about 1,000 gates, and are robust to some reasonable level of noise?

This last question is the subject of active research, and given its difficulty it might remain so for some time. In this final chapter we want to discuss some aspects of the question in the context of what has been presented in this book, both as a summary and as an outlook.

9.1 Small Versus Big Data

We have devoted a lot of space in this book to questions of data encoding, or how to feed the full information we have for a certain problem, most notably the dataset, into the quantum device. In many situations, data encoding is the bottleneck of an algorithm. This was driven to the extreme in our introductory ‘Titanic’ classification example in Chap. 1, where besides data encoding, the algorithm only took one Hadamard gate and two measurements to produce a prediction. It is not surprising that, in general, data encoding costs time linear in the data size, because every feature has to be addressed. Even fancy technologies like quantum Random Access Memories would still require the data to be written into the memory. The linear cost poses a dilemma for intermediate-term applications: a dataset of only 100 inputs, each of dimension 10, would already “use up” of the order of 1,000 gates. In addition, data representation in basis encoding imposes severe spatial limitations on the dimension of the data: if every feature is encoded in a \(\tau \)-bit binary sequence, we can only encode \(100/\tau \) features into 100 qubits.
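To make this counting explicit, the following minimal Python sketch reproduces the back-of-the-envelope arithmetic above. The helper names (`encoding_gate_count`, `basis_encoding_capacity`) are hypothetical and only encode the assumption of one gate per feature value; they do not refer to any particular state preparation routine.

```python
# Back-of-the-envelope resource counting for data encoding.
# Assumption (hypothetical helpers, not a real routine): loading data costs
# at least one gate per feature value, i.e. linear in the total data size.

def encoding_gate_count(num_samples: int, num_features: int) -> int:
    """Optimistic lower bound: one gate per feature value to be loaded."""
    return num_samples * num_features

def basis_encoding_capacity(num_qubits: int, bits_per_feature: int) -> int:
    """Number of tau-bit features that fit into a register of given width."""
    return num_qubits // bits_per_feature

if __name__ == "__main__":
    print(encoding_gate_count(100, 10))     # 1000 gates: the full gate budget
    print(basis_encoding_capacity(100, 4))  # tau = 4: only 25 features fit into 100 qubits
```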

The bottleneck of data encoding means that, besides a few special cases, quantum machine learning will most likely not offer intermediate-term solutions for big data processing. Of course, some algorithms—most notably hybrid training schemes—only process a subset of samples from the dataset at a time, thereby lifting the restrictions on the number M of data samples. For example, in Sect. 7.3.3.2 we looked at hybrid gradient descent training in which the quantum device was used to estimate numerical or analytical gradients. Combining this with single-batch stochastic gradient descent, the quantum device only has to process one data point at a time, and the quantum subroutine is independent of the number of samples in the training set. Another example is hybrid kernel methods, in which the quantum computer computes a ‘quantum kernel’ as a measure of distance between two data points (see Sect. 6.2.3). However, if the dimension N of the data samples is large, we still face the problem of extensive time and spatial resources needed to feed the samples into a quantum computer.
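To illustrate why the quantum subroutine becomes independent of the number of samples M, here is a hedged sketch of single-batch hybrid gradient descent. The function `quantum_model` is a hypothetical stand-in for a device call that returns an estimated expectation value; it is replaced by a classical placeholder so the sketch runs end to end, and the finite-difference gradient stands in for the numerical gradient estimation of Sect. 7.3.3.2.

```python
import numpy as np

# Sketch of single-batch hybrid gradient descent. `quantum_model` is a
# placeholder for a device call returning an expectation value for input x
# under circuit parameters theta.

def quantum_model(x, theta):
    return np.tanh(np.dot(x, theta))            # classical stand-in for a device estimate

def loss(x, y, theta):
    return (quantum_model(x, theta) - y) ** 2   # squared loss for a single sample

def numerical_gradient(x, y, theta, eps=1e-3):
    grad = np.zeros_like(theta)
    for k in range(len(theta)):
        shift = np.zeros_like(theta)
        shift[k] = eps
        # two evaluations per parameter, independent of the dataset size M
        grad[k] = (loss(x, y, theta + shift) - loss(x, y, theta - shift)) / (2 * eps)
    return grad

def train(data, theta, lr=0.1, steps=100, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        x, y = data[rng.integers(len(data))]    # single batch: one sample per step
        theta = theta - lr * numerical_gradient(x, y, theta)
    return theta
```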

In order to deal with high-dimensional data we have two basic options. First, we can introduce structure that allows faster state preparation at the cost of accuracy. We call this largely unexplored strategy approximate data encoding. We indirectly touched upon this topic in the context of probabilistic models in Sect. 6.3. The task of preparing a qsample, which in principle is nothing other than arbitrary state preparation, was addressed by preparing a mean-field approximation of the distribution instead, at a cost linear—rather than exponential—in the number of qubits. Another setting in which approximate data encoding played a role was the quantum basic linear algebra routines in combination with density matrix exponentiation, where, once the data was given as a density matrix, we could read out eigenvalues of its low-rank approximation qubit-efficiently (Sect. 5.4.3). Whether the ‘sweet spot’ in the trade-off between state preparation time/circuit depth and the accuracy of the desired state is useful for pattern recognition, and whether it suits the requirements of intermediate-term devices, are both open questions.
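As a minimal sketch of the mean-field idea, assume the single-qubit marginals \(q_i\) of the target distribution have already been fitted classically. One RY rotation per qubit then prepares the product (mean-field) qsample, so the circuit grows linearly rather than exponentially with the number of qubits. The helper names are illustrative only.

```python
import numpy as np

# Approximate data encoding via a mean-field qsample: prepare the product
# distribution q(x) = prod_i q_i(x_i) with one single-qubit rotation per
# qubit instead of an arbitrary (exponentially expensive) state preparation.

def mean_field_angles(marginals):
    """RY angles theta_i with sin^2(theta_i / 2) = q_i(x_i = 1)."""
    q = np.asarray(marginals, dtype=float)
    return 2.0 * np.arcsin(np.sqrt(q))

def product_qsample(marginals):
    """Classically assemble the product-state amplitudes (for checking only)."""
    state = np.array([1.0])
    for q in marginals:
        state = np.kron(state, np.array([np.sqrt(1.0 - q), np.sqrt(q)]))
    return state

if __name__ == "__main__":
    marginals = [0.1, 0.5, 0.9]                       # one fitted marginal per qubit
    print(mean_field_angles(marginals))               # 3 rotations instead of 2**3 amplitudes
    print(np.sum(product_qsample(marginals) ** 2))    # normalisation check, ~1.0
```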

Second, we have touched upon the suggestion of quantum-assisted model architectures, in which only a small part of the machine learning model is computed by the quantum device. It has been suggested in the literature that the deep and compact layers of a deep learning model could play this role. Again, we introduce structure to reduce complexity. While approximate data encoding tries to remain as faithful as possible to the original dataset, the idea of deep quantum layers is to use a classical part of the model to select a few powerful features that are then fed into the quantum device. The hope is that the quantum device can use the reduced features more easily, or explore a different class of feature reduction strategies. Again, a lot more research is required to determine the potential of quantum-assisted model architectures, and an important question is how to incorporate them into training algorithms such as gradient descent.
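The skeleton below sketches one possible quantum-assisted architecture under these assumptions: a classical map compresses a high-dimensional input to a handful of features, and only those are handed to a small ‘quantum layer’. The quantum layer is simulated here by a classical placeholder that returns one expectation value per qubit; all names are illustrative and not taken from the literature.

```python
import numpy as np

# Skeleton of a quantum-assisted model: a classical feature extractor reduces
# an N-dimensional input to n_qubits features, which are angle-encoded into a
# small circuit. The circuit is replaced by a classical placeholder here.

def classical_feature_extractor(x, W):
    """Compress a high-dimensional input to a few features (linear map + tanh)."""
    return np.tanh(W @ x)

def quantum_layer(features, theta):
    """Placeholder for a parametrised circuit on len(features) qubits:
    angle-encode each feature, apply trainable rotations, return <Z> per qubit."""
    return np.cos(features + theta)

def model(x, W, theta, w_out):
    return float(w_out @ quantum_layer(classical_feature_extractor(x, W), theta))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, n_qubits = 1000, 4                                 # large input, tiny quantum register
    x = rng.normal(size=N)
    W = rng.normal(size=(n_qubits, N)) / np.sqrt(N)       # trainable classical compression
    theta, w_out = rng.normal(size=n_qubits), rng.normal(size=n_qubits)
    print(model(x, W, theta, w_out))
```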

There is one setting in which high-dimensional (even exponentially large) inputs could be classified by quantum machine learning algorithms, and this is the idea of using ‘quantum data’. This was discussed in Sect. 1.1.3 of the introduction, but has not been the topic of later chapters because of a lack of research results at the time of writing. The idea is to use quantum machine learning algorithms to classify quantum states that are produced by a quantum simulator. Although machine learning for problems in quantum physics has shown fruitful applications, the combination of solving a quantum problem with a quantum machine learning algorithm is still a mostly untouched research agenda.

Although the promise of quantum methods for big data sounds appealing and the difficulties sobering, one should possibly not be too concerned. Despite the worldwide excitement about big data, problems where data collection is expensive or where data is naturally limited are plentiful, and there are many settings where predictors have to be built from extremely small datasets. Examples include data generated by biological experiments, or specific data from the domain of reasoning. If quantum computing can show a qualitative advantage, for example in terms of what we discussed as ‘model complexity’ in Sect. 4.3, there will indeed be worthwhile applications in the area of small data.

9.2 Hybrid Versus Fully Coherent Approaches

In the course of this book we have seen quantum algorithms that range from fully coherent training and classification strategies based on quantum basic linear algebra (blas) routines to hybrid schemes where only a small part of the model is computed by a quantum device. Unsurprisingly, the latter are much more suitable for intermediate-term technologies. Hybrid quantum algorithms have the advantage of using quantum devices only for relatively short routines, interspersed with classical computations. Another great advantage of hybrid techniques is that the parameters are available as classical information, which means they can easily be stored and used to predict multiple inputs. In contrast, many quantum blas-based algorithms produce a quantum state that encodes the trained parameters, and classification consumes this quantum state, so that training has to be repeated in full for every prediction. Even if ‘quantum memories’ to store the quantum parameter state were developed, the no-cloning theorem of quantum physics prohibits the replication of the state.

Amongst hybrid algorithms, variational circuits are particularly promising (see Sects. 7.3 and 8.2). Here, an ansatz of a parametrised circuit is chosen and the parameters are fitted to optimise a certain objective. For example, one can define an input-output relation with respect to the quantum circuit and train it to generalise the input-output relations from the training data. While the idea is rather simple, the implementation opens a Pandora’s box of questions, some of which we have previously mentioned. What is a good ansatz for such a circuit for a given problem? We would like the ansatz to be slim in circuit depth, circuit width and the number of parameters used, but also as expressive as possible. This is nothing other than one of the fundamental questions of machine learning, namely to find simple but powerful models. Every hyperparameter allows more flexibility, but also complicates model selection for practical applications. Training is another issue. How can we train a model that is not given as a mathematical equation, but as a physical quantum algorithm? Can we do better than numerical optimisation, for example by using techniques such as the classical linear combination of unitaries presented in Sect. 7.3.3.2? The parametrisation of the quantum circuit can play a significant role in how difficult the model is to train, as it defines the landscape of the objective function. What are good parametrisation strategies? Do we have to extend the tricks of classical iterative training, such as momentum and adaptive learning rates, by techniques tailored to quantum machine learning? In short, variational algorithms for machine learning open a rich research area with the potential of developing an entirely new subfield of classical machine learning. In the larger picture, quantum machine learning could even pioneer a more general class of approaches in which ‘black-box’ analogue physical devices are used as machine learning models and trained by classical computers.
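To make the variational idea concrete, here is a self-contained toy example: a single qubit, the data point encoded as an RY rotation, one trainable RY rotation as the ansatz, and \(\langle Z \rangle \) as the model output. The analytic gradient uses the standard parameter-shift rule for Pauli rotations; this is only an illustration, not the specific scheme of Sect. 7.3.3.2.

```python
import numpy as np

# Toy variational circuit, simulated classically: encode x as RY(x), apply a
# trainable RY(theta), output <Z> of the final state. For this ansatz the
# output is simply cos(x + theta), which makes the example easy to verify.

def ry(angle):
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])

def expectation_z(x, theta):
    state = ry(theta) @ ry(x) @ np.array([1.0, 0.0])     # |0> -> encoding -> ansatz
    return state[0] ** 2 - state[1] ** 2                 # <Z> of the final state

def gradient(x, y, theta):
    # parameter-shift rule: d<Z>/dtheta = (f(theta + pi/2) - f(theta - pi/2)) / 2
    df = (expectation_z(x, theta + np.pi / 2) - expectation_z(x, theta - np.pi / 2)) / 2
    return 2 * (expectation_z(x, theta) - y) * df        # chain rule for the squared loss

def train(data, theta=0.3, lr=0.2, epochs=50):
    for _ in range(epochs):
        for x, y in data:
            theta = theta - lr * gradient(x, y, theta)
    return theta

if __name__ == "__main__":
    data = [(0.1, 1.0), (2.9, -1.0)]   # toy inputs with labels in [-1, 1]
    print(train(data))
```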

9.3 Qualitative Versus Quantitative Advantages

Quantum computing is a discipline with strong roots in theory. The mathematical foundations of quantum mechanics were basically laid in the 1930s, and we are still exploring the wealth of their practical applications today. The same holds for quantum computing, where algorithmic design has had to resort to theoretical proofs to advertise the power of quantum algorithms, since without hardware, numerical arguments were often out of reach. Quantum machine learning seems to follow in these footsteps, and a large share of the early literature tries to find speedups that quantum computing could contribute to machine learning (see for example Sect. 7.1 on quantum blas for optimisation). In other words, the role of quantum computing is to offer a quantitative advantage defined in terms of asymptotic computational complexity, as discussed in Sect. 4.1. The methods of choice are likewise purely theoretical, involving proofs of upper and lower bounds for runtime guarantees, and the ‘holy grail’ is to show exponential speedups for quantum machine learning. Such speedups are typically proven by imposing very specific constraints on the data, and little is known about which applications fulfil these constraints, or whether the resulting algorithms are useful in practice. Needless to say, judging from the journey quantum computing has taken so far, it is highly unlikely that it will provide solutions to NP-complete problems and solve all issues in machine learning.

On the other hand, machine learning methods tend to be based on problems that are in general intractable, a fact that does not stop the same methods from being very successful in specific cases. Consider for example non-convex optimisation in high-dimensional spaces, which is intractable in general, but still solved to satisfaction by the machine learning algorithms in use. The methods employed in machine learning are largely of a practical nature, and breakthroughs are often the result of huge numerical experiments. Computational complexity is one figure of merit among others, such as the generalisation power of a model, its ease of use, its mathematical and algorithmic simplicity and wide applicability, whether it allows the user to interpret the results, and whether its power and limits are theoretically understood.

Quantum machine learning may therefore have to rethink its roots in quantum computing and develop into a truly interdisciplinary field. We have motivated this in our distinction between explorative versus translational approaches in Sect. 1.1.4. Especially in the first generation of papers on quantum machine learning, it was popular to choose the prototype of a classical machine learning model and try to reproduce it with a quantum algorithm that promises some asymptotic speedup, in other words to translate the model into the language of quantum computing. The explorative approach that we highlighted in Chap. 8 is interested in creating new models, new dynamics and new training strategies that extend the canon of machine learning. Instead of theoretical analysis, these contributions benchmark their models against standard algorithms in numerical experiments. Their paths may diverge from what is currently popular in classical machine learning. For example, it was argued in Sects. 6.2 and 6.3 that the ideas of kernel methods and probabilistic models are much closer to quantum theory than the principle of huge feed-forward neural networks. The overall goal of the explorative approach is to identify a new quality that quantum theory can contribute to pattern recognition.

Of course, whether to focus on qualitative versus quantitative advantages is not necessarily an ‘either-or’ question. Constraints stemming from computational complexity are a major factor in shaping successful algorithms, and speedups, even quadratic ones, can prove hugely useful. It may not be possible to simulate a quantum model classically, which is in itself an exponential speedup. And the theoretical rigour of quantum researchers can prove a useful resource to develop the theory side of machine learning. But with intermediate-term quantum technologies paving the way to turn quantum computing into a numerical playground, quantum machine learning has good reasons to be at the forefront of these efforts.

9.4 What Machine Learning Can Do for Quantum Computing

This book has tried to show how quantum computing can help with machine learning, in particular with supervised learning. We want to conclude by turning the question around and asking what machine learning has to offer quantum computing. One answer is trivial: many researchers hope that machine learning can contribute a ‘killer app’ that makes quantum computing commercially viable. By connecting the emerging technology of quantum computing with a multi-billion dollar market, investments are much more likely to flow and help build large-scale quantum computers.

But another answer has been touched upon in the previous three sections of this conclusion. Quantum computing, on the one hand, is predominantly focused on theoretical quantum speedups, rather than on hybrid algorithms that offer a new quality and that draw their motivation from successful numerical implementations rather than from proofs. Machine learning, on the other hand, has to deal with uncertainty, noise, hard optimisation tasks and the ill-posed mathematical problem of generalisation. Maybe machine learning can inspire quantum computing in the era of intermediate-scale devices to add to its toolbox a number of methods that are less rigorous and more practical, less quantitative and more qualitative.

Finally, machine learning is not only a field in computer science, but is also based on philosophical questions of what it means to learn. Quantum machine learning carries the concept of learning into quantum information processing, and opens up a lot of questions on an abstract level—questions that aim at increasing our knowledge rather than finding commercial applications.