# Quantum Computer Software

Understanding the software stack that has been built around the D-Wave One™ quantum computing hardware

Contents

• 1.1 - Introduction
• 1.2 - Software stack components
• 2.1 - Coding at the BlackBox Compiler level
• 2.2 - Key concept: The generating function
• 2.3 - Crafting a generating function matched to your application
• 2.4 - Adding the D-Wave One™ System
• 2.5 - Coding at the frameworks level
• 2.6 - Summary and key take-aways
• APPENDIX - BlackBox details

Aim, audience and required background

This material was developed to help those interested in programming the D-Wave One™ System at a high level. This document introduces the basic ideas behind the software stack through which developers can program quantum computers. In order to understand the material here you may wish to read the Quantum Computing Primer document first.

What you will learn

By following through this background reading material, you will learn:

• How the software stack built around quantum hardware is constructed
• Examples of programming at different levels of the software stack
• How to use the D-Wave One™ System as a co-processor to a conventional computer in a scalable way
• How quantum computer software is changing and improving over time

SECTION 1

1.1 - Introducing the software stack

The D-Wave One is the world's first commercially available quantum computer system. The system is designed to tackle discrete optimization problems - a hard problem type which bottlenecks the development of many applications. If you can imagine a use of computers that is currently beyond our capabilities, it is likely that the hardness of discrete optimization problems is the main culprit.

While quantum computers are usually thought of as primarily hardware, this is mostly because of the immaturity of the field. Prior to the D-Wave One, there were no operational quantum computers, and therefore there was no reason to think too hard about software. The D-Wave One changed this, and opened up a whole new set of challenges and opportunities for software systems designed for quantum computers.

Any useful computing system must provide robust and general software access to the hardware's capabilities. The processors at the heart of the D-Wave One system are unlike any ever built, and therefore the software stack for connecting a programmer to the hardware had to be developed from scratch.

This document will describe our approach to developing the software architecture for the D-Wave One. The focus will be on the way that a user of the system can interact with the various levels of abstraction we've built.

The software stack which connects the hardware to a user is shown in Figure 1 on the following page. Each layer within the stack will be explained in turn.

Figure 1. The software architecture for the D-Wave One. The dotted circles are elements which are planned for inclusion in future releases of the D-Wave developer packs.

1.2 - Software stack components

Layer 1: Hardware

The D-Wave One system contains a hardware component which is a large physical machine. The computer hardware (microprocessor and surrounding infrastructure) is housed in a 10'x10' shielded room. The current product system is sold complete with cooling and shielding equipment, and is fully calibrated and factory tested. The system is turnkey: it is ready to solve problems as soon as it is installed at a user's site. For more information on this layer, please see the dedicated QC hardware tutorial.

Layer 2: SAPI interface (machine code layer)

This layer represents the low-level settings of qubit parameters; the 'machine code' of the quantum computer. The code is sent to the machine using a web interface; a local webserver parses the incoming requests and queues and sorts code automatically, so multiple users can be handled at the same time.

At this software level, the key difference between the D-Wave quantum computing system and a conventional computer is that the cloud-based interfacing is implemented within this layer; in a conventional machine it is done at a much higher layer. A conventional computing analogy to the way that D-Wave's stack is architected is that of a microprocessor being fed machine code over the internet. The fact that D-Wave's hardware incorporates a cloud-based methodology (regardless of the fact that it is done at a low level) means that the system is already architected to support being added to a high performance computing center or server farm.

Learning to program the machine at this level is very difficult, but it is upon this layer that algorithmic and software development is based. An end user would normally program at this level only if: (a) they wish to write a new kind of compiler which uses the underlying quantum mechanics in a new way, in order to develop a language or programming framework that is not currently supported by the D-Wave software stack; or (b) they wish to explore the system as a research tool, since control at the machine code level can be used to run fundamental physics or computer science experiments.

Layer 3: Compiler layer

This layer allows full access to the optimization power of the hardware without the user needing to know anything about the underlying hardware architecture or the physics of the machine. A programmer using this level of abstraction only needs to provide a function to be optimized, where the function returns a real number given a bit string. This function can literally be anything. We call this the compiler layer because it abstracts away the intricate details of the computational hardware from the programmer (which allows thinking in more general terms about problem solving), although it is not exactly analogous to a conventional classical compiler.

Layer 4: Client library wrappers

For those wishing to develop applications, this layer of software abstraction is usually the most natural starting point. This layer provides the ability to use standard high level programming languages to access the underlying parts of the software system. In many of the applications developed to date for the D-Wave One, conventional high performance computing systems are required to compute the values of the function requiring optimization. This layer supports a natural synergy between conventional supercomputers and the D-Wave One, where the workload for 'big data' problems can naturally be subdivided between a conventional supercomputing system and the D-Wave One.

Layer 5: Frameworks

This layer, also known as the modes-of-use layer, allows the creation of a much higher level code base consisting of code which uses optimization at its core but may have complicated algorithmic steps wrapped around this central hard problem. Modes of use are implementations of algorithms (such as those found in the area of machine learning) that can be applied to many different application areas, allowing the construction of complex methods for attacking problems. Examples of modes of use in the D-Wave One software stack include supervised binary classification, supervised multiple label assignment, and unsupervised feature learning; these have been bundled into libraries of commonly used functions to form programming frameworks that use quantum mechanics under the hood.

Layer 6: Applications layer

This is the level where an end-user (customer, client, etc.) would interact with quantum applications, which have been developed using the lower layers.

Examples of applications that have been coded using the D-Wave One™ quantum computer include: binary classification for object detection in images and polarity labeling of movie review text; correlating text sentiment extracted from news feeds with stock market prices; video compression; lattice protein folding; and assignment of category labels to images, blog posts and news stories.

SECTION 2

2.1 - Coding at the BlackBox Compiler level

As with conventional processors and computing systems, it is possible to develop on a D-Wave One™ System by programming directly in the machine language of the processor. D-Wave's developer tools, up to and including devpack 1.3, provided this level as the only option available.

In devpack 1.4, we have released a new, higher level programming interface to the D-Wave One™ System which is called the D-Wave BlackBox compiler - BlackBox for short. Under the hood, the D-Wave One™ System is an extremely exotic and complex machine. Programming at the machine language level is extremely difficult, even for our own internal developers. Abstracting away from this underlying complexity is critical for making the system broadly accessible.

The BlackBox compiler framework simplifies the use of the D-Wave One™ System, and dramatically extends its possible range of uses. Programming with BlackBox is so easy that users can be up and running problems within minutes. This is the level at which nearly all programmers interact with the D-Wave One™ system. Here is how it works.

2.2 - Key concept: the generating function

First, a developer codes a function which takes as input a string of bits - zeroes and ones - and outputs a real number. We'll call the function provided by the developer the generating function, or G(x). Many of the applications-level programming examples in the Tutorials section involve taking a real world problem and crafting it into a generating function. Mathematically we write the generating function like this:

$G(x_1,x_2, ... ,x_N )$

where the $$x_k$$ variables (there are a total of N of these) are binary (0 or 1) variables. The value returned by the function $$G(x_1,x_2,...,x_N )$$ has to be a real number.

Let's take a look at a simple example:

$G(x_1,x_2 ) = x_1+2x_2$

In this case, we have N=2 variables, and the function $$G(x_1,x_2 )$$ evaluates to real numbers (well actually integers, but that's OK too!). Specifically, plugging in the $$2^N = 4$$ possible inputs, these are $$G(0,0)=0$$, $$G(1,0)=1$$, $$G(0,1)=2$$, and $$G(1,1)=3$$.
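This tiny example can be checked directly in Python (a sketch only; it does not touch the D-Wave libraries):

```python
# Enumerate the simple generating function G(x1, x2) = x1 + 2*x2
# over all 2^N inputs (N = 2 here).
from itertools import product

def G(x1, x2):
    """The example generating function."""
    return x1 + 2 * x2

values = {bits: G(*bits) for bits in product((0, 1), repeat=2)}
print(values)
# {(0, 0): 0, (0, 1): 2, (1, 0): 1, (1, 1): 3}
```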

This particular example is really simple. However you as a developer can make G as gnarly and complicated as you want, as long as it takes as input binary variables and outputs real numbers. For example it could be something bizarre like:

$G(x_1,x_2)= 2^{x_1/(1+x_2)}\exp[\sin(x_1)]\,x_1 x_2$

However it doesn't even need to arise from a closed form mathematical expression. For example, it could be related to the outcome of a series of settings of control switches on a complex system, such as a flight control system for an airplane.

This function can be coded up using any programming language (here we will use Python), and will be evaluated entirely using a conventional computing system. For some problems, the conventional computing system you'd want to use to do this is a high performance computing system, such as a cluster, a supercomputer or a cloud-based solution; however it could just as easily be your laptop - it's up to you, and depends on how hard it is to evaluate the value of the generating function given an input.
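As an illustration of how little structure is required, here is the 'bizarre' function above written as an ordinary Python function. Any callable that maps bits to a real number can play this role:

```python
# The 'bizarre' example generating function, coded directly in Python.
import math

def G(x1, x2):
    return 2 ** (x1 / (1 + x2)) * math.exp(math.sin(x1)) * x1 * x2

# The x1*x2 factor makes every input except (1, 1) evaluate to zero.
print(G(1, 1))  # ~3.2807
```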

2.3 - Crafting a generating function matched to your application

Overview

A developer chooses $$G(x_1,x_2,...,x_N )$$ so that the lower the real number returned by the function, the 'better' the input bit string $$(x_1,x_2,...,x_N )$$ was. The input bit string represents a 'potential solution', and the number returned by the function given that input gives a measure of the goodness of the potential solution - the lower $$G(x_1,x_2,...,x_N )$$ is, the better the potential solution $$(x_1,x_2,...,x_N )$$ is.

Let's take a look at our first example, where $$G(x_1,x_2 )=x_1+2x_2$$. In this case if we look at the returned values for all possible input combinations $$G(0,0)=0$$, $$G(1,0)=1$$, $$G(0,1)=2$$, and $$G(1,1)=3$$, we see that the choice of $$G(x_1=0,x_2=0)=0$$ gives the lowest (and therefore 'best') value of G. The generating function is sometimes called an optimization objective function, although as a developer you can use the output of BlackBox for more than just optimization (more on this shortly).

So where does G come from in the first place? It comes from your needs as an applications developer. If you are building an application where the above concept is useful (or even critical) to your work, the D Wave One provides a new tool to attack that problem. This is a little abstract, so we'll now provide a couple of examples to help illustrate the process of 'crafting a generating function'.

Example 1: The Subset Sum Problem

For example, let's say that you wanted to solve the Subset Sum Problem (SSP). In this problem, you are given a set of integers, and you want to determine whether there is a non-empty subset whose sum is zero. For example, the set might be $$\{-7, -3, -2, 5, 8\}$$ - this is the instance provided in the Wikipedia page on SSP. In this case the answer is 'yes', because the subset $$\{ -3, -2, 5\}$$ sums to zero. As an applications developer, you could code up the SSP in the following way (note that there are many ways you could try to do this). Define five variables $$(x_1,x_2,x_3,x_4,x_5 )$$ where if a variable is equal to 1 it means "please include me in the sum". In this case the generating function for the above example could be

$G(x_1,x_2,x_3,x_4,x_5 )=(-7x_1-3x_2-2x_3+5x_4+8x_5)^2+\prod_{k=1}^{5}(1-x_k)$

where the values of the coefficients are just the values in the original set. The first term is a perfect square, so the minimum possible value of that part of G is zero; the second term penalizes the solution with all variables equal to zero (which would otherwise show up as an allowed solution!) but not any of the others. If we can find a configuration of variables for which $$G(x_1,x_2,x_3,x_4,x_5 )=0$$ we've got a solution to our SSP instance. This subset sum example is the focus of the hands-on Getting Started tutorial, so you can follow along with coding up this problem to help further understand the process.
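Since N=5 is tiny, the encoding can be sanity-checked by brute force in plain Python (in a real application you would hand G to BlackBox instead of enumerating it):

```python
# The SSP generating function from above, checked over all 2^5 = 32 inputs.
from itertools import product

COEFFS = (-7, -3, -2, 5, 8)

def G(x):
    """x is a tuple of five 0/1 variables."""
    s = sum(c * xk for c, xk in zip(COEFFS, x))
    empty_penalty = 1
    for xk in x:
        empty_penalty *= (1 - xk)   # equals 1 only when every xk is 0
    return s ** 2 + empty_penalty

best = min(product((0, 1), repeat=5), key=G)
print(best, G(best))   # (0, 1, 1, 1, 0) 0 -> the subset {-3, -2, 5}
```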

Example 2: Lattice protein folding

Another example is using the compiler to write a protein folding application. The user would again need to craft a function which somehow encodes a 'fold' into a bitstring. For example, in a simple 2D lattice protein folding model, for each segment of the protein one could encode a fold to the left as 01, a fold to the right as 10, and no fold as 11. A long bit string could therefore contain a complicated series of folds. (Figure 2)

Figure 2. To build a lattice protein folding application, all that is required is a function that returns the energy of a suggested fold - the D-Wave One will then find the fold that gives the lowest energy. Here is an example of a third-party implementation of the two-dimensional HP lattice protein folding model built using the D-Wave software tools.

Given a particular fold, one must be able to compute the energy of this fold (how 'good' the fold is). The user would write a function that was able to take in a bit string corresponding to a fold, and return an energy value. Once this function has been written, it is passed as an argument into the BlackBox function in the SynDist layer. This layer will again repeatedly call the lower layers to send the relevant problem data to the hardware, and return the best set of folds that it finds, in terms of the 01,10,11 encoding scheme.

2.4 - Adding the D-Wave One™ System

Up to this point everything has been done in a conventional framework, and all we've done is set up some machinery defining a function. Now we add the D-Wave One™ System in the following way.

Imagine the entire computing system as consisting of two complementary parts - think of these as the 'right brain' and 'left brain' of a hybrid computing system (see Figure 3 below). The 'left brain' is a conventional computing system whose rational, tedious job it is to compute the value of the generating function given 'guesses' at potential solutions. In general this involves a large amount of computation of the sort conventional computing systems excel at. The 'right brain' is the D-Wave One, which suggests 'creative' potential solutions, using the results obtained by the conventional computing system to quickly home in on better and better solutions.

This split allows hybrid systems to be built that can deal with enormous amounts of data and extremely complex generating functions. The information that passes between the conventional computing system and the D-Wave One™ System is extremely small: bit strings representing guesses at good answers flow from the D-Wave One™ System to the conventional system, and real numbers characterizing how good those guesses were flow back. The amount of data that might be required to compute the value of the generating function could be (and often is) enormous - but we can use all of the standard tactics for dealing with this using conventional systems architecture.

Figure 3. The BlackBox programming paradigm separates the evaluation of the generating function, and the process of generating potential solutions. The evaluation of the function happens in a conventional computing system, while the solution generation happens in the D-Wave One. Information flow between the two is extremely low bandwidth and is restricted to bit strings representing potential solutions flowing from the D-Wave One™ System to the conventional system, and real numbers representing the values of the generating function evaluated for those guesses flowing from the conventional system to the D-Wave One™ System.

Using this hybrid system is simple. The developer provides the generating function, initiates the computation, and then the system starts 'thinking' about the generating function in the following, iterative way:

• 1 - First, a series of random solutions are generated by the conventional computing system.
• 2 - The quality of these guesses is evaluated by passing them into the generating function.
• 3 - The real numbers characterizing the goodness of the solutions are sent to the D-Wave One™ System.
• 4 - The D-Wave One™ System automatically adjusts itself based on this feedback, and then generates a series of guesses, informed by the results received.
• 5 - These new guesses are sent back to the conventional system, where their goodness is evaluated.
• 6 - Steps 3-5 are repeated until exit criteria are met.
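The loop above can be sketched in plain Python. Note that the D-Wave One's role in Step 4 is stood in for here by a naive classical bit-flip mutator, purely to make the information flow concrete; only the shape of the exchange (bit strings out, real numbers back) matches the real system:

```python
# Sketch of the iterative loop, using the Subset Sum generating function
# from Section 2.3. The quantum solver of Step 4 is replaced by a naive
# classical random mutator for illustration only.
import random

COEFFS = (-7, -3, -2, 5, 8)

def G(x):
    s = sum(c * xk for c, xk in zip(COEFFS, x))
    return s ** 2 + int(not any(x))    # penalize the empty subset

def propose(incumbent, n_guesses, rng):
    """Stand-in for Step 4: randomly mutate the current best guess."""
    return [tuple(xk ^ rng.randint(0, 1) for xk in incumbent)
            for _ in range(n_guesses)]

rng = random.Random(0)
best = (0, 0, 0, 0, 0)                 # Step 1: an initial guess
best_val = G(best)                     # Step 2: evaluate its quality
for _ in range(50):                    # Steps 3-5, repeated
    for guess in propose(best, 4, rng):
        val = G(guess)                 # 'conventional' evaluation
        if val < best_val:             # feedback guides the next round
            best, best_val = guess, val

print(best, best_val)
```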

When the D-Wave One™ System generates a series of guesses at Step 4, it is doing something rather complicated. Here we won't go into any detail as to specifically what's going on in this step, but the Appendix outlines the process. The Appendix is rather technical; you can skip it if you'd just like to start using BlackBox.

2.5 - Coding at the Frameworks level

As quantum computer programming is a relatively new art, a lot of the programming is currently done at the BlackBox (compiler) level. However, even though this is much easier than coding at the machine code level, this is still a long way from the high level applications based coding that is performed with classical digital computers today. To code applications, more layers of abstraction are needed.

For a programmer to use a D-Wave One at the frameworks level, they would import a library such as dwave_ssfl, which contains a collection of functions useful for feature learning (a commonly performed operation in machine learning applications). These functions would already have built into them concepts such as the objective functions described in the BlackBox section above; the user would only have to supply the data over which they wanted to perform the learning task, learn how to use the SSFL library, and set a few parameters. Such libraries are currently being developed, and will be available for use soon.

2.6 - Summary and key take-aways

The D-Wave One software stack has been architected to allow developers to build code at several different levels of abstraction from hardware. From directly coding at the machine code level, all the way to high level frameworks, there is a lot of flexibility available.

The key take-away is that the software stack ensures that the end user need not have any understanding of the underlying hardware or the physics of the computer to be able to use D-Wave's technology to build powerful applications.

There is no requirement for a developer to understand or care about the details of the functioning of the processor at the machine code level - sophisticated industrial applications can, and in most cases should, be constructed using D-Wave tools that abstract away the underlying details.

For most applications, the best place to start prototyping new ideas and developing code is at the level of the compiler (through the client libraries). As this is a key concept, most of the applications programming examples featured on the Tutorials page are dedicated to showcasing how to do this.

APPENDIX - BlackBox details

Probability distributions

In order to give some intuition for what the D-Wave One™ System is doing in Step 4, we have to take a brief detour into statistical physics.

In physics, often a physical system is described by writing down a function whose inputs can be thought of as specific physical states of the system, and whose outputs are the energies of those states.

If the physical system of interest (call it the "central system") is in a state called thermal equilibrium, the probability of being in any particular state has a known functional form called the Boltzmann distribution. This distribution has the form

$P(x_1,x_2,...,x_N )= \frac{1}{Z}\exp[-G(x_1,x_2,...,x_N )/kT]$

which means, "if the system whose allowed energies are given by $$G(x_1,x_2,...,x_N )$$ is in thermal equilibrium at temperature T, the probability of the system being in state $$(x_1,x_2,...,x_N )$$ is $$P(x_1,x_2,...,x_N )$$". Here k is the Boltzmann constant (don't worry too much about its specific value, it won't matter for us here) and Z is something called the partition function, which is

$Z = \sum_{x_1=0}^{1}\sum_{x_2=0}^{1}\cdots\sum_{x_N=0}^{1}\exp[-G(x_1,x_2,...,x_N )/kT]$

Note that actually calculating $$P(x_1,x_2,...,x_N )$$ given $$G(x_1,x_2,...,x_N )$$ is not easy at all. But let's not worry about that right now. Let's just imagine that we had a physical system where the probabilities of measuring the bits $$(x_1,x_2,...,x_N )$$ were somehow magically given by the Boltzmann distribution. What would this look like, and what could we do with it?
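To make this concrete, here is the Boltzmann distribution computed explicitly for the earlier toy function $$G(x_1,x_2 )=x_1+2x_2$$ at kT=1. Lower-energy states are exponentially more probable:

```python
# Explicit Boltzmann distribution for the toy function G(x1, x2) = x1 + 2*x2.
from itertools import product
from math import exp

def G(x1, x2):
    return x1 + 2 * x2

kT = 1.0
weights = {s: exp(-G(*s) / kT) for s in product((0, 1), repeat=2)}
Z = sum(weights.values())                 # the partition function
P = {s: w / Z for s, w in weights.items()}

for state, p in sorted(P.items(), key=lambda kv: -kv[1]):
    print(state, round(p, 3))
# (0, 0) 0.644
# (1, 0) 0.237
# (0, 1) 0.087
# (1, 1) 0.032
```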

The algorithms underlying the BlackBox procedure attempt to sculpt the probability distribution returned by the D-Wave One™ System to be close to the Boltzmann distribution you'd get over the generating function. As the iteration proceeds, the probability distribution changes and, if everything goes well, starts converging to something close to the Boltzmann distribution described above.

What BlackBox returns is the best solutions found during this iterative procedure. These solutions can be used in two modes.

Often a developer is looking for the best solution found. In this optimization mode, the output of this procedure is the bit string BlackBox found that returned the lowest value of the generating function.

Sometimes a larger set of samples is desirable; we call this the sampling mode. For example, if the user wants to generate not only the best solution, but a set of 'good' alternate solutions, the system natively provides this capability - the user records the guesses generated at Step 4, the number of times these guesses were generated, and the quality of the guesses. BlackBox can be configured to return a set of the best answers found, and how many times each was seen.

Probabilities and energies in our Subset Sum Problem example

Let's take the specific example of the Subset Sum Problem instance we used earlier:

$G(x_1,x_2,x_3,x_4,x_5)=(-7x_1-3x_2-2x_3+5x_4+8x_5 )^2+\prod_{k=1}^{5}(1-x_k)$

Because we only have five bits here, we can explicitly write down all $$2^5 = 32$$ values needed to compute the Boltzmann distribution. It's a little tedious, but let's do it anyway!

Figure 4. Enumerating the generating function for all 32 states.

Notice that there is a single solution (highlighted in orange) that gives $$G(x_1,x_2,x_3,x_4,x_5 )=0$$: the solution with $$(x_1=0,x_2=1,x_3=1,x_4=1,x_5=0)$$.

Let's assume that the probability distribution over these results is a Boltzmann, and let's for the moment just arbitrarily set kT=1. We can then compute the actual probabilities for each state. These are the values in the rightmost column.

Now every time we measure the state of the system, we get one of the 32 states above with the probabilities in the righthand column (as long as the system is in thermal equilibrium at kT=1). If we wanted to find a solution to the Subset Sum Problem, given a physical system with this probability distribution, we would repeatedly sample from the probability distribution until we either found a solution or got bored of drawing samples. In this case, we see that there is a $$P_{SS}=39.3\%$$ (SS stands for "single shot") chance of drawing the interesting state $$(x_1=0,x_2=1,x_3=1,x_4=1,x_5=0)$$ per sample.

So if we draw a total of K samples, the probability of seeing the state we want at least once is

$P= 1-(1-P_{SS} )^K$

So if we drew say K=10 samples in this case, the probability of seeing the state we want at least once is

$P= 1-(1-0.393)^{10}\approx 99.3\%$
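Both of these numbers can be reproduced by enumerating the distribution in a few lines of Python (a sanity check only; nothing here involves the quantum hardware):

```python
# Reproduce the appendix numbers: the Boltzmann distribution for the
# Subset Sum generating function at kT = 1, then the probability of
# seeing the solution at least once in K = 10 samples.
from itertools import product
from math import exp

COEFFS = (-7, -3, -2, 5, 8)

def G(x):
    s = sum(c * xk for c, xk in zip(COEFFS, x))
    return s ** 2 + int(not any(x))

states = list(product((0, 1), repeat=5))
Z = sum(exp(-G(s)) for s in states)           # partition function, kT = 1
p_ss = exp(-G((0, 1, 1, 1, 0))) / Z           # single-shot probability
p_10 = 1 - (1 - p_ss) ** 10                   # at least one hit in K = 10

print(round(100 * p_ss, 1))   # 39.3
print(round(100 * p_10, 1))   # 99.3
```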