# Using the D-Wave One™ System to search for Hadamard matrices

Contents

Section 1

• 1.1 - Problem Definition
• 1.2 - Search, Optimization and Penalty Functions
• 1.3 - Building the Penalty Function
• 1.4 - First method: BlackBox Compiler
• 1.5 - Alternative method: Direct Mapping
• 1.6 - Gory Details: HOBO, QUBO, Chimera

Aim, audience and required background

This material was developed to help those interested in the physics of the D-Wave One™ System to perform some simple experiments and explore some of the interesting science behind the processor technology. This tutorial series is designed to introduce you to quantum computer programming using a practical, hands-on approach. By the end of this tutorial you should understand the way that a problem is crafted and sent to the hardware, and why the BlackBox compiler method of solving a problem is superior to the alternative (QUBO embedding/direct mapping).

All of the software necessary for developers to begin learning how to program D-Wave quantum computing systems can be downloaded from the DOWNLOAD DEVKITS page. As the developer portal is currently in BETA test stage, you will need to have a D-Wave developer account to download these packs. To follow this tutorial it may be helpful to have read two of the previous installments: Quantum Computing and Energy Programming Primer and Getting Started: Installing the D-Wave developer pack and an introductory 'hello multiverse' tutorial.

The high level programming language used in the included examples and source code is Python 2.6. Some experience with the Python programming language is helpful but not required.

What you will learn

By following through the material in this white paper, you will learn:

• How finding a matrix with a certain set of properties can be cast as an optimization problem
• What a QUBO function is and why it is relevant to D-Wave's hardware
• How to map a QUBO function natively onto the hardware
• Why mapping natively in this way is a difficult and time-consuming process
• How the BlackBox compiler gives us a way around this mapping problem

Software requirements

In order to proceed you should install several free software packages.

• Python - The material here assumes you are using Python 2.6. You can download Python from http://www.Python.org. If you need a programming development environment, Pycharm from Jetbrains http://www.jetbrains.com/pycharm/ is excellent. There is a small license fee to use it.
• NumPy - This is a scientific computing library. It is at http://numpy.scipy.org/
• Code for this tutorial (optional - i.e. if you want to see the full code files for each section). You can find the code referenced in this tutorial here: Github Repository for Matrix Search code. You will need a github account to access this code. Once you have a github account you can email devportal at dwavesys dot com and we can add you to the github D-Wave repositories.

SECTION 1

1.1 - Problem Definition

The native strength of the D-Wave One system is solving discrete optimization problems. Here we will describe the process of solving one such model problem on the D-Wave One system. The problem is to find a special kind of matrix: one made up entirely of 0's and 1's whose rows are orthogonal to one another.

In this context, two rows rA and rB of a matrix are orthogonal when the following condition is true. We visit each column of the matrix and look at the entry in rows rA and rB for that column. If the entries match (resp. do not match), we accumulate a score of 1 (resp. 0). If the accumulated score across all the columns is exactly equal to half the number of columns, we will call the two rows orthogonal.

To check your understanding of this concept, look at the following matrix:

$\left[ \begin{array}{cccc} 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{array} \right]$

There are four rows and consequently there are "four choose two" ways of forming a row pair, which means there are six row pairs in all. Of these six row pairs, all but one are orthogonal. Can you see which one row pair is not orthogonal?
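If you want to check your answer mechanically, the orthogonality test above is only a few lines of Python (the helper name `orthogonal` is ours, purely for illustration):

```python
import itertools

def orthogonal(rA, rB):
    # Two rows are orthogonal when they match in exactly half of the columns.
    matches = sum(1 for a, b in zip(rA, rB) if a == b)
    return matches == len(rA) // 2

M = [[1, 1, 1, 1],
     [0, 0, 1, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1]]

bad_pairs = [(i, j) for i, j in itertools.combinations(range(4), 2)
             if not orthogonal(M[i], M[j])]
print(bad_pairs)  # the lone non-orthogonal row pair (0-indexed rows)
```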

Matrices with this special property are called Hadamard matrices after the French mathematician Jacques Hadamard. Examples of Hadamard matrices have been known since the late 1800's, at many different sizes. Since the orthogonality property requires that half the entries in a row pair match, the size of a Hadamard matrix must be even. With a few exceptions, Hadamard matrices have size 4Nx4N where N is a positive integer.

Finding a Hadamard matrix is a good model problem for the D-Wave One system. That's because the problem is naturally modeled in a way which allows us to re-express it in a way suited to the problem-solving capabilities of the system. We will describe two attempts at mapping the Hadamard problem to the D-Wave One system.

The first attempt can be characterized as a "manual" approach. In this approach, we attempt to translate directly the specifics of the search for a Hadamard matrix into a problem that can be handled by the hardware. This approach will acquaint us with many of the techniques used when mapping "real-world" problems to the D-Wave One system.

The second attempt is a higher-level approach in which we make use of the BlackBox compiler. This compiler hides much of the detail involved in mapping a problem instance to the system, exactly analogously to the way in which a compiler on a standard computer system shields us from the complexity of translating from a high level language into machine code.

Following this exercise should give you a concrete sense of problem solving strategies which are appropriate for the D-Wave One system.

1.2 - Search, Optimization and Penalty Functions

We can think of the problem of constructing a Hadamard matrix as a search problem. Each of the matrix elements of a 4Nx4N Hadamard matrix can assume one of two possible values, so the total number of $$4N \times 4N$$ matrices in our search space is $$2^{(16N^2)}$$. Even when N is small, the size of this search space is large.
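A quick back-of-the-envelope computation shows how fast this search space grows:

```python
# Number of candidate 4N x 4N matrices with 0/1 entries: 2^(16 N^2)
for N in (1, 2, 3):
    print(N, 2 ** (16 * N * N))
```

Already at N = 3 (a 12x12 matrix) the count is 2^144, roughly 2.2 × 10^43 candidates.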

Through a slight change in perspective, we can transform this search problem into an optimization problem. We will build a "penalty function" which is defined on the search space of all 4Nx4N matrices. This penalty function attains its minimum value at precisely those points in the search space that correspond to valid solutions to our problem.

Our first task then will be to build a penalty function which is defined on the space of all 4Nx4N matrices. When each row pair of our input matrix obeys the orthogonality constraint, we want our penalty function to be at its minimum value. If our input matrix deviates from this orthogonality condition, our penalty function should evaluate to a number larger than the minimum value.

Simple intuition suggests that the D-Wave One system should be well adapted to solving this kind of problem. After all, each qubit of the D-Wave One system exists in a superposition of possible states. It is easy to imagine that we can assign each qubit of the D-Wave One system to an element of our matrix. If we wish to think in physics terms, the penalty function for our matrix becomes an energy function for the qubits of the underlying hardware.

1.3 - Building the Penalty Function

In building our penalty function, it helps to think of the matrix elements as boolean variables. The only real convenience this brings is that we can refer to standard boolean functions (e.g. NOT, OR, XOR) as we build up our function. We can denote the problem variables as $$b_{i,j}$$, where i ranges over the rows of our 4Nx4N matrix and j ranges over the columns. Each $$b_{i,j}$$ will be either 0 or 1.

Our job is then to define a penalty function which maps the collection of problem variables $$b_{i,j}$$ to penalties. When the $$b_{i,j}$$ represent a valid Hadamard matrix, the penalty should be zero. When the $$b_{i,j}$$ represent a matrix which violates the orthogonality conditions at one or more row pairs, the penalty should be positive:

$$P(b_{i,j}) = 0$$ iff $$b_{i,j}$$ represents a valid Hadamard matrix

$$P(b_{i,j}) \gt 0$$ iff $$b_{i,j}$$ is not a valid Hadamard matrix

The remainder of this section contains the steps required to build the penalty function. If you are prepared to accept the form of the penalty function without reading these detailed steps, you may wish to skip to the end of this section and simply look at the form that we obtain for the penalty function.

No subtlety is required for the next step: our penalty function requires each row pair of our matrix to satisfy the orthogonality condition, thus we will begin by representing our penalty function as a sum of terms. Each term represents a row pair. The term should vanish when the row pair satisfies the orthogonality condition and be positive when the condition is violated.

$P(b_{i,j}) = \sum_{i \lt j} term(i,j)$

The next challenge is constructing a function which will vanish when a row pair is orthogonal and is positive otherwise. Remember that we are counting the number of matches in corresponding columns between row i and row j. We want our penalty to vanish when the total number of matches is 2N.

To accomplish this, we use a simple trick. Suppose $$M(i,j)$$ is the number of column matches between rows i and j. We want our penalty to vanish when this number equals 2N and to be positive otherwise. A simple function that has this property is the following:

$(M(i,j) - 2N)^2$

When M(i,j) equals 2N, the difference (and hence its square) vanishes. And when M(i,j) is different from 2N, the difference will be non-zero and its square will be a positive number. Thus this expression has precisely the property that we are seeking in order to penalize rows which are not orthogonal: it vanishes when M(i,j) is equal to 2N and is positive otherwise. We can now take this expression to be the term(i,j) in our earlier sum:

$P(b_{i,j}) = \sum_{i \lt j} (M(i,j) - 2N)^2$

We now have to figure out how to compute M(i,j), which is the number of columns in which there is a match between row_i and row_j. At this point, our boolean functions are handy. The XOR(...) function has the following truth table:

| XOR(x,y) | y = 0 | y = 1 |
| --- | --- | --- |
| x = 0 | 0 | 1 |
| x = 1 | 1 | 0 |

In other words, XOR(x,y) is equal to 1 when its two input arguments differ and is equal to 0 when its two input arguments match. The function 1 - XOR(x,y) has the opposite property: it is equal to 1 when its two input arguments match and 0 otherwise. We want to sum this function across the columns of a given row pair to obtain M(i,j):

$M(i,j) = \sum_c \left[1 - XOR \left( b(row_i, col_c),b(row_j, col_c) \right) \right]$

We can extract the constant term from this sum to get:

$M(i,j) = 4N - \sum_c XOR (b(row_i, col_c), b(row_j, col_c))$
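Both expressions for M(i,j) are easy to verify numerically on a sample row pair from the 4x4 matrix shown earlier (here N = 1, so there are 4N = 4 columns and the orthogonality target is 2N = 2 matches):

```python
def xor(x, y):
    return x ^ y

row_i = [1, 1, 1, 1]
row_j = [0, 0, 1, 1]
N = 1  # a 4x4 matrix: 4N = 4 columns

# Direct count of matches, and the same count via the 4N - sum(XOR) identity
matches = sum(1 - xor(a, b) for a, b in zip(row_i, row_j))
via_identity = 4 * N - sum(xor(a, b) for a, b in zip(row_i, row_j))
print(matches, via_identity)  # both give 2: these rows are orthogonal
```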

We can substitute this into our expression for $$P(b_{i,j})$$ to get:

$P(b_{i,j}) = \sum_{i \lt j} {\left[ 4N - \sum_c XOR(b(row_i, col_c), b(row_j, col_c)) - 2N \right]}^2$

Combining the two constants allows us to simplify this to:

$P(b_{i,j}) = \sum_{i \lt j} {\left[ 2N - \sum_c XOR(b(row_i, col_c), b(row_j, col_c)) \right]}^2$

This penalty function has exactly the properties we require. When the $$b(i,j)$$ variables are set to the entries of a Hadamard matrix, $$P(b_{i, j})$$ will evaluate to zero. When the input matrix is *not* a Hadamard matrix, one or more of the row pairs will be non-orthogonal. For each of these row pairs, the corresponding term in the sum over $$(i,j)$$ pairs will be positive, contributing to an overall positive value of the penalty function.
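The finished penalty function can be transcribed directly into Python as a sanity check (this is not the form sent to the hardware; the helper name `P` simply mirrors the notation above):

```python
import itertools

def P(matrix):
    # Sum over row pairs of (2N - number of column mismatches)^2,
    # where the matrix has n = 4N columns.
    n = len(matrix)
    total = 0
    for ri, rj in itertools.combinations(matrix, 2):
        xor_sum = sum(a ^ b for a, b in zip(ri, rj))
        total += (n // 2 - xor_sum) ** 2
    return total

hadamard = [[1, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1]]  # a valid 4x4 example
almost = [[1, 1, 1, 1], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]    # the Section 1.1 matrix
print(P(hadamard), P(almost))  # 0 for the Hadamard matrix, positive for the other
```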

1.4 - First method: BlackBox Compiler

There are many analogies between the BlackBox compiler - which we can think of as a compiler for the D-Wave One adiabatic quantum computer - and a classical compiler for a von Neumann-style machine. Both shield the user from having to understand and deal with complexities of the underlying machine's instruction set. Both aim to make problem expression much simpler. Both attempt to present a model of computation which abstracts away from the specific actions carried out by the hardware.

We cannot push these similarities too far - at least not at present. Traditional compilers are designed to parse specific languages: Lisp, Fortran, Basic, C, Pascal, etc. The BlackBox compiler is really a toolkit of library functions that allows one to handle a particular kind of optimization problem. It is certainly possible to imagine a growth path for BlackBox via which it will evolve into a complete language for specifying certain kinds of computations. For now it is important to understand exactly what BlackBox does for us. It finds the minimum of functions.

The beauty of compilers is that, through relying on a set of abstractions, a software developer can think of his computation as being carried out on some kind of virtual machine that presents facilities like variables, mathematical operations, objects, methods, etc. In reality a modern CPU understands nothing of any of these concepts. It deals in simple machine instructions: load a register from some memory location, combine two registers using an arithmetic operation, store a register's value to some memory location. The compiler provides all the layers of abstraction that allow the user to think in higher-level concepts.

In the same way, the BlackBox compiler allows the user to think of his or her problem as the minimization of some function - generally of discrete variables. The user provides to the compiler the specification of an input function along with the number of variables required by the input function to compute an output value. The compiler takes care of the rest.

What this means for us as we use BlackBox to search for Hadamard matrices is that we need to provide that penalty function developed earlier. BlackBox will "digest" the penalty function and search for values for its input variables that will minimize the penalty function.

Here is Python code that implements a penalty function of the form we crafted in the previous section, sends this penalty function to the BlackBox compiler, and checks that the output is indeed a Hadamard matrix:

Code Snippet:

```python
import itertools
from numpy import sqrt, array
from dwave_sapi import local_connection, BlackBoxSolver
import time

class ObjClass(object):
    # A class to hold our penalty function so that it can be sent to BlackBox
    def __call__(self, states, numStates):
        states_bin = [(item+1)/2 for item in states]  # converting to 0/1 from -1/+1
        stateLen = len(states)/numStates

        ret = []
        for state_number in range(numStates):
            optimized_values_list = array(states_bin[state_number*stateLen:(state_number+1)*stateLen])
            counter = 0
            rows = []
            N = int(sqrt(len(optimized_values_list)))

            # Split the flat state into N rows (this assumes a square matrix)
            for i in range(N):
                rows.append(optimized_values_list[(i*N):(i*N)+N])

            # Sum the squared deviation from orthogonality over all row pairs
            for comb in itertools.combinations(rows, 2):
                sum_of_elements = sum([abs(a - b) for a, b in zip(comb[0], comb[1])])
                penalty = (sum_of_elements-(N/2))**2
                counter += penalty

#           print counter, # Uncomment this to watch BlackBox go! Warning: It can get extremely verbose!
            ret.append(counter)
        return tuple(ret)

def check_hadamard(bitstring):
    # Check that the matrix returned by BlackBox is a Hadamard matrix by
    # counting the number of row mismatches ('errors')
    rows = []
    N = int(sqrt(len(bitstring)))
    for i in range(N):
        rows.append(bitstring[(i*N):(i*N)+N])
    counter = 0
    for comb in itertools.combinations(rows, 2):
        sum_of_elements = sum([abs(a - b) for a, b in zip(comb[0], comb[1])])
        penalty = (sum_of_elements-(N/2))**2
        counter += penalty
    return counter

# Here is the main BlackBox routine
matrix_dim = 12
num_vars = matrix_dim**2

solver = local_connection.get_solver("c4-sw_sample")

obj = ObjClass()
blackbox_parameter = 1
blackbox_solver = BlackBoxSolver(solver)
global_start_time = time.time()
blackbox_answer = blackbox_solver.solve(obj, num_vars, cluster_num = 2,
                                        min_iter_inner = blackbox_parameter,
                                        max_iter_outer = blackbox_parameter,
                                        unchanged_threshold = blackbox_parameter,
                                        max_unchanged_objective_outer = blackbox_parameter,
                                        max_unchanged_objective_inner = blackbox_parameter,
                                        unchanged_best_threshold = blackbox_parameter,
                                        verbose = 1)

time_to_solution = time.time()-global_start_time
blackbox_answer_bin = [(item+1)/2 for item in blackbox_answer]  # converting to 0/1 from -1/+1

print "The best bit string we found was:", blackbox_answer_bin
print 'The time to solution was: ', time_to_solution
print ''

# Print the answer matrix row by row
for i in range(0, matrix_dim):
    print blackbox_answer_bin[(i*matrix_dim):(i*matrix_dim)+matrix_dim]

# Let's check if our matrix is Hadamard-ey
errors = check_hadamard(blackbox_answer_bin)
print ''
print 'row mismatches =', errors
```


BlackBox takes functions as class objects, so the first thing we do is write a class which holds our penalty function. The penalty function code itself is rather simple. optimized_values_list holds the value of the current state that BlackBox is 'considering', so this is the state that we must use to evaluate the penalty. The state is held as one long list, so we split it into N 'rows', where N is the square root of the length of this array (this assumes the matrix is square). We then check each combination of row pairs against the orthogonality condition (i.e. the total number of mismatches should equal half the total number of columns). If the row pair under consideration does not satisfy the orthogonality condition, we add its squared deviation from that condition to the penalty counter. This quantity, summed over row pairs, is precisely our penalty function.

The extra function in this code, check_hadamard(), just runs the orthogonality check over the final matrix that BlackBox returns, to see if it is indeed a Hadamard matrix. If the number of mismatches is 0 then the code has done its job!

Here is a screenshot of the printout produced by this code snippet:

Figure 1. Printout of the code proudly displaying its trophy from the hunt: A 12x12 Hadamard matrix.

Under the covers, BlackBox implements an iterative loop to find a minimum of our penalty function. Each step of this iteration has two phases. In the first phase, a probability distribution over the set of possible inputs to the penalty function is determined. In the second phase, samples are drawn from this probability distribution. The iteration proceeds by using the sampled points to drive the generation of a new probability distribution. Each iterative step should drive the probability distribution closer and closer to the minimum of the penalty function.

BlackBox uses classical computing techniques to determine the probability distribution in each iterative step. The second phase in which samples are drawn according to the distribution takes advantage of the D-Wave One adiabatic quantum computing system. Thus BlackBox implements a hybrid algorithm which allows us to couple a traditional computer with a quantum computer.

In each sampling phase, BlackBox translates the probability distribution into the native "instruction" set of the D-Wave One system - the $$h_i$$ and $$J_{i,j}$$ constants that provide the biases for individual qubits and pairs of qubits. The D-Wave One system then returns a sample of inputs according to this probability distribution. This is the quantum portion of the iterative loop.
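This tutorial does not spell out BlackBox's internals, but the two-phase structure described above can be caricatured with a purely classical, cross-entropy-style loop. Every name and parameter below is our invention for illustration; nothing here is D-Wave API, and the real sampler would be the quantum hardware rather than `random`:

```python
import random

def toy_blackbox(penalty, num_vars, iters=200, samples=50, elite=10):
    # Phase 1: a distribution over inputs, here independent bit probabilities.
    p = [0.5] * num_vars
    best, best_pen = None, float('inf')
    for _ in range(iters):
        # Phase 2: draw candidate inputs from the current distribution.
        drawn = [[1 if random.random() < pi else 0 for pi in p]
                 for _ in range(samples)]
        drawn.sort(key=penalty)
        if penalty(drawn[0]) < best_pen:
            best, best_pen = drawn[0], penalty(drawn[0])
        # Use the best samples to refit the distribution for the next step.
        top = drawn[:elite]
        p = [0.5 * p[k] + 0.5 * sum(s[k] for s in top) / float(elite)
             for k in range(num_vars)]
    return best, best_pen
```

Each pass drives the distribution toward low-penalty inputs, mirroring the iteration described above.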

Hang on a minute... 12x12 = 144!

Observant readers may have noticed that finding a 12x12 Hadamard matrix requires finding 144 matrix entries. Yet there are only 128 qubits on a Rainier chip! How is this possible?

BlackBox can handle problems that are larger than the number of available qubits by using techniques such as local search in an intelligent way. Cutting to the chase, we can use this BlackBox approach to solve Hadamard problems of several different sizes. As we will see in the next section, if we do not use BlackBox we are much more limited: we must use qubits in a different way, and we need substantially more qubits than the Rainier system offers even to handle the 4x4 Hadamard problem. The more flexible way in which BlackBox uses the D-Wave One hardware allows us to solve problems past the 4x4 size.

Figure 2. Hadamard matrices found by the BlackBox solver

Here is a 4x4 Hadamard matrix found by the BlackBox solver:

$\left[ \begin{array}{cccc} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right]$

It is easy to check that each of the six row pairs in the above matrix satisfies the orthogonality condition.

1.5 - Alternative method: Direct Mapping

Our next attempt to make use of the D-Wave One system for this problem will serve to illustrate why the BlackBox method (or any compiler method) is superior for most programmers. There are times, however, when coding at a lower level of abstraction may be necessary. There are therefore two ways in which the material of this section can be appreciated. First, for the reader who wishes to directly map a problem to the D-Wave One hardware (perhaps a programmer who feels they may be able to make their program more efficient by doing so), this section can serve as an introduction to several different challenges that will have to be overcome. Those interested in doing this may wish to attend very carefully to the steps detailed here. Second, for the reader who wishes to evaluate what advantage is gained by using the BlackBox compiler, this section demonstrates the difficulties of direct mapping. Readers interested in learning about this advantage may choose to skim this section quickly, simply to get an appreciation for the complexity of the direct mapping approach.

Our first step in this process is to identify the boolean variables $$b_{i,j}$$ of the Hadamard matrix with qubits. This requires that we also mimic the penalty function of the boolean variables with the energy function of the qubits. This latter step will force us to confront the specific computation carried out by the D-Wave One system.

When attempting direct mapping (also known as 'embedding'), we must be intimately familiar with the capabilities of the D-Wave One hardware. This is quite analogous to the situation of a programmer of a traditional von Neumann computer for which there is no compiler: the programmer has to learn the specific instruction set of the computer and translate the problem into instructions which can be executed natively by the hardware.

Modern programmers rely on compilers for many reasons: 1) compilers shield us from the complexities of the instruction set 2) compilers allow us to focus our mental energy on algorithms rather than hardware issues 3) compilers provide libraries of useful functions. The drawback of relying on compilers is that we do not learn as much about the underlying hardware as when we write instructions for the hardware ourselves. This is perhaps another good motivation for reading this section: it will give us a much better appreciation for what the D-Wave One system does as it computes.

Now for our direct mapping problem. We have already chosen to identify the boolean variables of our problem with the qubits of the system. This step is easy. The second step is hard: we need to mimic the penalty function that we have written down in the prior section with an energy function that can be computed by the D-Wave One system.

The energy function is more properly thought of as a Hermitian operator which operates on states in a Hilbert space. We can choose basis states in this Hilbert space that diagonalize a set of commuting operators: the operators that measure the z-component of spin at a particular qubit. In the physics literature, the Hermitian operator is called a Hamiltonian. The D-Wave One system can be described by a Hamiltonian which can be written in terms of operators on this Hilbert space in the following way:

$H(s) = \sum_i h_i s_i + \sum_{i,j} J_{i,j} s_i s_j$

The D-Wave One system will find low energy states of this Hamiltonian through a process called adiabatic quantum annealing. We can think of the numbers $$h_i$$ and $$J_{i,j}$$ that appear in this Hamiltonian as the "instruction set" of our quantum computer. Our task is now to make the transition between the penalty function written at the end of the prior section and this Hamiltonian.
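Evaluating this energy function classically is straightforward. Here is a sketch with made-up $$h_i$$ and $$J_{i,j}$$ values, purely to show the form of the computation; any real problem would derive these constants from the penalty function:

```python
def ising_energy(h, J, s):
    # H(s) = sum_i h_i * s_i + sum_(i,j) J_ij * s_i * s_j, with each s_i = +1 or -1
    energy = sum(h[i] * s[i] for i in range(len(s)))
    energy += sum(J_ij * s[i] * s[j] for (i, j), J_ij in J.items())
    return energy

h = {0: 0.5, 1: -0.5, 2: 0.0}    # hypothetical qubit biases
J = {(0, 1): 1.0, (1, 2): -1.0}  # hypothetical pairwise couplings
print(ising_energy(h, J, [-1, 1, 1]))  # -> -3.0
```

The hardware's job is to return spin configurations s that make this energy as low as possible.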

1.6 - Gory Details: HOBO, QUBO, Chimera

To map from our Hadamard problem to the D-Wave One hardware, we'll pass through a set of stages which we can informally think of as HOBO, QUBO and Chimera. The first stage involves stating our Hadamard matrix problem in the language of Boolean optimization. This HOBO problem (see the next paragraph for an explanation of the HOBO acronym) cannot fit on the D-Wave hardware, so we transform it into a problem containing simpler interactions. This QUBO problem represents the second stage along this path; it is closer to the D-Wave hardware but may still require "instructions" that the D-Wave hardware does not support. The D-Wave hardware implements a specific topology connecting its qubits, which we will call Chimera. We can think of this topology as defining the real "machine instructions" implemented by the hardware. The third stage of our problem is mapping from QUBO to Chimera.

HOBO to QUBO

Our penalty function is an example of a Higher Order Binary Optimization problem, or a HOBO for short. Binary optimization can be handled by the D-Wave One Hamiltonian, but higher order terms in the penalty function cannot. That's because the $$J_{i,j}$$ "instructions" that appear in our Hamiltonian can only represent quadratic terms in the penalty function: terms that contain at most a product of two of the $$b_{i,j}$$ variables. Our penalty function contains both cubic and quartic terms in the $$b_{i,j}$$, so our first task is to replace these higher order interactions with quadratic ones. This will require the introduction of ancillary variables.

To see how this works, we'll need to write out the parts of the penalty function in detail. Our penalty function is a sum over row pairs and the contribution from each one of these row pairs is identical in form. This means that we can analyze a single row pair and apply the results of this analysis to all row pairs uniformly. Let's look at the contribution from the row pair combining $$row_i$$ and $$row_j$$:

$term(i,j) = {\left[ 2N - \sum_c XOR(b(row_i, col_c), b(row_j, col_c)) \right]}^2$

The sum over columns contains 4N XOR terms. When we square the bracketed expression, we need to keep track of the different kinds of terms that arise: squares of individual terms and cross products of pairs of terms. It is the cross products of XOR functions that will give rise to the higher order terms in our penalty function.

The first kind of term appearing in term(i,j) is the square of each individual term inside the square brackets. We can write these as follows:

$term_{sq}(i,j) = (2N)^2 + \sum_c {\left[ XOR(b(row_i,col_c),b(row_j,col_c)) \right]}^2$

The second kind of term appearing in $$term(i,j)$$ is the cross product of two different terms inside the square brackets. These can be broken down further into products in which one of the factors is 2N and products in which neither of the factors is 2N. The first gives rise to:

$term_{cr1}(i,j) = 2 \times \sum_c 2N \times \left[ -XOR(b(row_i,col_c),b(row_j,col_c)) \right]$

A factor of two appears in front of this expression since each cross product can be formed two ways. The second kind of product is one in which both members are themselves XOR functions computed at different columns. In this case, the sum over columns c becomes a sum over column pairs. We'll denote the first (resp. second) column in the column pair by $$col_1$$ (resp. $$col_2$$ ). This portion of the cross product becomes:

$term_{cr2}(i,j) = 2 \times \sum_{col_1 \lt col_2} XOR(b(row_i,col_1),b(row_j,col_1)) \times XOR(b(row_i,col_2),b(row_j,col_2))$

We have re-expressed term(i,j) as a sum of these three parts:

$term(i,j) = term_{sq}(i,j) + term_{cr1}(i,j) + term_{cr2}(i,j)$

Boolean expressions assume the value 0 or 1 only. These two numbers share the property that they each equal their own square, so we can replace the square of any individual boolean expression by the expression itself. Removing the "square" like this allows us to simplify the form of the penalty function. It also allows us to reduce some of the higher order interactions to quadratic without needing to introduce ancillary variables.

Let's apply this observation to $$term_{sq}(i,j)$$. This contains a sum over columns of the square of an XOR expression, which assumes the values 0 or 1. We can thus remove the square operation without affecting the values of $$term_{sq}(i,j)$$. This gives us:

$term_{sq}(i,j) = {(2N)}^2 + \sum_c XOR(b(row_i,col_c),b(row_j,col_c))$

This simplified form of $$term_{sq}(i,j)$$ contains the same sum over columns of an XOR expression as appears in $$term_{cr1}(i,j)$$. Combine these two terms to get:

$term_{sq}(i,j) + term_{cr1}(i,j) = {(2N)}^2 + (1-4N) \times \sum_c XOR(b(row_i,col_c),b(row_j,col_c))$

Unfortunately, we cannot apply the same simplification to $$term_{cr2}(i,j)$$ because it does not contain the square of any single boolean expression.

Ancillary variables

Now we need to expose the higher order (i.e. non-quadratic) portions of our penalty function and introduce ancillary variables to reduce the higher order interactions to quadratic ones. To do this, we can re-express the XOR function in terms of its two inputs. This is easy once we observe that XOR(x,y) is equal to x+y-2xy:

| (x,y) | XOR(x,y) | x+y-2xy |
| --- | --- | --- |
| x = 0, y = 0 | 0 | 0 |
| x = 0, y = 1 | 1 | 1 |
| x = 1, y = 0 | 1 | 1 |
| x = 1, y = 1 | 0 | 0 |

Wherever we see an XOR function in the penalty function, replace it with this simple function of the two inputs of the XOR function. Each XOR function that makes up term(i,j) gets replaced with a sum of some linear terms and a quadratic term in its inputs.
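The identity underlying this replacement is easy to confirm by brute force over the four possible inputs:

```python
# Check that XOR(x, y) = x + y - 2xy for every boolean input pair.
checks = [(x ^ y) == x + y - 2 * x * y for x in (0, 1) for y in (0, 1)]
print(all(checks))
```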

Apply this replacement for XOR(x,y) to this portion of the penalty function:

$term_{sq}(i,j) + term_{cr1}(i,j) = {(2N)}^2 + (1-4N) \times \sum_c \left[ b(row_i,col_c)+b(row_j,col_c)- \left( 2b(row_i,col_c) \times b(row_j,col_c) \right) \right]$

This portion of the penalty function is now in exactly the desired form. It contains a sum of a constant term, linear terms in the $$b_{i,j}$$ boolean variables, and quadratic terms consisting of products of one $$b_{i,j}$$ variable with another $$b_{i',j'}$$ variable. This can be handled directly by the D-Wave One hardware. We can make appropriate choices of the $$h_i$$ and $$J_{i,j}$$ constants to mimic this penalty function via the Hamiltonian that governs our qubits.

Unfortunately, this same simplification cannot be applied to the $$term_{cr2}(i,j)$$ portion of our penalty function. $$term_{cr2}(i,j)$$ contains products of two XOR functions. These products will give rise to cubic terms in the input variables, coming from the product of the linear terms from one XOR function with the quadratic term from the other XOR function. The product of the quadratic terms from the two XOR functions will give a quartic term in the input variables.

Now we will work out the contribution to $$term_{cr2}(i,j)$$ from a pair of columns c1 and c2. There is a bit of tedious algebra here, but remember that this is necessary to expose the higher order interactions that the D-Wave One hardware cannot handle directly.

$term_{cr2}(i,j;c1,c2) = \left( b(ri,c1) + b(rj,c1) - 2b(ri,c1) \times b(rj,c1) \right)$

$\times \left( b(ri,c2) + b(rj,c2) - 2b(ri,c2) \times b(rj,c2) \right)$

$= b(ri,c1) \times b(ri,c2) + b(ri,c1) \times b(rj,c2) - 2b(ri,c1) \times b(ri,c2)\times b(rj,c2) +$

$b(rj,c1) \times b(ri,c2) + b(rj,c1) \times b(rj,c2) - 2b(rj,c1) \times b(ri,c2) \times b(rj,c2) +$

$-2b(ri,c1) \times b(rj,c1) \times b(ri,c2) - 2b(ri,c1) \times b(rj,c1) \times b(rj,c2) + 4b(ri,c1) \times b(rj,c1) \times b(ri,c2) \times b(rj,c2)$

Of these nine terms, four contain products of three input variables and one contains a product of four input variables. These are precisely the cubic and quartic terms that cannot be handled directly by the D-Wave One "instruction set".

The technique for eliminating these cubic and quartic terms is to introduce ancillary variables. The benefit of this approach is that we can re-write our penalty function using purely quadratic interactions in the enlarged set of variables, which means that we can express our problem within the "instructions" implemented by the D-Wave One hardware. The drawback of this approach is that we now have additional variables to handle, which means that we are using a valuable hardware resource - additional qubits. There is one additional implication of this approach which we will see later on: a limiting procedure is required to make sure that these ancillary variables attain their desired values.

The general methodology for introducing ancillary variables is to express a product of existing variables as a single new replacement variable. On its own, this replacement variable could assume any value, so we introduce a new penalty term which vanishes only when the replacement variable is exactly the product of the existing variables it replaces.

To see how this works, imagine replacing the product xy with a new variable z. We want z to equal xy, so we will penalize those combinations of x, y, z for which z is not equal to xy. We need to find a function which provides the necessary penalty and which is at most quadratic in x, y and z; that will enable us to implement this penalty term using the "instructions" available in the D-Wave One hardware. Here is a penalty function that satisfies these requirements:

| x | y | z | xy | z = xy? | xy - 2xz - 2yz + 3z |
|---|---|---|----|---------|---------------------|
| 0 | 0 | 0 | 0  | yes     | 0                   |
| 0 | 0 | 1 | 0  | no      | 3                   |
| 0 | 1 | 0 | 0  | yes     | 0                   |
| 0 | 1 | 1 | 0  | no      | 1                   |
| 1 | 0 | 0 | 0  | yes     | 0                   |
| 1 | 0 | 1 | 0  | no      | 1                   |
| 1 | 1 | 0 | 1  | no      | 1                   |
| 1 | 1 | 1 | 1  | yes     | 0                   |
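The table can be verified mechanically. A minimal sketch (the function name `penalty` is ours) checks that the quadratic function is zero exactly on the valid rows and positive on the invalid ones:

```python
from itertools import product

def penalty(x, y, z):
    """Quadratic penalty from the table: zero exactly when z == x*y."""
    return x*y - 2*x*z - 2*y*z + 3*z

for x, y, z in product((0, 1), repeat=3):
    valid = (z == x * y)
    p = penalty(x, y, z)
    # Valid rows incur zero penalty; invalid rows incur a positive penalty.
    assert (p == 0) == valid and p >= 0
```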

If we re-examine $$term_{cr2}(i,j;c1,c2)$$ and look for ways to reduce the higher order interactions, we can see that the following product of variables appears in two of the cubic terms and in the quartic term:

$b(ri,c2) \times b(rj,c2)$

Replacing this product with a single ancillary variable reduces the order of the two cubic terms in which it appears to quadratic. The quartic term now becomes cubic. A second product of variables appears in the two remaining cubic terms and also in the newly reduced cubic term:

$b(ri,c1) \times b(rj,c1)$

This second replacement reduces all three remaining cubic terms to quadratic. Let's give these two variables explicit names:

$b(ri,c2) \times b(rj,c2) \to x(ri,rj,c2)$

$b(ri,c1) \times b(rj,c1) \to x(ri,rj,c1)$

Using these two new replacement variables, we can express $$term_{cr2}(i,j;c1,c2)$$ like this:

$term_{cr2}(i,j;c1,c2) = b(ri,c1) \times b(ri,c2) + b(ri,c1) \times b(rj,c2) - 2b(ri,c1) \times x(ri,rj,c2) +$ $b(rj,c1) \times b(ri,c2) + b(rj,c1) \times b(rj,c2) - 2b(rj,c1) \times x(ri,rj,c2)$ $- 2x(ri,rj,c1) \times b(ri,c2) - 2x(ri,rj,c1) \times b(rj,c2) + 4x(ri,rj,c1) \times x(ri,rj,c2)$
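As a sanity check on the substitution, the sketch below (illustrative function names only) confirms that whenever the ancillary variables take their intended product values, the purely quadratic form reproduces the original quartic expression:

```python
from itertools import product

def term_cr2_original(a, b, c, d):
    """XOR(a,b) * XOR(c,d), with XOR(p,q) = p + q - 2pq."""
    return (a + b - 2*a*b) * (c + d - 2*c*d)

def term_cr2_reduced(a, b, c, d, x1, x2):
    """Quadratic form after substituting x1 = a*b and x2 = c*d."""
    return (a*c + a*d - 2*a*x2
            + b*c + b*d - 2*b*x2
            - 2*x1*c - 2*x1*d + 4*x1*x2)

# With the ancillary variables set to their intended products, the reduced
# (purely quadratic) form equals the original quartic expression everywhere.
for a, b, c, d in product((0, 1), repeat=4):
    assert term_cr2_reduced(a, b, c, d, a*b, c*d) == term_cr2_original(a, b, c, d)
```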

All cubic and quartic terms are gone, replaced with quadratic terms. The cost of this reduction in order is that we must introduce new penalty terms to make sure that the x(ri,rj,ck) variables assume their correct values. For each row pair row_i, row_j and each column col_k, there is a new term to add to the penalty function:

$term_x(ri,rj,ck) = b(ri,ck) \times b(rj,ck) - 2b(ri,ck) \times x(ri,rj,ck) - 2b(rj,ck) \times x(ri,rj,ck) + 3x(ri,rj,ck)$

A counting exercise is in order at this point, to see how many ancillary variables we have introduced in transforming our HOBO problem into a QUBO problem. In dealing with the row pair consisting of row_i and row_j, we needed to introduce an ancillary variable for each column. In the case of the 4x4 Hadamard matrix, this means that for each of the six row pairs, we will introduce four new ancillary variables. To our original sixteen problem variables we will add twenty-four ancillary variables, for a total of forty boolean variables. When we consider a 4Nx4N problem, the number of ancillary variables will be:

$\binom{4N}{2} \times 4N = \frac{4N(4N-1)}{2} \times 4N = 8N^2(4N-1)$

The number of ancillary variables grows as the cube of N while the number of native problem variables grows as the square of N. Clearly this approach will not scale to large values of N unless we have many, many qubits available to handle the ancillary variables.
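The counting argument can be captured in a few lines. A sketch in modern Python (the tutorial's own examples target Python 2.6, but `math.comb` keeps this version short; the function names are ours):

```python
from math import comb

def ancillary_count(N):
    """Ancillary variables for a 4N x 4N problem: one per (row pair, column)."""
    n = 4 * N
    return comb(n, 2) * n          # equals 8 * N**2 * (4*N - 1)

def native_count(N):
    """Native problem variables: one per matrix element."""
    return (4 * N) ** 2

# The 4x4 case: 24 ancillary variables on top of 16 native ones, 40 in total.
assert ancillary_count(1) == 24 and native_count(1) == 16

# The closed form from the text holds, and grows as the cube of N.
for N in range(1, 6):
    assert ancillary_count(N) == 8 * N**2 * (4*N - 1)
```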

The penalty function at the QUBO stage of the problem now contains the original penalty function, re-expressed using the ancillary variables, plus the new penalty terms which are required to force the new variables to assume their proper values relative to the original problem variables. Here it is:

$P(b_{i,j};x_{i,j,k}) = \binom{4N}{2} \times {(2N)}^2 +$ $(1-4N) \sum_{i \lt j} \sum_c ( b(row_i,col_c) + b(row_j,col_c) - 2b(row_i,col_c) \times b(row_j,col_c) ) +$ $(1-4N) \sum_{i \lt j} \sum_{c1 \lt c2} term_{cr2}(i,j;c1,c2) + w \times \sum_{i \lt j} \sum_c term_x(i,j,c)$

The dependence of P on the $$b_{i,j}$$ and $$x_{i,j,k}$$ is quadratic at most. We have introduced the one new parameter w to control the "strength" of the penalty incurred when we violate the constraint which forces the x(i,j,k) variables to have their correct values.

QUBO to Chimera

The form of our penalty function has gotten rather complicated. It started as a simple sum over row pairs in which each row pair contributed a "deviation" from orthogonality. If the sum of all these deviations vanished, then we knew that all row pairs were orthogonal and we had found a Hadamard matrix.
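In the 0/1 encoding, two ±1 rows of length 4N are orthogonal exactly when they differ in 2N positions, so one natural way to express the "deviation" of a row pair is the squared distance of the XOR count from 2N. A sketch under that assumption (the matrix `H` and the function name `deviation` are illustrative, not from the developer pack):

```python
from itertools import combinations

# Standard 4x4 Hadamard matrix in 0/1 encoding (0 stands for +1, 1 for -1).
H = [[0, 0, 0, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 1],
     [0, 1, 1, 0]]

def deviation(row_i, row_j):
    """Squared deviation from orthogonality for one row pair: two +/-1 rows
    of length 4N are orthogonal exactly when they differ in 2N columns."""
    differing = sum(bi ^ bj for bi, bj in zip(row_i, row_j))
    return (differing - len(row_i) // 2) ** 2

# For a Hadamard matrix, the sum of deviations over all row pairs vanishes.
assert sum(deviation(ri, rj) for ri, rj in combinations(H, 2)) == 0
```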

Converting our HOBO problem into QUBO format introduced ancillary variables and additional penalty terms. We also have to contend with sums over not just pairs of rows, but also sums over pairs of columns. It would be helpful to think of this more complicated function in a simple way. It turns out that this is possible - and in fact we can draw simple pictures that illustrate where we are now and where we need to go in order to finish the mapping from QUBO to Chimera.

The simple way to think about our QUBO problem is as a matrix sandwiched between two vectors. We multiply our matrix on the right by a column vector and on the left by the same vector, transposed to form a row. Here is an example of a 3x3 matrix:

$Q(x,y,z) = (x \hspace{2mm} y \hspace{2mm} z) \left[ \begin{array}{ccc} a & d/2 & e/2 \\ d/2 & b & f/2 \\ e/2 & f/2 & c \end{array} \right] \left[ \begin{array}{c} x \\ y \\ z \end{array} \right]$ $= ax^2 + by^2 + cz^2 + dxy + exz + fyz$

In this example, $$Q(x,y,z)$$ is a function of the boolean variables x, y and z. Remembering that the square of a boolean variable is equal to itself, we can re-write $$Q(x,y,z)$$ as:

$Q(x,y,z) = ax + by + cz + dxy + exz + fyz$

Neglecting an overall constant term, this is the most general quadratic function that we can write of these three boolean variables.

We can imagine our QUBO penalty function in exactly the same way. First, we assemble our $$b_{i_j}$$ and $$x_{i,j,k}$$ variables into a column vector and a corresponding transposed row vector. Second, we find each linear term containing a single variable and place its coefficient on the diagonal of our square matrix. Third, we find each quadratic term containing the product of two variables. For each such term, we split the coefficient in half and place the two halves into our matrix at row/column locations corresponding to the intersection of these two variables. The matrix product, sandwiched between the column vector and its transpose, is exactly equal to our QUBO-ized penalty function, neglecting the overall constant term.
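The three-step recipe above can be sketched directly for the 3x3 example. The coefficient values below are arbitrary, chosen only for illustration:

```python
from itertools import product

# Illustrative coefficients for Q(x, y, z) = ax + by + cz + dxy + exz + fyz.
a, b, c, d, e, f = 1, -2, 3, 4, -5, 6

# Assemble the symmetric QUBO matrix: linear coefficients on the diagonal,
# each quadratic coefficient split in half across its two off-diagonal slots.
Q = [[a,     d / 2, e / 2],
     [d / 2, b,     f / 2],
     [e / 2, f / 2, c    ]]

def sandwich(v):
    """Evaluate v^T Q v: the matrix sandwiched between the vector and its transpose."""
    return sum(v[i] * Q[i][j] * v[j] for i in range(3) for j in range(3))

# For boolean inputs (where x**2 == x), the matrix sandwich equals the
# linear-plus-quadratic form of the penalty function.
for x, y, z in product((0, 1), repeat=3):
    assert sandwich((x, y, z)) == a*x + b*y + c*z + d*x*y + e*x*z + f*y*z
```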

The advantage of writing our QUBO problem in this form is that we can easily visualize the pattern of zero and non-zero elements in our matrix. This turns out to be essential for the last transition: turning our QUBO problem into a Chimera graph. Take a moment to look at Figure 3, which illustrates this concept.

Figure 3. Block Structure of QUBO

For our 4x4 problem we use sixteen variables to represent elements of the Hadamard matrix and twenty-four more ancillary variables. Collect these into a forty-element column vector with the matrix-element variables first and the ancillary variables following. The location of these variables in the column (and transposed row) vector determines the pattern of entries in the QUBO matrix, which has dimensions 40x40. It is clear from the figure that slightly more than half (55%) of the matrix has no coupling between variables and slightly less than half (45%) contains non-zero couplings between variables.

An alternative way to visualize this same information is presented in Figure 4. This diagram makes explicit each of the 40 variables - both original and ancillary. Each circle contains four dots - for the four variables $$b_{i,*}$$ associated with a row of the Hadamard matrix or for the four ancillary variables $$x_{i,j,*}$$ associated with a row pair. All variables within a circle interact (quadratically) with the other variables in the same circle. Additionally, variables in some circles interact with variables in other circles. The pattern of interactivity represented here is exactly the same as that in Figure 3.

Now, compare these diagrams with the pattern of interconnectivity of the qubits in the D-Wave One system, shown in Figure 5. This diagram represents each qubit as a graph vertex and each potential non-zero $$J_{i,j}$$ term in the Hamiltonian as a link between vertices. Figure 5 shows the exact pattern of connections between the 128 qubits in the Rainier chip; there are 352 coupling terms available for controlling the influence that one qubit has on another. Our task is to map the 40 problem and ancillary variables of our 4x4 Hadamard problem to the 128 qubits of the Rainier graph.

Figure 5. Chimera 4x4

What are the constraints that we must satisfy when we define this mapping? First, each of our forty variables must be mapped to at least one distinct qubit, otherwise the hardware can tell us nothing about that variable's value in the solution! We may actually map one problem variable to more than one qubit in order to satisfy the second constraint: if two problem variables interact with one another (i.e., there is a term in the penalty function containing the product of these two variables), then there must be a coupler in the hardware connecting the qubits to which they are mapped. This is generally a difficult constraint to satisfy, since the pattern of connectivity between our problem variables has no a priori reason to match the pattern of connectivity amongst the qubits in the D-Wave One system.

The general technique for solving this problem is to map one problem variable to several qubits in the hardware. We introduce strong couplings between these qubits so that they will all attain the same value in the ground state. This allows us to think of this set of hardware qubits as representing a single "logical" (or collective) qubit. The benefit of this approach is that by spreading a single problem variable over multiple hardware qubits, we may have more opportunities to find a coupler in the hardware that will allow this one problem variable to influence another problem variable.
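One simple way to express these strong couplings, sketched here in QUBO form rather than the hardware's native Ising couplings, is to penalize each adjacent pair of qubits in a chain for disagreeing; with a large enough weight on this term, every ground state has the whole chain in agreement:

```python
from itertools import product

def chain_penalty(bits):
    """QUBO penalty forcing a chain of boolean qubits to agree: each adjacent
    pair contributes (q_i - q_j)^2 = q_i + q_j - 2*q_i*q_j, which is 0 when
    the pair agrees and 1 when it disagrees."""
    return sum(p + q - 2 * p * q for p, q in zip(bits, bits[1:]))

# The penalty vanishes exactly on the two all-agree configurations, so the
# chain of hardware qubits behaves as a single "logical" variable.
for bits in product((0, 1), repeat=5):
    assert (chain_penalty(bits) == 0) == (len(set(bits)) == 1)
```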

An example may help. Figure 6 illustrates the notion of "collective qubits": using multiple hardware qubits to mimic the action of a single problem variable. The figure shows three collective qubits, each composed of eight hardware qubits. Each collective qubit is represented as a gray strip connecting its eight hardware qubits. The couplers within the gray strip are required to force all hardware qubits within a collective qubit to have the same value. The couplers that allow the collective qubits to interact with each other are shown in bold.

Figure 6. Collective qubits

The benefit of forming collective qubits is that we can more easily introduce interactions between them than between individual hardware qubits. In the Chimera topology, each hardware qubit connects to at most six other hardware qubits. Of these six neighboring qubits, four are "nearby" and two are slightly more distant. This pattern defines the interactions that are supported by the hardware "instruction set". If the pattern of interactions we need to implement fits these "instructions", we are in luck. Otherwise we need to solve what is called the "graph embedding problem". Collective qubits provide a simple way to think about the graph embedding problem.

There are many ways that one can form collective qubits from the hardware qubits of the D-Wave One system. One possibility is hinted at by Figure 6. It shows three collective qubits, each of which is made of a column of four hardware qubits and a row of four hardware qubits. We can continue this pattern across the entire graph, creating a total of sixteen collective qubits, each comprising eight hardware qubits. The benefit of this set of collective qubits is that each of the sixteen interacts with every other collective qubit. In graph terminology, this is referred to as a complete graph on sixteen vertices. With this approach, any QUBO problem defined over sixteen problem variables can be handled by these collective qubits. The simplicity of this approach is accompanied by a drawback: we have had to reduce the number of available qubits from 128 (the total number of qubits in the Rainier topology) to 16.
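This construction can be checked in software. The sketch below models the Rainier coupler graph under one common labeling convention (cell row, cell column, horizontal/vertical side, index - our naming, not an official API) and verifies both the 352-coupler count quoted earlier and that the sixteen collective qubits form a complete graph:

```python
from itertools import combinations

CELLS = 4   # 4x4 grid of unit cells, 8 qubits each (the Rainier layout)

def couplers():
    """Chimera couplers: complete bipartite K4,4 inside each cell, plus
    links between like qubits in horizontally/vertically adjacent cells."""
    edges = set()
    for r in range(CELLS):
        for c in range(CELLS):
            for i in range(4):
                for j in range(4):
                    edges.add(frozenset({(r, c, 'h', i), (r, c, 'v', j)}))
                if c + 1 < CELLS:
                    edges.add(frozenset({(r, c, 'h', i), (r, c + 1, 'h', i)}))
                if r + 1 < CELLS:
                    edges.add(frozenset({(r, c, 'v', i), (r + 1, c, 'v', i)}))
    return edges

def collective(k):
    """Collective qubit k: one full row of horizontal qubits plus one full
    column of vertical qubits, meeting in cell (k//4, k//4)."""
    return ({(k // 4, c, 'h', k % 4) for c in range(CELLS)} |
            {(r, k // 4, 'v', k % 4) for r in range(CELLS)})

edges = couplers()
assert len(edges) == 352     # the coupler count quoted in the text

# Every pair of the sixteen collective qubits shares at least one coupler,
# so together they form a complete graph on sixteen vertices.
for k, l in combinations(range(16), 2):
    assert any(frozenset({p, q}) in edges
               for p in collective(k) for q in collective(l))
```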

We need a total of 40 problem variables to handle our 4x4 Hadamard problem. If we adopt this particular set of collective qubits, the Rainier topology will only provide us 16 collective qubits, so we have more than exhausted our hardware resources. A possible future topology would double the size of the Rainier topology in each dimension (see Figure 7). If we used the same strategy in this topology we could build a total of 32 collective qubits. This is almost - but not quite - enough qubits to allow us to handle the 4x4 Hadamard problem.

Figure 7. Possible Future Topology

The moral of this portion of the story must be stated here so that we can change gears and examine the second path for mapping our Hadamard matrix problem to the D-Wave One system. Simply put: direct mapping is challenging. We have seen that the first step along this path is to formulate our problem as a HOBO (Higher Order Binary Optimization). This step is necessary whether we use direct mapping or the BlackBox compiler. With direct mapping, we must introduce ancillary variables to reduce the higher order interactions to quadratic. Finally, we must form collective qubits and map the problem variables to them so that each pair of problem variables that needs to interact maps to collective qubits that share a coupler. Or, we can choose to try the BlackBox compiler. If we do this, some - but not all - of the work we did to express our Hadamard problem as a HOBO is still required. The work we did to translate our HOBO problem into a QUBO problem is no longer needed. And there is no need to map the QUBO problem into the Chimera topology.