# Quantum binary classification

Binary classification using a D-Wave One™ system

Contents

• 1.1 - Introduction To Binary Classification
• 1.2 - Labeled training data
• 1.3 - Weak Classifiers
• 1.4 - Wisdom of The Crowds: Strong Classifiers
• 1.5 - The Loss Function
• 1.6 - Regularization
• 2.1 - Using a call to the D-Wave compiler to solve the problem
• 2.2 - Testing the classifier with the "test" data
• 2.3 - A simple GUI for offline mode classifier use

Aim, audience and required background

This material was developed to help software developers interested in learning how to program D-Wave quantum computing systems get started.

This white paper is a practical introduction to developing applications on a D-Wave One system. You will learn by example, write a few real programs, and implement your code all the way through execution on a real quantum computer. Because much of the underlying theory and concepts are very difficult, we don't spend a lot of time on theory and instead focus on practical issues of how to get an application running.

Experience in machine learning (in particular classification) and discrete optimization is helpful but not necessary. The programming language used in the included examples and source code is Python 2.6, and use is made of the Natural Language Toolkit (NLTK), which can be downloaded from http://www.nltk.org/ and is introduced in Natural Language Processing with Python, available at http://www.nltk.org/book. While the example we will go through involves basic concepts in natural language processing (NLP), no background in NLP is required. No previous exposure to programming D-Wave quantum computing systems is necessary. A helpful companion document, which introduces a more complex but more useful version of the algorithm described here, is "Training a Large Scale Classifier with the Quantum Adiabatic Algorithm", which you can download from http://arxiv.org/abs/0912.0779.

What you will learn

By following through the material in this white paper, you will learn:

• How supervised binary classification works
• How to craft a quantum version of a binary classification algorithm (QBC)
• How to run this algorithm using the D-Wave One system and the D-Wave developer kit
• Why good feature selection is important in machine learning
• The disadvantages of using supervised learning approaches with user-defined features

Software requirements

In order to proceed you should install several free software packages.

• Python - The material here assumes you are using Python 2.6. You can download Python from http://www.python.org. If you need a programming development environment, PyCharm from JetBrains (http://www.jetbrains.com/pycharm/) is excellent; there is a small license fee to use it.
• NumPy - This is a scientific computing library. It is at http://numpy.scipy.org/
• D-Wave Python developer pack - This contains Python modules for interfacing with the D-Wave One. The D-Wave Python developer pack can be downloaded from the DOWNLOAD DEVKITS page (you will need to sign in with a D-Wave developer portal account).
• NLTK - The code here uses NLTK version 2.0 to supply a corpus of names for our training data. You can get it from http://www.nltk.org/
• wxPython - If you would like to implement the code to create the GUI described in section 2.3, you will need the wxPython graphics library. It can be found here: http://www.wxpython.org/

SECTION 1

1.1 - Introduction To Binary Classification

In this tutorial we will describe an algorithm, called quantum binary classification (QBC), which is a supervised machine learning technique that can be used to train any binary classifier. A binary classifier is software that takes an object as input, such as an image, document, or sound file, and labels it with one of two possible labels - for example, "yes" or "no". The "yes" or "no" response can be to any question you can think of pertaining to the type of object in question. The QBC algorithm was co-developed with researchers at Google and Purdue University, and was first used to build a classifier for labeling the presence or absence of cars in images.

To make everything concrete, we run through a fully worked example, where we use the QBC algorithm to train a binary classifier that inputs a person's first name, and classifies it as either more likely to be male or female. This example was chosen because it's fun, simple, and exhibits the basic features of more complex binary classification problems.

Imagine you are given the task of creating code that will input some (potentially very complex) object, and automatically label it with one of two possible labels. Here are some examples:

• Given an image, is there an apple in the image or not?
• Given a chest x-ray, is there a tumor in the image or not?
• Is a piece of code malware or not?
• Given a movie review, is it positive or negative?
• Given only a person's first name, is the person more likely to be male or female?

How can we go about building the software that will apply these labels? There are many ways that you could try to do this. One of these, which we'll focus on here, is a basic machine learning approach called supervised binary classification.

This approach requires you to prepare two (and only two!) things before you can implement a classifier. These are labeled data and a set of weak classifiers (features). In this section we'll review both of these, briefly introduce some basic machine learning concepts, and present the discrete optimization problem at the core of the QBC algorithm.

1.2 - Labeled training data

If we want a machine to be able to learn to "recognize" something, one way to proceed is to present the machine with large numbers of objects that have been labeled (usually by humans) to either include the thing we are looking for or not, together with a prescription allowing the machine to learn what features in the object correlate with these labels.

For example, if we want to build code to detect whether an apple is in an image, we could generate a large set of images, all with apples in them, and label these images with the word "apple". We could also generate a large set of images, none of which had apples in them, and label each of these "no apple". A good machine learning algorithm might then detect a correlation between having red in an image, and having an "apple" label. The algorithm could learn that images that contain red things are more likely to contain apples, based on the examples we've shown it.

This type of approach, which depends on having large numbers of properly labeled examples, is called supervised machine learning.

The notation we will use is that the potentially very complex object, to which the label is assigned, is denoted x. x can be any representation of the object that you feel makes sense, given the type of object you are dealing with. Sometimes it makes sense to think of x as a vector of real numbers. For example, if you are dealing with images, x could be the values of the pixels, or the values of color histograms, or the coefficients of some type of wavelet transform. If you are dealing with natural language, as we will be in the example we'll build here, x is often a string.

The label that has been assigned to that object we will denote y. We will use the convention that y = ± 1, with these two different labels referring to the two different possibilities we are looking for our binary classifier to select.

Every item in our labeled data will be a pair of the form (x, y). The total number of items we have access to we'll call N. If we want to build a system that learns to classify the most likely gender of a person's first name, labeled data could look like this:

$(x_1, y_1) = ('Mohammad', -1)$
$(x_2, y_2) = ('Suzanne', +1)$
$(x_3, y_3) = ('Geordie', -1)$
$\vdots$
$(x_N, y_N) = ('Erin', +1)$

In this example the x variables are strings holding a name, and the y values hold the most likely gender for that name, encoded so that +1 means 'female' and -1 means 'male'.

Note that while this example looks fairly simple, in general you can put anything you like in the x slot. If you were building a binary classifier that labeled a novel as to whether it was written by Dean Koontz or not, you might put the entire text of the novel in this slot. If you were interested in labeling images, x could be the pixel values in an image.

Once we have access to this labeled data, we need to perform a set of simple operations on it in order to proceed. These operations separate the available data into three separate groups, which are called the training set, validation set and test set.

The training set is the set of data that you will use to build your classifier; it is the subset of your data that the algorithm uses to learn the difference between objects labeled +1 and objects labeled -1.

The validation set is the set of data that you will use to compare candidate classifiers and tune any free parameters (such as the regularization strength introduced in section 1.6) without touching the test set.

The test set is the set of data that you will use to test how well the best classifier you found could be expected to perform "in real life", on examples it hasn't seen yet.

There are procedures that have been developed for how to best slice your available labeled data into these categories. In the example we'll implement here, we won't use these. If you are going to build industrial strength classifiers you will need to take some care in how you segment your available data, and you will encounter issues that need some thought to resolve.
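Such a segmentation can be sketched as a small helper (hypothetical; the 60/20/20 split fractions are just illustrative):

```python
def split_data(labeled_data, train_frac=0.6, valid_frac=0.2):
    # Slice an already-shuffled list of (x, y) pairs into the three groups.
    n = len(labeled_data)
    n_train = int(n * train_frac)
    n_valid = int(n * valid_frac)
    training_set = labeled_data[:n_train]
    validation_set = labeled_data[n_train:n_train + n_valid]
    test_set = labeled_data[n_train + n_valid:]
    return training_set, validation_set, test_set
```

Note that the data must be shuffled before slicing, otherwise each group could end up with a biased mix of labels.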

Because we're focusing on showing the basic principles here, we'll just use a very simple procedure for segmenting our data.

Code Snippet 1: Obtaining training data from NLTK corpus

import random
import cPickle as pickle
from nltk.corpus import names

male_names = names.words('male.txt')[0:2900]
female_names = names.words('female.txt')[0:2900]

male_names_labeled = [[name, '-1'] for name in male_names]
female_names_labeled = [[name, '+1'] for name in female_names]

training_data_labeled_shuffle = male_names_labeled + female_names_labeled
random.shuffle(training_data_labeled_shuffle)
mystr = pickle.dumps(training_data_labeled_shuffle)

f = open('data_randomized.txt', 'w')
f.write(mystr)
f.close()



This loads 5800 names in total (2900 male and 2900 female) from the NLTK 'names' corpus. Note that you will need to have installed NLTK so that the code can load these examples. You can see in the above how the training data points get created: male names are allocated a '-1' tag as the second entry in a list, and female names get a '+1'. We save the randomized data to a file so that we are always using the data in the same order each time, instead of re-randomizing the corpus; this is useful for debugging.

Next we'll write a function that loads in a user-defined number of training examples from our file of randomized, labeled names.

Code Snippet 2: Loading the randomized training data

import cPickle as pickle

def get_training_data(training_data_size):
    f = open('data_randomized.txt', 'r')
    training_data = pickle.loads(f.read())
    f.close()
    training_data = training_data[0:training_data_size]
    return training_data


If you like, you can run this function and check that the training data is as you expect by adding a line that prints out a section of the returned list, e.g. print training_data[0:20].

1.3 - Weak Classifiers

A weak classifier is a function $$F(x)$$ that is applied to the x objects in your data, which returns either +1 or -1. Weak classifiers are sometimes referred to in machine learning as features. You need to provide these to the QBC learning algorithm.

These functions can be as simple or as complex as you like; the only restriction on them is that they must take objects of type x as input and output +1 or -1. You can think of the returned number as a "guess" that the weak classifier is making as to the correct label for the object. We will denote the kth weak classifier as $$F_k (x)$$, where the index k runs from 1 to D, the total number of weak classifiers we wish to use. The set of all $$F_k (x)$$ we will refer to as the dictionary of weak classifiers.

Selecting weak classifiers that "guess" well can be very hard, especially if you don't have good intuition for what structures in your data are correlated with your labels. Getting a good set often involves trial and error and experimentation. Often a particular type of input object will have a lot of established prior art in what types of weak classifiers should be good starting points. Sometimes you will have to guess at what a good set might look like.

For our gender classifier, we want to set up the functions $$F_k (x)$$ so that they make guesses at what gender the name stored in the x variable might be. One type of feature that might work is the last letter of the name - names ending in 'a' might be more likely to be female, while those ending in 'n' might be more likely to be male.

If we go with this idea, we can define a dictionary of weak classifiers as follows:

Code Snippet 3: Weak classifier constructs

classifiers = [
['female','a'],
['female','b'],
['female','c'],
['female','d'],
['female','e'],
['female','i'],
['female','s'],

['male','a'],
['male','b'],
['male','c'],
['male','d'],
['male','e'],
['male','i'],
['male','s'],
]



You can choose as many weak classifiers as you like to include in your set. I just picked a small set (14) from the full alphabet of last letters in order to make the code run faster. An industrial-strength classifier may be built up from millions of weak classifiers.

Code Snippet 4: Weak classifier function

def classify_a_name(weak_classifier, name):
    if weak_classifier[0] == 'female':
        if name[-1:] == weak_classifier[1]:
            label = '+1'
        else:
            label = '-1'
        return label
    elif weak_classifier[0] == 'male':
        if name[-1:] == weak_classifier[1]:
            label = '-1'
        else:
            label = '+1'
        return label



For example, the classifier $$F_5(x)$$ created by passing ['female', 'e'] into the function above outputs +1 (a guess that the name has gender 'female') if the last letter of x is 'e', and -1 (a guess that the name should be classified as 'male') if it's not. Recalling our earlier data, this is how such a classifier would guess on the following names:

| Training datapoint $(x, y)$ | $F_5(x)$ |
| --- | --- |
| $(x_1, y_1) = ('Mohammad', -1)$ | $F_5('Mohammad') = -1$ |
| $(x_2, y_2) = ('Suzanne', +1)$ | $F_5('Suzanne') = +1$ |
| $(x_3, y_3) = ('Geordie', -1)$ | $F_5('Geordie') = +1$ |
| $(x_N, y_N) = ('Erin', +1)$ | $F_5('Erin') = -1$ |

This classifier got two right ('Mohammad' is actually 'male' and 'Suzanne' is actually 'female') and two wrong ('Geordie' is not 'female' and 'Erin' is not 'male').
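To make the bookkeeping concrete, here is a sketch that scores a single weak classifier against labeled data; the f5 function is a hypothetical stand-in for $$F_5$$, with labels encoded as integers rather than strings:

```python
def f5(name):
    # Stand-in for F_5: guess 'female' (+1) if the last letter is 'e', else 'male' (-1)
    return +1 if name[-1] == 'e' else -1

def weak_accuracy(weak_fn, labeled_data):
    # Fraction of (name, label) pairs the weak classifier guesses correctly
    correct = sum(1 for name, label in labeled_data if weak_fn(name) == label)
    return correct / float(len(labeled_data))

examples = [('Mohammad', -1), ('Suzanne', +1), ('Geordie', -1), ('Erin', +1)]
# weak_accuracy(f5, examples) gives 0.5: two right, two wrong, as in the table above
```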

Note that you can use as many weak classifiers as you like in your dictionary. For example it's straightforward to add to the above dictionary by adding weak classifiers that check for combinations of the last two letters in a name, the length of a name, the first letter of a name, and many other possibilities. You are limited here only by your creativity.

A related issue to consider is that the data you have might not be in the best format for building a good classifier. For example, we've assumed that there is enough structure in the letters used in a proper name to make a good guess at the name's gender. Whether this is in fact the case needs to be empirically tested. Perhaps there is a different way to store the data representing your objects - for example, it might be possible that the phonetic pronunciation of a name gives more clues as to its likely gender than its spelling. If this were the case, it might be better to store each of your names as sets of phonemes instead of letters, and build a dictionary of weak classifiers that look for phonetic properties instead of the presence or absence of specific letters.
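As a toy illustration of an alternative feature (hypothetical, and still based on spelling; a real phonetic encoding would operate on phonemes):

```python
def ends_in_vowel(name):
    # Crude stand-in for a phonetic feature: guess 'female' (+1)
    # if the name ends in a vowel letter, else 'male' (-1).
    return +1 if name[-1].lower() in 'aeiou' else -1
```

A feature like this could simply be added to the dictionary alongside the last-letter classifiers and left for the training step to accept or reject.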

We will see that using the very simple 'last letter' dictionary we built in code snippets 3 and 4 is enough to build a classifier that performs at about 75% accuracy in labeling names in a typical test set. It is likely that you can do better than this, with some careful thought and insight in your choice of encoding of your data, and your choice of weak classifiers.

The drawbacks of hand-picking weak classifiers

One of the problems frequently encountered in machine learning approaches such as the one described here is that the ultimate performance of your classifier is bottlenecked by your ability to choose a good batch of weak classifiers for it to select from in the first place. Although we might be able to make clever guesses at what features might be important in language and image based problems (such as our assumption here that the last letter of a name might be important, as opposed to, say, the third or fourth letter), there are many cases where the data is not easily interpreted by a person (for example, genetic sequence data). Picking good features in these cases can be incredibly difficult. Luckily, there are more sophisticated machine learning methods available that can learn to pick weak classifiers automatically, without any human intervention. These methods, called unsupervised feature learning techniques, will be encountered in later tutorial installments.

1.4 - Wisdom of The Crowds: Strong Classifiers

A strong classifier is a function $$\Im(x)$$ that is formed by summing up the guesses of some subset of the full dictionary of weak classifiers, and performing a 'majority vote' on these guesses. The strong classifier we will use here looks like $$\Im(x) = sign \left[ \sum_{t=1}^T \bar{F}_t(x) \right]$$, where the strong classifier is formed by using a total of T weak classifiers $$\bar{F}_t(x)$$. While the notation may look a little confusing, the idea here is really simple - the set $$\{\bar{F}_t(x), t=1..T \}$$ is the subset of the entire dictionary $$\{F_k(x), k=1..D \}$$ that, when voting together, produces the best strong classifier. The voting is democratic, so the final verdict will be a +1 or a -1 either way.

We also have to include a rule for breaking ties - the label we assign if the sum in the sign function is zero (in our example gender classifier, we will assign 'female' (+1) in this case, as it's more likely that any generic name will be female than male because there are many more female names than male names).

Because the strong classifier concept is really the single most important idea for what follows here, it's worth going over this idea in some detail.

What we are looking for is the subset of our weak classifiers that, when acting together, provide the best classifier. To visualize what this means, let's think about the simple dictionary of weak classifiers we built in code snippets 3 and 4.

In this dictionary there are 14 weak classifiers. Our strong classifier could include anywhere from zero of these (in which case, because of our tie-breaking rule, it would label everything 'female') to all 14 of them acting together - see Figure 1 for one possibility.

Each of these combinations has some performance measure (we will see shortly what this is), and we want the combination with the best performance. Finding the best performing group of weak classifiers is an optimization problem. This problem is easy to write down, but hard to solve.
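To see why, note that with D weak classifiers there are 2^D possible subsets to score. A brute-force search sketch (the helper names and the tiny weak classifiers below are ours, and this approach is feasible only for tiny D; avoiding this exponential blow-up is the point of solving the problem in hardware):

```python
from itertools import product

def count_mistakes(w, weak_fns, data):
    # Zero-one loss of one subset of weak classifiers; ties broken to +1.
    mistakes = 0
    for x, y in data:
        total = sum(wj * fj(x) for wj, fj in zip(w, weak_fns))
        pred = 1 if total >= 0 else -1
        if pred != y:
            mistakes += 1
    return mistakes

def best_subset(weak_fns, data):
    # Exhaustively score all 2^D subsets and keep the best-performing one.
    return min(product([0, 1], repeat=len(weak_fns)),
               key=lambda w: count_mistakes(w, weak_fns, data))
```

For D = 14 this is only 16384 subsets, but the count doubles with every weak classifier added, which is why large dictionaries make this a hard optimization problem.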

This is a very important concept at the heart of the QBC approach.

Important! Determining which weak classifiers to include in your final strong classifier is the core optimization problem we are going to solve in hardware. This is the key idea behind the QBC algorithm.

By the way, in case it isn't clear, here is what we mean by majority voting. Let's say we have a name we want to label with a gender (for example 'Ludwig'), and our potential strong classifier consists of three weak classifiers, for example "label male if the last letter is g else label female", "label female if the last letter is a else label male", and "label male if the last letter is n else label female". Applying these in order gives 'male', 'male' and 'female', so a majority vote of these three assigns the label 'male'. Our convention here is to associate the 'male' label with -1 and the 'female' label with +1, so equivalently we can do the majority voting by counting up these contributions, and if the resulting sum is negative label male else label female (in the above example this would be (-1) + (-1) + (+1) = -1 which is negative, therefore label 'male').
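The 'Ludwig' example can be written directly as a vote-counting sketch (the helper and the encoded votes are illustrative):

```python
def majority_vote(votes):
    # Sum the +1/-1 votes; a tie (sum of zero) defaults to 'female' (+1).
    total = sum(votes)
    return -1 if total < 0 else +1

# The three weak classifiers' votes on 'Ludwig': 'male', 'male', 'female'
ludwig_votes = [-1, -1, +1]
# majority_vote(ludwig_votes) gives -1, i.e. 'male', matching (-1) + (-1) + (+1) = -1
```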

1.5 - The Loss Function

A loss function is a measure of how well a particular choice for a set of weak classifiers performs. Let's start by quantifying how we encode a particular combination of weak classifiers. Given a dictionary with D weak classifiers, we define a D-dimensional vector of binary numbers $$\vec{w}$$.

Figure 1. Each possible combination of our weak classifiers gives a possible strong classifier. In this figure there are 14 weak classifiers in our dictionary. One possible candidate for a strong classifier is comprised of weak classifiers 1, 6, and 13.

The elements of $$\vec{w}$$ that have value 1 represent weak classifiers that are 'turned on', and the elements that have value 0 represent weak classifiers that are 'turned off'. For the choice shown in Figure 1, the vector $$\vec{w}$$ would have zeroes for all elements except 1, 6 and 13, whose values would be 1. Here are three examples, for the 14-element dictionary defined in code snippet 3.

$\vec{w} = [0,0,0,0,0,0,0,0,0,0,0,0,0,0]$

This would represent the choice where no weak classifiers were included.

$\vec{w} = [1,1,1,1,1,1,1,1,1,1,1,1,1,1]$

This would represent the choice where all weak classifiers were included.

$\vec{w} = [0,1,0,0,0,0,1,0,0,0,0,0,0,1]$

This would represent the choice where weak classifiers 1, 6, and 13 were included, and all the others were turned off (the case in Figure 1).

For any item $$x_s$$ in our training set, each choice for $$\vec{w}$$ will produce a prediction for what the label should be. We can write this prediction as $\Im(x_s) = sign \left[ \sum^D_{j=1} w_j F_j(x_s) \right]$. Since all of our training data comes with labels $$y_s$$, we can compare this prediction to the actual label. There are several different ways we could do this. Here is one:

Zero-One Loss: The zero-one loss counts the number of mistakes a particular choice for $$\vec{w}$$ makes. The function $L_0(\vec{w}) = \dfrac{1}{4} \sum^S_{s=1} \left( sign \left[\sum^D_{j=1} w_j F_j(x_s) \right] - y_s \right)^2$ adds up the number of times the predicted label is different from the actual label for a choice of $$\vec{w}$$. If it got every single member of the training set right, $$L_0(\vec{w})$$ will be zero. If it got them all wrong, $$L_0(\vec{w})$$ will be equal to S, the total number of elements in the training set.
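The formula translates almost line for line into code (a sketch, assuming each weak classifier returns +1 or -1 and ties are broken to +1 as in our convention):

```python
def zero_one_loss(w, weak_fns, data):
    # L0(w) = (1/4) * sum_s ( sign[ sum_j w_j F_j(x_s) ] - y_s )^2
    total_loss = 0
    for x, y in data:
        vote = sum(wj * fj(x) for wj, fj in zip(w, weak_fns))
        pred = 1 if vote >= 0 else -1
        total_loss += (pred - y) ** 2   # contributes 0 if right, 4 if wrong
    return total_loss / 4.0
```

With the single last-letter-'e' classifier from earlier and the four example names, two mistakes give a loss of 2.0.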

1.6 - Regularization

When building a classifier, it is desirable to have as few weak classifiers included as possible - you want your strong classifier to be as simple as you can make it, while retaining some desired level of performance.

Overly complicated strong classifiers will tend to perform better on your training set, but more poorly on your test set (and, in the real world, on examples they haven't seen yet). This happens because of a phenomenon known as over-fitting - a strong classifier formed using a large number of weak classifiers can adjust itself to the idiosyncrasies of your training set, and not generalize as well to as yet unseen examples as a simpler model would.

You can think of over-fitting as being analogous to "memorization" of the training set features, without "understanding" what it is about these features that is behind the classification. A student who crams for an exam at the last moment by memorizing a large number of specific facts may do quite well on the exam, because they have the specific facts they need in that situation memorized, but will have trouble applying the underlying concepts to other situations.

Here we will use what is known as zero-norm regularization to fight against over-fitting. It penalizes the inclusion of many weak classifiers.

The regularization term we will use is $$R( \vec{w}) = \lambda \sum^D_{j=1} w_j$$, where $$\lambda$$ is a real number. This is a simple regularization which counts up the number of weak classifiers that have been included in the strong classifier (the number of 1's in the w vector) and adds this to the loss function.

So our full objective function is the loss plus the regularization term:

$G(\vec{w}) = \dfrac{1}{4} \sum^S_{s=1} \left( sign \left[\sum^D_{j=1} w_j F_j(x_s) \right] - y_s \right)^2 + \lambda \sum^D_{j=1} w_j$

We see that by adding this term, a strong classifier made up from fewer weak classifiers will help to minimize the objective function. Now that we have the two terms that make up the full objective function (loss + regularization) that we are trying to minimize, we can start incorporating this function into our code.
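A quick numeric illustration of the tradeoff (the loss values are made up; $$\lambda = 3$$ is the value used later in the code):

```python
def g(loss, w, lam):
    # G(w) = L0(w) + lambda * (number of weak classifiers switched on)
    return loss + lam * sum(w)

# With lambda = 3, a 5-classifier solution with loss 5 scores
# G = 5 + 15 = 20, while a 3-classifier solution with loss 10
# scores G = 10 + 9 = 19: the simpler strong classifier wins
# despite making more training mistakes.
```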

SECTION 2

2.1 - Using a call to the D-Wave compiler to solve the problem

If you are familiar with the Getting Started tutorial, you will recall that the way we use the dwave_sapi Python module to talk to the quantum computer is by writing a function which we send to the quantum compiler (affectionately known as the BlackBox compiler). In this case, it is the G(w) function that we derived in the previous section that we would like to minimize, given our training data. To keep things simple, we'll first write the function that calculates the loss (error) separately, to make sure that we understand what it is doing, before we wrap it into the BlackBox notation. (We'll also use this function to check how our classifier performs on the test data later on.) Following the equation given above for $$L_0$$, we can write the function as follows:

Code Snippet 5: The loss function, which allows us to calculate the error on the training set

def calculate_training_error(training_data, dictionary, w):
    summed_error_over_training_data = 0
    for j in range(len(training_data)):
        sum_over_classifiers = 0
        for i in range(len(dictionary)):
            classification = classify_a_name(dictionary[i], training_data[j][0])
            weak_vote = w[i]*int(classification)
            sum_over_classifiers += weak_vote
        final_vote = cmp(sum_over_classifiers, 0)
        if not final_vote:
            final_vote = 1 # break ties in favour of 'female' (+1)
        vote_error = (final_vote - int(training_data[j][1]))**2
        summed_error_over_training_data += vote_error
    error = summed_error_over_training_data*0.25
    return error, len(training_data)


Now that we have the loss function, we can add the regularization term to it, and pop the whole objective function into a class to get it ready for consumption by BlackBox. I've highlighted the part that holds the objective function itself with some stars; you should see that it looks very similar to our loss function above, just with the addition of the regularization term sum(w)*3 (that is, with $$\lambda = 3$$). The rest of the class deals with extracting single $$\vec{w}$$'s from a string of many concatenated guesses at $$\vec{w}$$.
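That unpacking step can be sketched on its own (a hypothetical helper; BlackBox hands back numStates candidate solutions concatenated into one flat +1/-1 list):

```python
def unpack_states(states, num_states):
    # Convert the flat +1/-1 list into num_states separate 0/1 vectors.
    bits = [(s + 1) // 2 for s in states]
    state_len = len(states) // num_states
    return [bits[i*state_len:(i+1)*state_len] for i in range(num_states)]

# unpack_states([1, -1, -1, 1], 2) gives [[1, 0], [0, 1]]
```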

Code Snippet 6: The full Objective function (loss + regularization) as a class definition

from numpy import array

class ObjectiveFunction(object):
    def __init__(self, training_data, dictionary):
        self.training_data = training_data
        self.dictionary = dictionary

    def __call__(self, states, numStates):
        states_bin = [int((item+1)/2) for item in states] # converting to 0/1 from -1/+1
        stateLen = int(len(states)/numStates)
        ret = []
        for state_number in range(numStates):
            w = array(states_bin[state_number*stateLen:(state_number+1)*stateLen])
            #**********************************************#
            summed_error_over_training_data = 0
            for j in range(len(self.training_data)):
                sum_over_classifiers = 0
                for i in range(len(self.dictionary)):
                    classification = classify_a_name(self.dictionary[i], self.training_data[j][0])
                    weak_vote = w[i]*int(classification)
                    sum_over_classifiers += weak_vote
                final_vote = cmp(sum_over_classifiers, 0)
                if not final_vote:
                    final_vote = 1
                vote_error = (final_vote - int(self.training_data[j][1]))**2
                summed_error_over_training_data += vote_error
            # Now here is G(w) = loss + regularization:
            objective_function_value = summed_error_over_training_data*0.25 + sum(w)*3
            #**********************************************#
            ret.append(objective_function_value)
        return tuple(ret)



Now that we have written our function, we can write our main routine, which grabs training data using the function we wrote earlier, then calls the compiler which attempts to minimize the objective function. In doing so it will 'train' our classifier (find the best w). The method for doing this is shown in code snippet 7 below. Again the format of this code should be familiar from the Getting Started tutorial.

Once we get the answer back from BlackBox, we turn it into a binary string (remember that BlackBox deals in +1/-1 notation natively), print it out in two halves (remember from our classifiers list that the first half corresponds to the female features and the second half contains the male features), and then print out the classifiers that correspond to the bits which ended up as 1's. We can then use our handy error (loss) function from above to calculate the error that the strong classifier makes on the training data.

Code Snippet 7: Main routine - training the classifier

from __future__ import division # Note that this import should go at the top of your Python file
from dwave_sapi import local_connection, BlackBoxSolver

training_data = get_training_data(100)
dictionary = classifiers

solver = local_connection.get_solver('c4-sw_sample')

blackbox_parameter = 1
num_vars = len(dictionary)
obj = ObjectiveFunction(training_data, dictionary)
strong_classifier = []

blackbox_solver = BlackBoxSolver(solver)
blackbox_answer = blackbox_solver.solve(obj, num_vars, cluster_num=2,
                                        min_iter_inner=blackbox_parameter,
                                        max_iter_outer=blackbox_parameter,
                                        unchanged_threshold=blackbox_parameter,
                                        max_unchanged_objective_outer=blackbox_parameter,
                                        max_unchanged_objective_inner=blackbox_parameter,
                                        unchanged_best_threshold=blackbox_parameter, verbose=0)
blackbox_answer_bin = [int((item+1)/2) for item in blackbox_answer] # converting to 0/1 from -1/+1

print "The best bit string female was:", blackbox_answer_bin[0:7]
print "The best bit string male was:", blackbox_answer_bin[7:14]
print 'The weak classifiers that make up the strong classifier are:'
for i in range(len(classifiers)):
    if blackbox_answer_bin[i] == 1:
        print classifiers[i]
        strong_classifier.append(classifiers[i])

error, total_classifications = calculate_training_error(training_data, dictionary, blackbox_answer_bin)
print "The classifier accuracy on the training set is:", \
    (total_classifications-error)*100/total_classifications, '%'



2.2 - Testing the classifier with the "test" data

The classifier accuracy on the training set doesn't give us much useful information. It tells us how well the classifier "memorized" the training data, but it doesn't tell us how well it can use its knowledge to make decisions given data that it has never seen before. For that we need some test data. Here is a set of data, again from NLTK, that was not included in the original training data. We'll use this to quiz our newly trained classifier.

Code Snippet 8: Our test data

# 100 male names.

test_names_male=\
['Agus','Akasha','Alfe','Aslin','Assasi','Autzen','Beckham','Caelen','Caillou','Daxen','Detroit',
'Domnic', 'Dresdin','Dunagan','Graeden','Guilluam','Hardock','Izaiack','Joneigh','Juton','Kahlen',
'Connor','Logan','Caleb','Noah','Alexander','Jackson','Brayden','Lucas','William','Nathan','Joseph',
'Justin','Daniel','Benjamin','Christopher','James','Gavin','Evan','Austin','Cameron','Brandon','Mason',
'Luke','Anthony','Christian','Gabriel','Owen','David','John','Jonathan','Samuel','Sean','Hunter',
'Elijah','Thomas']

# and 100 female names.

test_names_female=\
['Akeisha','Aliana','Antonela','Arhianna','Aveen','Azlynn','Blessita','Bluebelle','Chardonnay',
'Chartreuse','Clarencia','Cranberry','Cresenthia','Cynrae','Edessa','Emmalina','Emme','Evi',
'Freiya','Grelyn','Ileisha','Irelynn','Isela','Izis','Jaeliah','Jalyn','Jara','Jewellah',
'Keela','Mahrissa','Margaret','Moonbeam', 'Narjisa','Nikith','Racheline','Roshunda','Shaniqua',
'Samantha','Lauren','Mia','Alexis','Chloe','Ashley','Grace','Jessica','Elizabeth','Taylor','Makayla',
'Makenzie','Anna','Zoe','Kayla','Sydney','Megan','Natalie','Kylie','Rachel','Avery','Katherine',
'Isabel','Victoria','Morgan','Kyra','Jasmine','Allison','Savannah','Julia','Jordan']



In order to use our discovered classifier on the test data, we first write a function that uses the weak classifiers "in combination" to strongly classify a name, as shown in code snippet 9. This is the implementation of the wisdom of the crowds (democratic) voting system described earlier.

We next write a simple routine that takes each of the names in these lists of test data and uses the strong classifier function to classify each name in turn. It starts with the male names: every time the classifier guesses -1, it scores a point. We then do the same for the batch of female names, but this time it must guess +1 to score a point. Adding up the points allows us to compute the overall accuracy on the test data:

Code Snippet 9: The main routine to calculate the classifier test error

def strong_classify_a_name(strong_classifier, name):
    classification = 0
    for classifier in strong_classifier:
        classification += classify_a_name(classifier, name)
    final_classification = cmp(classification, 0)
    if not final_classification:
        final_classification = 1
    return final_classification

correct = 0
perfect_score = 0
for name in test_names_male:
    strong_classification = strong_classify_a_name(strong_classifier, name)
    if strong_classification == -1:
        correct += 1
    perfect_score += 1
for name in test_names_female:
    strong_classification = strong_classify_a_name(strong_classifier, name)
    if strong_classification == 1:
        correct += 1
    perfect_score += 1

print '...'
print 'test data results: ', correct, 'out of ', perfect_score, 'correct'
# Multiply before dividing, so Python 2 integer division doesn't truncate to zero.
print 'The classifier accuracy on the test set is:', correct*100/perfect_score, '%'
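A note for anyone porting this example forward: the built-in cmp used above exists only in Python 2 and was removed in Python 3. A minimal drop-in sign helper (a sketch, not part of the original example) reproduces the cmp(x, 0) behaviour:

```python
def sign(x):
    # Returns -1, 0, or 1, mirroring Python 2's cmp(x, 0).
    return (x > 0) - (x < 0)
```

In strong_classify_a_name, the line `final_classification = cmp(classification, 0)` would simply become `final_classification = sign(classification)`.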



If everything is working properly, running all the code above this point should give an output that looks something like this:

Figure 2. Code output so far, including training of the classifier and running the final strong classifier on the test data.

2.3 - A simple GUI for offline mode classifier use

As a fun extension, we can write a simple GUI to finalize and augment our example. After all, no one wants to interact with a command line to show off their application.

A good way of turning this into an interactive application is to let the user type in a name that they would like classified as male or female. All we need to do is feed the input from a text box into our strong classifier function instead of the test names (as in the previous code snippet). To create our GUI we'll therefore need an application window, images denoting the male and female symbols (each in a "highlighted" version and a greyed-out version), and a text box for the user input. wxPython is used here to provide a graphics framework in which to build the GUI window, and most of the code in the following snippet handles the correct sizing of the window and the elements within it. The most important part is that when the 'Enter' button is clicked (handled by the function on_enter), the code runs strong_classify_a_name on whatever string the user entered into the text box. The GUI is shown in Figure 3.

What is "offline" mode?

This GUI application also demonstrates an important point: there are ways of using quantum computers that do not require the quantum machines themselves to be accessible by your code when the application runs. Here the quantum processor is used to train the classifier, but once you have discovered the optimal set of weak classifiers, applying that knowledge to a classification task is something that can be done completely conventionally. For this reason, such applications are said to run in "offline" mode. There are other applications of machine learning algorithms that DO require access to the quantum processor at runtime, and we'll see some examples of those later in the tutorial series.
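To make the offline point concrete: the trained strong classifier is just a plain Python list of weak classifiers, so it can be saved to disk once after the quantum training run and reloaded later on any conventional machine. Here is a minimal sketch using the standard pickle module; the file name and the example classifier list are made up for illustration:

```python
import pickle

# A hypothetical trained strong classifier: each weak classifier is a
# [gender, last_letter] pair, in the same format used in Code Snippet 10.
strong_classifier = [['female', 'a'], ['female', 'e'], ['male', 'k']]

# Save once, right after training on the quantum system...
with open('strong_classifier.pkl', 'wb') as f:
    pickle.dump(strong_classifier, f)

# ...then reload anywhere, with no quantum hardware in sight.
with open('strong_classifier.pkl', 'rb') as f:
    reloaded = pickle.load(f)
```

The GUI below takes a shortcut and simply hard-codes a small classifier list, but persisting the real trained list this way is how an offline application would normally ship.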

Code Snippet 10: A simple gender classifier GUI

import wx

####### A panel that can be given a file path of an image and will display that image
class Image_Panel(wx.Panel):
    def __init__(self, parent, id, pos):
        wx.Panel.__init__(self, parent, id, pos, size=(120, 120))

        picture_panel = wx.Panel(self, -1, (0, 0), (200, 225), style=wx.SIMPLE_BORDER)
        picture_panel.SetBackgroundColour(wx.WHITE)
        self.picture = wx.StaticBitmap(picture_panel)

    def set_picture(self, filename):
        self.picture.SetFocus()
        self.picture.SetBitmap(wx.Bitmap(filename))

##########################################################################################

####### A window holding the images and the controls
class Image_Frame(wx.Frame):
    def __init__(self, parent, id, title):
        self.strong_classifier = [['female', 'a'], ['female', 'e']]
        wx.Frame.__init__(self, parent, id, title, wx.DefaultPosition, wx.Size(250, 400))
        self.image1 = Image_Panel(self, -1, (55, 10))
        self.image1.set_picture('female_symbol_off.gif')
        self.image2 = Image_Panel(self, -1, (55, 150))
        self.image2.set_picture('male_symbol_off.gif')

        self.tc = wx.TextCtrl(self, -1, pos=(55, 280), size=(120, 25))
        btn = wx.Button(self, 1, 'Enter', pos=(75, 320), size=(80, 28))
        self.Bind(wx.EVT_BUTTON, self.on_enter, id=1)

    # When the button is pressed, classify the text ctrl contents, then clear the text.
    def on_enter(self, event):
        input = self.tc.GetValue()
        gender = strong_classify_a_name(self.strong_classifier, input)
        if gender == 1:
            self.image1.set_picture('female_symbol.gif')
            self.image2.set_picture('male_symbol_off.gif')
        if gender == -1:
            self.image1.set_picture('female_symbol_off.gif')
            self.image2.set_picture('male_symbol.gif')
        self.tc.Clear()

##########################################################################################

### Create an app
class Image_App(wx.App):
    def OnInit(self):
        frame = Image_Frame(None, -1, 'Gender Classifier')
        frame.Show(True)
        return True

app = Image_App(0)
app.MainLoop()


Figure 3. A simple GUI to try out your strong classifier in "offline" mode.

Extensions:

This is about the simplest possible binary classifier that one could imagine writing. As such, there are lots of potential extensions that are left as exercises for the reader! Here are just a few:

• Implementing better features than the 14 simple "last letter" features we used here
• Investigating what effect increasing the training set size and the number of features has upon the classifier accuracy
• Investigating how the runtime scaling of the code execution is affected by the size of the training set and the number of features
• Adding a cross-validation step into the algorithm, so that the 'constant' in the regularization term is replaced by a variable which is also optimized by training over the names in batches.
• Implementing a boosting approach which helps select classifiers in a more hierarchical and intelligent way.
• Using NLTK to consider the classification of sentences instead of single words. For example, you can imagine applying this approach to the classification of movie reviews as 'positive' or 'negative' by using keyword combinations as weak classifiers.
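As a starting point for the first extension above, here is a sketch of what a richer weak classifier based on two-letter suffixes might look like. The [gender, suffix] classifier format mirrors the single-letter version used earlier, but the helper name and the example classifiers are hypothetical, so treat this as illustrative rather than as the reference implementation:

```python
def classify_name_by_suffix(classifier, name):
    # classifier is a [gender, suffix] pair; vote +1 for female or -1 for
    # male when the name ends with the suffix, and abstain (0) otherwise.
    gender, suffix = classifier
    if name.lower().endswith(suffix):
        return 1 if gender == 'female' else -1
    return 0

# Two hypothetical weak classifiers built from two-letter suffixes.
weak = [['female', 'na'], ['male', 'on']]
votes = sum(classify_name_by_suffix(c, 'Anna') for c in weak)
```

Abstaining on names that don't match the suffix keeps irrelevant weak classifiers from adding noise to the vote, which is one design difference worth exploring against the always-voting single-letter features.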