Quantitative Finance tutorial: Optimizing portfolios
Contents
Section 2 - Solving the problem
Aim, audience and required background
This material was developed to help those interested in programming the D-Wave One system to understand how to implement a simple application in the field of pure otimization. This tutorial series is designed to introduce you to quantum computer programming using a practical, hands-on approach. By the end of this tutorial you should understand the way that a real world optimization problem in the field of quantitative finance can be solved using the quantum hardware.
All of the software necessary for developers to begin learning how to program D-Wave quantum computing systems can be downloaded from the DOWNLOAD DEVKITS page. As the developer portal is currently in BETA test stage, you will need to have a D-Wave developer account to download these packs. To follow this tutorial it may be helpful to have read two of the previous installments: Quantum Computing and Energy Programming Primer and Getting Started: Installing the D-Wave developer pack and an introductory 'hello multiverse' tutorial.
The high level programming language used in the included examples and source code is Python 2.6. Some experience of using the Python programming language is helpful but not required.
What you will learn
By following through the material in this white paper, you will learn:
Software requirements
In order to proceed you should install several free software packages.
SECTION 1
1.1 - Introduction
Finding a diversified portfolio is a key strategy in quantitative finance. The general idea is to discover sets of stocks which are not correlated with one another over a certain period of time. Investing in stocks with low correlation is a way of minimizing financial risk. For more information on this topic, a good place to start is here: http://en.wikipedia.org/wiki/Modern_portfolio_theory
In this tutorial we'll work through a simple example of how to use quantum computing to help discover diverse portfolios by analysing real stock data. In a real world application of quantum computing to quantitative finance it would be necessary to consider many other factors, such as return on investment, and extrapolation of the trends into future time periods, but to keep the tutorial simple we'll just look at the diversification aspect.
1.2 - Getting our financial data
First we'll need to get some data to use for our application. For this tutorial the data used is one year's worth of day-by-day information from each of the stocks which form the Dow Jones Industrial Average index. The Dow Jones is a collection of 30 stocks which are used as an indicator of the health of the market as a whole.
You can download this data freely from the internet, for example from Google Finance: http://www.google.ca/finance
For this tutorial, each file has been saved as a .csv file and put in a folder, which resides in the directory containing the Python source code. Later we'll select a subset of these to use.
The .csv data format holds five rows: Open, high, low, close and volume. You can see them if you import a file into a spreadsheet, as below:
Figure 1. Orienting ourselves by observing the .csv data in a spreadsheet
We'll use the closing value as our main data source. Note that the data is in reverse date order!
1.3 - Importing the data
Make a new folder called 'a_few_dj_stocks' and copy across 17 of the .csv files from the Dow Jones folder. We'll explain what is special about the number 17 later in the tutorial. You can either pick them at random or use the same ones used here in order to quantitatively compare the results. The symbols for the 17 stocks picked for this tutorial were as follows:
'AA', 'AXP', 'CAT', 'CVX', 'DD', 'DIS', 'GE', 'HD', 'IBM', 'JPM', 'KFT', 'MCD', 'MMM', 'PG', 'UTX', 'WMT', 'XOM'
Now let's load the files into Python and extract the daily price change for our basket of 17 stocks.
import os
directory = './a_few_dj_stocks/'
list_of_files = os.listdir(directory)
number_stocks = len(list_of_files)
list_of_symbols = []
for each in list_of_files:
symbol = each.strip('data_')
symbol2 = symbol
symbol2 = symbol.strip('_2011.csv')
list_of_symbols.append(symbol2)
print ''
print 'initial portfolio = ', list_of_symbols
print ''
This should load in the symbols which we will use later to identify which stocks have been picked for our optimized portfolio. You should see an output something like:
initial portfolio = ['AA', 'AXP', 'CAT', 'CVX', 'DD', 'DIS', 'GE', 'HD', 'IBM', ? ]
1.4 - Extracting closing prices
Now we can write some code to grab the closing prices of the stock each week (which in this case is 5 days, remember markets are closed at the weekend)
import csv
all_closing_prices = []
for i in range(0,number_stocks):
closing_prices = []
row_counter = 0
prices = csv.reader(open(directory+list_of_files[i],'rb'), delimiter=',')
for row in prices:
if row_counter : # select all rows but the first one
closing_prices.append((float(row[4]))) # el 4 of the row is closing price, as in Fig ###.
row_counter+=1
closing_prices.reverse()
all_closing_prices.append(closing_prices)
You can add the following snippet inside the i loop if you want to check that the code is working properly so far:
if i==1:
print len(closing_prices)
print closing_prices
The length of the closing prices should be equal to the number of days in the year, here it is 255. The closing prices should show up as a list of floats, e.g.:
>>[48.409999999999997, 48.329999999999998, 50.090000000000003, ... ]Note that now we have reversed the prices list after we grabbed the elements, so the data is now in ascending date order.
1.5 - Creating a market graph
A graph is a mathematical object consisting of vertices (you can think of these as dots on a piece of paper) and edges (lines that connect some of the vertices). Here we are going to create a graph that tells us something about the correlations between stock prices. For background reading on this process, see http://www.math.kth.se/matstat/seminarier/reports/M-exjobb11/110208.pdf Section 2.7 and http://ise.tamu.edu/people/faculty/butenko/papers/COR_market.pdf Section 1.
First, we choose the vertices of the graph to be the stocks we are interested in. In this tutorial, this is the set of 17 stocks you selected in Section 1.3.
Next, we need to draw edges between these vertices in some useful way. To do this, we will draw edges between the vertices representing two stocks if the two stocks' price movements are correlated above some threshold. If we define the price of stock j at closing on day t to be \(P_j(t) \), then the return of the stock over the one day period from day t-1 to day t is
\[ R_i(t) = ln[P_i(t)/P_i(t-1)] \]We now define the correlation between stocks i and j to be
Where the brackets \( \lt \)\( \gt \) represent the average value of the quantity over some time period that we have freedom to select (how long we average over will be a parameter in what follows). For example, if we want to average the return of stock i over the first n days worth of data, the quantity \( \lt R_i(t) \gt = {{1}\over{n}} \sum_{k=1}^{n} R_i(k) \). Note that we only have to calculate \( Q_{ij} \) for \(i \gt j \).
Next, we define a real number K, which here we'll assume is in the range \( 0 \lt K \lt 1 \). We can now build the market graph! In order to do this, we draw an edge between vertices i and j if \( Q_{ij} \geq K \). Intuitively what we are doing is 'connecting' two stocks if their correlation is above our threshold. So two stocks that have an edge between them are correlated above our threshold, while two stocks that are not connected have correlation lower than our threshold. We can adjust K and see how this affects the return.
1.6 What the Maximum Independent Set of the market graph means
The Maximum Independent Set (MIS) of a graph is the largest collection of vertices where no two vertices in the MIS are connected by an edge. Finding the MIS of a graph is a hard problem. For the market graph, the decision version of MIS ('Is there an Independent Set smaller than some number?') is NP-complete, and the optimization version ('Find the MIS of the graph') is NP-hard.
Recall that when two vertices (stocks) are not connected in the market graph, what this means is that their correlation is under some threshold. Therefore the MIS is the largest subset of vertices (stocks) no two of which have correlation above your threshold - in other words, a maximally diversified portfolio!
1.7 - Implementing the theory
The next thing to do is to calculate the \(R_i \) values for each stock for each week as shown in the expression above. So we need to take the log of the stock price at timestep t divided by the stock price at time t-1.
from math import log
all_price_differences = []
for prices in all_closing_prices:
price_differences = []
for i in range(1,len(prices)):
price_differences.append(log(prices[i]/prices[i-1]))
all_price_differences.append(price_differences)
We now have a datastructure called all_prices_differences that holds the information we need to calculate Qi,j, so let's do that next. This is going to require a lot of loops, so bear with me!
from math import sqrt
from numpy import mean
timestep = 5
timeoffset = 0
matrix = []
for i in range(0,number_stocks):
matrix_row = []
for j in range(0,number_stocks):
if i<j:
Ris = []
Rjs = []
RiRjs = []
mean_subt_is = []
mean_subt_js = []
for k in range(0,timestep):
time = k+timeoffset
Ri = (all_price_differences[i][time])
Rj = (all_price_differences[j][time])
Ris.append(Ri)
Rjs.append(Rj)
RiRjs.append(Ri*Rj)
meanRi = mean(Ris)
meanRj = mean(Rjs)
for k in range(0,timestep):
mean_subt_is.append(Ris[k]**2-meanRi**2)
mean_subt_js.append(Rjs[k]**2-meanRj**2)
matrix_el = (mean(RiRjs)-mean(Ris)*mean(Rjs))/sqrt(mean(mean_subt_is)*mean(mean_subt_js))
else:
matrix_el = 0
matrix_row.append(matrix_el)
matrix.append(matrix_row)
for row in matrix:
print rowWhat does all this code do? It implements the expression for \(Q_{ij} \) given above. There's probably a function (like pearsonr in the statistics library) that will calculate this value for you, but I wanted to do it explicitly here to make sure that the expression was being evaluated correctly.
Each time around the loop it adds an element to a Python list-of-lists called matrix.
We have added a couple of parameters for flexibility. timestep is set to 5, to calculate the \( R_i \) values over 1 week of trading, but the user can set it to whatever value they want if they wish to average over different periods of time. Also, timeOffset is added because in the next stage we'll want to do the optimization for each week in the year, so we'll want to loop over this variable very soon.
Now that we have the matrix, we can compute the market graph. Shown in figure 2 is what the market graph looks like when we choose the parameters timestep = 5, timeoffset = 0, K = 0.95 (we chose the correlation threshold to be very high so that the graph would be sparse to help see what's going on).
Figure 2. The market graph for our 17 chosen stocks with edge threshold set to K=0.95
.
SECTION 2 - Solving the problem
2.1 - Solving the MIS problem on the market graph as an ISING problem
So far we have been talking about finding the Maximum Independent Set of the graph. But the quantum hardware doesn't calculate maximum independent sets, it minimizes energy. So we need a prescription which takes us from our market graph to a problem type that the quantum hardware understands. Remember from the previous tutorials that an energy program must be written as a set of \(h_i \) and \(J_{i,j} \) values. To create these h and J values from our current matrix we'll perform a mapping. This mapping is described and proved in http://arxiv.org/abs/1004.2226 section 3. To summarize the mapping:
Assume \( i \lt j \). If there is an edge between vertices \(i\) and \(j\) in the market graph, set \(J_{ij}=J\), else set \(J_{ij}=0\).
\(J\) is a parameter that can be any real number greater than 1 (we will set it here to be 1.1). If \(i\lt j\) set \(J_{ij}=0\).
The qubit bias values \(h_i\) are defined to be \(h_i = J d_i -2\), where \(d_i\) is the number of edges connected to vertex \(i\) in the market graph.
Now we know how to express MIS as an Ising problem we can write code to do this calculation with our real data.
First we'll create a structure that counts the number of edges connecting to each \(h_i\):
#create a structure for counting edges.
q_size = 17
q_matrix_list = []
threshold = 0.95
for i in range(0,q_size):
q_matrix_row = []
for j in range(0,q_size):
q_matrix_row.append(0)
q_matrix_list.append(q_matrix_row)
for i in range(0,q_size):
for j in range(0,q_size):
if i==j:
q_matrix_list[i][j] = 0
else:
if abs(matrix[i][j]) > threshold or abs(matrix[j][i]) > threshold:
q_matrix_list[i][j] = 1
else:
q_matrix_list[i][j] = 0
The first for loop just creates a list of the correct structure to hold the threshold-detected edges. We then run a second loop to place a 1 in the relevant connection between i and j if the corresponding correlation in matrix[i][j] is greater than the threshold. Note that we take the ABSOLUTE value of the correlation, as we don't want anticorrelations either in our diversified portfolio.
We then simply count the number of edges connected to each h_i as the number of detected edges in each sublist in this list:
number_edges = []
h_values=[]
normJ=1.1
for row in q_matrix_list:
number_edges.append(abs(sum(row)))
for i in range(0,len(number_edges)):
h_values.append(normJ*number_edges[i]-2.)
Now we have the \(h_i\) values, we can construct the J matrix as a dictionary. We'll just recalculate the J values within this loop instead of trying to import them from our previous datastructure.
J_dict= dict()
test_mat =numpy.zeros([17,17],dtype=int)
for i in range(0,q_size):
for j in range(0,q_size):
if i==j:
test_mat[i,j] = 0
else:
if abs(matrix[i][j]) > threshold:
test_mat[i,j]= 1
else:
test_mat[i,j] = 0
print test_mat
for i in range(0,q_size):
for j in range(0,q_size):
if test_mat[i,j]==1:
J_dict[(i,j)] = 1.*normJ
2.2 - Calling the quantum hardware
Now that we have constructed our ISING problem, we can embed it into the quantum processor. First we import the dwave SAPI library. We also need the embedding function K17_Q128. Now we see why 17 stocks were picked in the first place. We wanted to consider all possible correlations between stocks, but this involved a fully-connected 17-node graph. In order to solve a fully connected problem on the D-Wave processor (which does not natively have a fully connected architecture) we need to perform a process called embedding. Here we can embed 17 variables onto the 128 qubit processor. The function EmbeddingSolver will take care of embedding problems if you give it what is known as an embedding list. Here I have used K17_Q128, which we should also include in our code. The code for the embedding and ISING solving is given below. Notice that the part where the quantum processor is called is only a single line of code!
#solve the ISING problem
from dwave_sapi import local_connection, EmbeddingSolver
K17_Q128_EMBEDDING =
[[0, 4, 12, 20, 28],[1, 5, 13, 21, 29],[2, 6, 14, 22, 30],
[3, 7, 15, 23, 31],[8, 40, 44, 52, 60],[9, 41, 45, 53, 61],
[10, 42, 46, 54, 62],[11, 43, 47, 55, 63],[16, 48, 80, 84, 92],
[17, 49, 81, 85, 93],[18, 50, 82, 86, 94],[19, 51, 83, 87, 95],
[24, 56, 88, 120, 124],[25, 57, 89, 121, 125],[26, 58, 90, 122, 126],
[27, 59, 91, 123, 127],
[32, 33, 34, 35, 36, 37, 38, 39, 64, 68, 96, 100, 108, 112, 113, 114, 115, 116, 117, 118, 119]]
solver = local_connection.get_solver('c4-sw_optimize')
embedding_solver = EmbeddingSolver(solver, K17_Q128_EMBEDDING)
answer = embedding_solver.solve_ising(h_values, J_dict)
returned_bit_string = answer['solutions'][0]
print returned_bit_string
The returned bit string contains the MIS of the market graph, and therefore our optimized portfolio! The +1 elements of the returned answer are the stocks that are in the MIS; the -1 elements are not. We can now pick out the +1's in the bit string to compile a list of stock symbols in the final portfolio:
optimized_stock_symbols = []
for i in range(0,number_stocks):
if returned_bit_string[i]==1:
optimized_stock_symbols.append(list_of_symbols[i])
print 'Portfolio =', optimized_stock_symbols
Figure 3. An example of an optimized portfolio given the market graph shown in Figure 2.
The MIS is shown in purple and was calculated by optimizing over the whole year's worth of data.
2.3 - Optimizing over the whole year
We have now performed one optimization problem over a timestep of 5 days with a timeoffset of 0. (So our optimized portfolio is from week 1 of the 1 year period). Remember that our overall goal was to re-optimize the portfolio each week, so we will now do that.
We'll now need to create a loop that wraps around almost the entire code base we have written so far, in order to perform the optimization each week. To construct the loop we just need to replace our fixed variable 'timeoffset' with a loop variable, like this:
week = 0
timestep = 5
#timeoffset = 10
for timeoffset in range(0,250,timestep):
week+=1
.
.
.
Everything below the all_price_differences.append(price_differences) line should be wrapped in this for loop. This includes computing the market graph, and finding its MIS via solving the ISING problem.
I've added a variable 'week' which counts how many times the code has been around the loop, so we can include this in our printout section which is modified from the version above:
print 'Portfolio week ',week, ' = ', optimized_stock_symbols
Your printout should look something like this:
>>initial portfolio = ['AA', 'AXP', 'CAT', 'CVX', 'DD', 'DIS', 'GE', 'HD',
'IBM', 'JPM', 'KFT', 'MCD', 'MMM', 'PG', 'UTX', 'WMT', 'XOM']
>>Portfolio week 1 = ['KFT', 'MCD', 'MMM', 'XOM']
>>Portfolio week 2 = ['AXP', 'DD', 'GE', 'HD', 'MCD', 'MMM', 'PG', 'UTX']
>>Portfolio week 3 = ['PG', 'WMT']
>>Portfolio week 4 = ['IBM']
>>Portfolio week 5 = ['CAT', 'HD', 'MCD', 'PG', 'WMT']
>>Portfolio week 6 = ['GE', 'HD', 'KFT', 'MCD']
.
.
.