# MCS with molecules

Molecule matching using Maximum Common Subgraph (MCS) and D-Wave BlackBox

Contents

Section 1

• 1.1 - Problem Definition
• 1.2 - Quantifying the procedure
• 1.3 - Forming the conflict graph
• 1.4 - What the MIS of the conflict graph means
• 1.5 - Coding up our example
• 1.6 - Forming the conflict graph between two molecules
• 1.7 - Visualizing the output
• Aim, audience and required background

This material was developed to help those interested in the physics of the D-Wave OneTM System to perform some simple experiments and explore some of the interesting science behind the processor technology. This tutorial series is designed to introduce you to quantum computer programming using a practical, hands-on approach.

All of the software necessary for developers to begin learning how to program D-Wave quantum computing systems can be downloaded from the DOWNLOAD DEVKITS page. As the developer portal is currently in BETA test stage, you will need to have a D-Wave developer account to download these packs. To follow this tutorial it may be helpful to have read two of the previous installments: Quantum Computing and Energy Programming Primer and Getting Started: Installing the D-Wave developer pack and an introductory 'hello multiverse' tutorial.

The high level programming language used in the included examples and source code is Python 2.6. Some experience of using the Python programming language is helpful but not required.

What you will learn

By following through the material in this white paper, you will learn:

• What Maximum Common Subgraph (MCS) is and how it relates to Maximum Independent Set (MIS) problems
• How to form an MCS problem instance and solve it using D-Wave's BlackBox compiler
• How this allows you to search a database of molecules to find sub-graph matches and molecules with similar structural parts
• Software requirements

In order to proceed you should install several free software packages.

• Python - The material here assumes you are using Python 2.6. You can download Python from http://www.Python.org. If you need a programming development environment, Pycharm from Jetbrains http://www.jetbrains.com/pycharm/ is excellent. There is a small license fee to use it.

• NumPy - This is a scientific computing library. It is at http://numpy.scipy.org/

• PovRay: A raytracing program available for Windows and Linux. If you would like to visualize the molecules and their Maximum Common Subgraphs you will need to install this raytracing software. Note that this is only required for section 1.7 of the tutorial. PovRay can be found here: PovRay

• PyPov: A Python interface to generate Povray scripts. If you would like to visualize the molecules and their Maximum Common Subgraphs you will need to install this Python module. Note that this is only required for section 1.7 of the tutorial. PyPov can be found here: PyPov

• Code for this tutorial (optional - i.e. if you want to see the full code files for each section). You can find the code referenced in this tutorial here: Github Repository for molecule MCS code. You will need a github account to access this code. Once you have a github account you can email devportal at dwavesys dot com and we can add you to the github D-Wave repositories. The molecule information files you will need for this tutorial are also located in this repo.

• SECTION 1

1.1 - Problem Definition

Imagine you have two objects, and you'd like to know how similar they are, and where those similarities reside. These objects could be quite complex, with multiple internal components and relationships between these components.

As an example, imagine you have in your mind the perfect house you'd like to find on a real estate site. This perfect ideal might be described by a series of components that you'd like or need to have, some that you don't want, and relationships between these (you would be willing to accept not having a pool if the price was under a certain threshold). It would be very useful to be able to compare your ideal house to all of the houses in the real estate site's database, and rank all available houses as to their similarity to your ideal. Another similar example would be defining your ideal date, and obtaining a ranked list of matches to your ideal on a dating site.

Here we'll do a simple worked example where the two objects are molecules (see Figure 1), where we'd like to know how similar two molecules are. This is a useful thing to know, as molecules that share similar shapes often share similar function.

Figure 1. Left: acetaminophen (Tylenol). Middle: Aspirin. Right: Nicotine. You can think of a molecule as a labeled graph, where the vertices are the atoms, labeled with atom type (hydrogen , carbon, etc.) and coordinates in space (x,y,z), and the edges are the bond relationships between atoms (bond order, etc.). Our objective in this example is to find regions of maximal overlap between two molecules - we want to know how similar two molecules are, and what regions of the molecules share this similarity. You might guess that acetaminophen and aspirin might be more similar than either of these to nicotine - we'll see if this is true in this example!

The key idea here is that you first need to represent the objects in your application as graphs - sets of vertices and edges. The framework we'll describe here can handle both labeled and unlabeled graphs. Once you've done this, the next step is to define a similarity measure between two graphs. This is dependent on the context of your application, and defines what you mean by a 'good match' between two objects. In the case of the molecule matching example, we'll define a 'good match' by looking for regions where the positions and relationships of atoms in one molecule match the positions and relationships of atoms in the other molecule.

The mathematical problem we'll solve to do this is called Maximum Common Subgraph (MCS). It is also called subgraph isomorphism. It is a very hard pattern matching problem.

The approach we will take to formulating and solving MCS is to create a new graph out of the two graphs we want to match. This new graph is called a conflict graph. We then solve the MCS problem by solving a different problem, called Maximum Independent Set (MIS), on the conflict graph. We solve for the MIS using BlackBox (see the Weighted Maximum Independent Set tutorial to see how this works). Once we've done this, we then know how good the match is (the size of the MIS tells us how good the match is - the larger the answer, the larger the shared pattern), and the region of maximal overlap (this method also tells us where the two objects maximally match).

1.2 - Quantifying the procedure

Define the two graphs to be matched to be $$G_1 = ( V_1,E_1 )$$ and $$G_2 = ( V_2, E_2 )$$ where $$V_j$$ is a set of (in general labeled) vertices, and $$E_j$$ is a set of (again in general labeled) edges. Define the number of vertices in $$G_1$$ to be N and the number of vertices in $$G_2$$ to be M, and choose $$N \ge M$$.

1.3 - Forming the conflict graph

We form the conflict graph $$G_C = (V_C,E_C )$$ by adding up to $$N*M$$ vertices to $$G_C$$ in the following way.

Consider a pair of vertices, one drawn from the set $$\{V_1\}$$ (call it $$V_i$$ ) and one drawn from the set $$\{V_2\}$$ (call it $$V_i$$ ). We create a vertex $$V_{i,\alpha} \in G_C$$ if the label information for $$V_i$$ is compatible with the label information for $$V_{\alpha}$$. You can think of the vertex $$V_{i,\alpha}$$ as being a potential matching point between the two input graphs - we only want to create this vertex if the originating vertices $$V_i$$ and $$V_{\alpha}$$ are potential matches. In the case of the molecule matching example, the relevant vertex label is the atom type (carbon, hydrogen, etc.). We only create vertices in $$G_C$$ formed from pairs $$V_i$$ and $$V_{\alpha}$$ if these have the same atom type - we know that we can't have a solution where a carbon atom from $$G_1$$ is matched to an oxygen atom in $$G_2$$ so we don't include these in our conflict graph. The maximum number of vertices we can have in $$G_C$$ is $$N*M$$, with the actual number often being less than this depending on the nature of the requirement that vertices be compatible.

Next we add edges to $$G_C$$ between vertices for which there is a conflict. The way we do this is to draw edges between two vertices $$V_{i,\alpha}$$ and $$V_{j,\beta}$$ if any one of the following three conditions hold:

1.) $$i=j$$

2.) $$\alpha = \beta$$

3.) The conditions describing the pair of vertices $$V_i,V_j \in V_1$$ are in conflict with the conditions describing the pair of vertices $$V_{\alpha}, V_{\beta} \in V_2$$. What 'in conflict with' means is usually obvious for any particular situation. In our molecule example, a conflict arises if the atom types are mismatched, the bond types are mismatched, or the distances between the two atoms are outside of some threshold.

Once this is done, we have our conflict graph computed and we are ready for the next step. In terms of computational complexity, generating the conflict graph scales like $$O(NM)$$.

1.4 - What the MIS of the conflict graph means

The conflict graph has edges between all vertices that are 'conflicting' based on whatever our desired definition of conflict is. This means that we don't want a solution returned that contains vertices that are connected by an edge. This type of solution is called an Independent Set. The MIS is the largest set of vertices, no two of which are connected by an edge. The MIS of the conflict graph is the region of largest compatibility between the two original graphs.

Figure 2. Here is a simple example illustrating how the procedure works. Top left: $$G_1$$ with N=4 vertices. Top right: $$G_2$$ with N=3 vertices. Bottom right: $$G_C$$ with N*M=12 vertices (here we include all possible matches). The labeling of the vertices in $$G_C$$ is $$(i,\alpha)$$ where the first index points to the originating vertex in $$G_1$$ and the second index points to the originating vertex in $$G_2$$. The cyan circles in $$G_C$$ are the MIS, and the cyan circles in the top graphs are the corresponding locations of maximum overlap found from the MIS. The edges in $$G_C$$ are colored to distinguish the three conditions creating conflict described on the preceding page - red means the first index is the same $$(i=j)$$, green means the second index is the same $$( \alpha = \beta )$$, and dashed blue means there's a conflict (in this simple case, that just means that the edge conditions don't match - either there is an edge between $$i$$ & $$j$$ and not between $$\alpha$$ & $$\beta$$, or there is an edge between $$\alpha$$ & $$\beta$$ and not between $$i$$ & $$j$$. The colors are just for visualization and don't affect the computation of the MIS.

Once we solve for the MIS of the conflict graph, we are done. The size of the MIS is the number of vertices that match our desired conditions between $$G_1$$ and $$G_2$$ - the larger the MIS, the better the match. It is also easy to go back to the vertices in the original graphs to see where the overlap is. This is because if a vertex $$V_{i,\alpha}$$ is in the MIS, this means that the vertex $$V_i \in V_1$$ and the vertex $$V_{\alpha} \in V_2$$ are 'matched'. The subgraphs matching between $$G_1$$ and $$G_2$$ are the Maximum Common Subgraph of $$G_1$$ and $$G_2$$.

1.5 - Coding up our example

To see how this works in practice, let's code up the molecule example. In the case of the molecules, we will need to load some information first. We are going to use the networkx python module, available here, to store and manipulate the graphs in this example. Networkx is very useful for working with graphs, and will save you a lot of time in developing, manipulating and and visualizing graph-based representations of problems - it is highly recommended! Before we proceed, you should download and install it.

The .mol format for storing molecular information

For each molecule, we will need to know the atom types and locations, and information about the pairwise relationships between atoms in a molecule. In order to get this information, we will use information stored in the .mol standard - a reference for this is here. Here is an example of what a .mol file looks like. This one is for the aspirin molecule.

Figure 3. aspirin.mol file viewed in a text editor

The first line tells us how many atoms (21) and bonds (21) the molecule has. The next block (lines 2-22) give us information about the atoms - their locations (the first three columns) and their type (the fourth column). The next block (lines 23-43) provides information about the bond types between atoms. The codes are described in the document referenced above. As an example, the first line in the bond block is 1 2 1 0 0 0

This means "atom 1 (the first number) is connected to atom 2 (the second number) with a single bond (the third number)".

Here is a code snippet for loading a .mol file into a networkx graph.

Code Snippet:

############################################################################################################
#
# This function loads a .mol file (provided as input to the function in the string file_name) and returns
# a networkx graph. The vertices of the graph are labeled with ['positions'] which are the spatial
# coordinates of the atoms, and the edges are labeled with ['bond_types'] which are the bond types
# between the two atoms that the edge represents.

import networkx as nx

G = nx.Graph()

for k in range(N_v):
G.add_node(k, positions = [float(split_line[0]), float(split_line[1]), float(split_line[2])],
atom_type = str(split_line[3]))

for k in range(N_e):
bond_types = [int(split_line[2])]) # note: subtract 1 because of zero indexing!

return G

##############################################################################################################


1.6 - Forming the conflict graph between two molecules

First let's pick two molecules to match - let's take aspirin and nicotine to start. Since nicotine has more atoms (26 vs. 21) we choose G_1 to represent nicotine and G_2 to represent aspirin. This sets N=26 and M=21. The procedure for forming the conflict graph has two steps. The first is to create nodes. We create a node in the conflict graph, formed of a pair of vertices, one from G_1 and one from G_2, subject to a set of conditions that would be necessary for the pair of vertices to be considered a potential match. For example, the type of atom has to match - a hydrogen atom from the first molecule can't be a match to a carbon atom in molecule 2. We also require that the number and type of bonds originating from each atom match. Generally you want to put quite a bit of thought into this step, and defining all of the criteria for matching as well as you can, because you want the number of nodes created by this process to be as small as possible.

The second step, once all the nodes are created, is to add edges. We do this by first matching the edge conditions between the atoms in each molecule. For example, we add an edge between vertices in the conflict graph if there is a conflict between the edge conditions in the atoms in molecule 1 and the edge conditions in the atoms in molecule 2. Here the edge conditions refer to the presence / absence of an edge, the bond type, and comparisons of distance between atoms.

Here is a code snippet that creates a conflict graph, given two input molecules stored as networkx graphs.

Code Snippet:

############################################################################################################
#
# This function takes as input two networkx graphs and returns their conflict graph. The input graphs are
# assumed to come with the vertex labels ['atom_type'] and ['positions'], and edge labels ['bond_types'].
# The output graph has vertex label ['label'] which is a pair [k,j] where k is the vertex in G_1 and j is
# the vertex in G_2 whose pairing the vertex in G_C labeled by [k,j] represents. There are no edge labels
# in G_C.

import networkx as nx
from numpy import array

def form_conflict_graph(G_1, G_2):

N = G_1.number_of_nodes(); M = G_2.number_of_nodes(); G_C = nx.Graph()

# First we add vertices to the conflict graph.

cnt = 0
for k in range(N):
for j in range(M):
if G_1.node[k]['atom_type'] == G_2.node[j]['atom_type']: # atoms need to match
if len(G_1.edges(k)) == len(G_2.edges(j)): # order of vertices need to match
G_1_bond_list = []; G_2_bond_list = []
for edge in G_1.edges():
if edge[0]==k or edge[1]==k:
G_1_bond_list.append(G_1[edge[0]][edge[1]]['bond_types'])
for edge in G_2.edges():
if edge[0]==j or edge[1]==j:
G_2_bond_list.append(G_2[edge[0]][edge[1]]['bond_types'])
c=0
for i in range(len(G_1_bond_list)):
if G_1_bond_list[i] in G_2_bond_list:
c += 1
if c==len(G_1_bond_list): # all bond types need to match

# Here is where we add vertices to G_C. We add labels saying it was from [k,j],
# and the weight is 1.0 (we're going to use WMIS solver and we just want MIS).

G_C.add_node(cnt, label = [k,j], weight = 1.0)
cnt += 1

num_vertices = G_C.number_of_nodes()

for i in range(num_vertices):
for j in range(i):
G_1_first_vertex = G_C.node[j]['label'][0]; G_1_second_vertex = G_C.node[i]['label'][0]
G_2_first_vertex = G_C.node[j]['label'][1]; G_2_second_vertex = G_C.node[i]['label'][1]

distance_between_G_1_vertices = sum((array(G_1.node[G_1_first_vertex]['positions']) -
array(G_1.node[G_1_second_vertex]['positions']))**2)

distance_between_G_2_vertices = sum((array(G_2.node[G_2_first_vertex]['positions']) -
array(G_2.node[G_2_second_vertex]['positions']))**2)

edge_conflict = True

# for edge conditions to match, distances need to match within some tolerance
if abs(distance_between_G_1_vertices-distance_between_G_2_vertices) < 2.0:
b1 = G_1.has_edge(G_1_first_vertex, G_1_second_vertex)
b2 = G_2.has_edge(G_2_first_vertex, G_2_second_vertex)
# if there isn't an edge between vertices in both G_1 and G_2, there is no conflict
if not b1 and not b2:
edge_conflict = False
# if there are edges between both and the bond types match there is no conflict
if b1 and b2 and \
(G_2[G_2_first_vertex][G_2_second_vertex]['bond_types'] ==
G_1[G_1_first_vertex][G_1_second_vertex]['bond_types']):
edge_conflict = False

# Here is where we add edges to G_C.

if G_C.node[i]['label'][0] == G_C.node[j]['label'][0]:
if G_C.node[i]['label'][1] == G_C.node[j]['label'][1]:
if (G_C.node[i]['label'][0] != G_C.node[j]['label'][0]) and \
(G_C.node[i]['label'][1] != G_C.node[j]['label'][1]) and edge_conflict:

print "Number of vertices in conflict graph is", num_vertices
print "Number of edges in conflict graph is", G_C.number_of_edges()

return G_C

############################################################################################################


For completeness, note that the WMIS computation comes from the following code, introduced in the WMIS tutorial:

Code Snippet:

from copy import deepcopy
from dwave_sapi import local_connection, BlackBoxSolver
from numpy import dot, array, zeros, ones
import numpy
import time

##############################################################################################################
#
# These functions are for solving the optimization by exact enumeration.
#

def int2bin(nn, N):
"""returns the binary of integer nn, using N number of digits"""
return "".join([str((nn >> y) & 1) for y in range(N-1, -1, -1)])

def enumeration_solve_ising(h, J):

numvars=len(h[0])
w = -ones((1,numvars), dtype=numpy.int)

energy_list=[] # this is going to store all 2^K energies and w values.
for m in range(2**numvars): # cycle over all possibilities
bin_m=int2bin(m,numvars)
for mm in range(numvars):
w[0,mm]=2*int(bin_m[mm])-1 # value of w (+1/-1) for this value of m
result = dot(h[0], w[0])+dot(w[0],dot(J,w[0]))
energy_list.append([result,deepcopy(w)]) # add the result for this m.
sorted_energy_list = sorted(energy_list, key=lambda outp:outp[0]) # sort lowest..highest energy
return sorted_energy_list[0][1][0] # return value of w corresponding to lowest energy

##############################################################################################################

##############################################################################################################
#
# This class defines a function object for solving Ising models of the form G(s) = h.s + s.J.s using BlackBox.
# Here both h and J are numpy float arrays of size N and NxN respectively. Here is an example for N=3:
#
# h = [-0.9  0.2 -0.9]
# J = [[ 0.   1.1  0. ]
#      [ 0.   0.   1.1]
#      [ 0.   0.   0. ]]

class BB_solve_ising(object):

def __init__(self, h, J):
self.h = h
self.J = J

def __call__(self, states, numStates):

stateLen = len(states)/numStates
ret = []
for state_number in range(numStates):
w = array(states[state_number*stateLen:(state_number+1)*stateLen])
result = dot(self.h, w)+dot(w,dot(self.J,w))
ret.append(result)

return tuple(ret)
#
###############################################################################################################

###############################################################################################################
#
# This function computes the weighted maximum independent set of a graph G. The graph is assumed to be a
# networkx graph. solver_flag toggles between the use of an exact enumeration solver (solver_flag = 0) and
# D-Wave BlackBox (solver_flag = 1).

def weighted_maximum_independent_set(G, solver_flag):

solver = local_connection.get_solver("c4-sw_sample")

normJ = 1.1
blackbox_parameter = 1; cluster_number_parameter = 2 # needs to be even!!!!
number_of_vertices = G.number_of_nodes()
biggest_independent_set_found = []
h = zeros((1,number_of_vertices), dtype = numpy.float)
for i in range(number_of_vertices):
h[0,i] = (normJ*G.degree(i)-2*G.node[i]['weight'])

J = zeros((number_of_vertices, number_of_vertices), dtype = numpy.float)
for i in range(number_of_vertices):
for k in range(i):
if G.has_edge(i,k):
J[k,i]=normJ
if solver_flag:
obj = BB_solve_ising(h[0], J)
blackbox_solver = BlackBoxSolver(solver)
start_time = time.time()
answer = blackbox_solver.solve(obj, number_of_vertices, cluster_num = cluster_number_parameter,
min_iter_inner = blackbox_parameter, max_iter_outer= blackbox_parameter,
unchanged_threshold=blackbox_parameter, max_unchanged_objective_outer=blackbox_parameter,
max_unchanged_objective_inner = blackbox_parameter, unchanged_best_threshold = blackbox_parameter,
verbose=0)
time_to_return = time.time() - start_time
else:
start_time = time.time()
time_to_return = time.time() - start_time

for i in range(number_of_vertices):
biggest_independent_set_found.append(i)

return time_to_return, biggest_independent_set_found

###############################################################################################################


Now that we can load molecules and form conflict graphs, we can find the maximum common subgraphs of the molecules. First let's define a simple function that takes in the conflict graph and the MIS and outputs which vertices to 'light up' in the original molecules:

Code Snippet:

def vertices_to_light_up(G_C, MIS):

num_vertices = G_C.number_of_nodes()

list_of_vertices_from_molecule_1 = []; list_of_vertices_from_molecule_2 = []

for j in range(num_vertices):
first_label = G_C.node[j]['label'][0]; second_label = G_C.node[j]['label'][1]
if j in MIS:
list_of_vertices_from_molecule_1.append(first_label)
list_of_vertices_from_molecule_2.append(second_label)

return list_of_vertices_from_molecule_1, list_of_vertices_from_molecule_2



Putting everything together now allows us to write a short routine to calculate the MCS between two molecules of our choice. Here is a code snippet that implements this:

Code Snippet:

from WMIS_function import weighted_maximum_independent_set
from mcs_snippet_2 import form_conflict_graph
from mcs_snippet_3 import vertices_to_light_up

# Form conflict graph.

G_C = form_conflict_graph(G_1, G_2)

# Solve for the MIS.

use_blackbox = 1 # if you want to use blackbox, set this to 1, else set to zero for enumeration.
elapsed_time, MIS = weighted_maximum_independent_set(G_C, use_blackbox)

# Output vertices to light up

V1, V2 = vertices_to_light_up(G_C, MIS)

print "Vertices to light up in G_1 are", V1
print "Vertices to light up in G_1 are", V2



1.7 - Visualizing the output

2D graphs are not the best way to display molecules, we'd like some way of highlighting the MCS matches in a shiney, 3D graphic like in Figure 1. One way to do this is to produce molecule structures in Python, then raytrace the molecules to display them. Here we'll use the package PyPov by Simon Burton, (see here for Google Code project. Pypov is a beautiful little module which links PovRay (a raytracing software package) to Python. To run the following code you'll need to install PovRay and also PyPov. It can be a little confusing because the name of the PyPov import is Povray, so when you import Povray in the Python code below you are actually importing the PyPov module. The .pov file that this code generates is then what you open and run in the Povray program itself.

The functions atom_label_to_color() and bond_label_to_color() take the letters 'C', 'H' etc from the .mol file and turn these into colors that are indicative of the atom types. Here we chose fairly standard colors: Hydrogen = Red, Carbon = White, Nitrogen = Blue, Oxygen = Green.

The function load_file_values_into_array() reads in the .mol file and outputs lists of molecule positions, molecule colors, bond positions and bond colors. If an atom in a molecule is in the overlap list for that molecule instance, it is colored yellow to indicate that it is part of the MCS.

The function construct_mol_and_bond_objects() then takes these lists and converts them into a more concatenated format. It also handles the case of double bonds. If there is a double bond, it generates two separate bonds which are duplicates and just offset slightly from one another, and places both these objects in the final lists.

Next we just take our final lists of molecules and bonds and send them into the generate_pov_mols_and_bonds() function to convert our custom format into the format that PovRay will understand. write_to_pov_file() then writes all these structures, and some additional information, such as camera position and lighting, to a .pov file which is located in the same folder as this Python snippet below. To render the scene with the molecules, run PovRay and open the output .pov file from povray interface.

Code Snippet:

from Povray import *

def atom_label_to_color(atom_label, c_color, n_color, h_color, o_color):
color = "X"
if atom_label == "C":
color =  c_color
if atom_label == "N":
color = n_color
if atom_label == "H":
color = h_color
if atom_label =="O":
color = o_color
return color

def bond_label_to_color(bond_label, single_bond_color, double_bond_color):
color = "X"
if bond_label ==1:
color = single_bond_color
if bond_label ==2:
color = double_bond_color
return color

highlighted_mol_color, c_color, h_color, n_color, o_color):
mol_positions = []
mol_colors = []
bond_positions = []
bond_colors = []

for k in range(N_v):
mol_position = [float(split_line[0])+molecule_offset[0], \
float(split_line[1])+molecule_offset[1], \
float(split_line[2])+molecule_offset[2]]
mol_color = atom_label_to_color(str(split_line[3]), c_color, h_color, n_color, o_color)

if k in molecule_overlap:
mol_color[0:3] = highlighted_mol_color[0:3] # Checks for MCS flagged atoms, turn them yellow

mol_positions.append(mol_position)
mol_colors.append(mol_color)

for k in range(N_e):

bond_mol_start = int(split_line[0])-1
bond_position_start = mol_positions[bond_mol_start]
bond_mol_end = int(split_line[1])-1
bond_position_end = mol_positions[bond_mol_end]

bond_color = bond_label_to_color(int(split_line[2]), single_bond_color, double_bond_color)

bond_positions.append((bond_position_start, bond_position_end))
bond_colors.append(bond_color)

return mol_positions, mol_colors, bond_positions, bond_colors

def construct_mol_and_bond_objects(mol_positions, mol_colors, bond_positions, bond_colors, \
single_bond_color, double_bond_color):
# Add molecules and bonds to the list of structures. Molecules and single
# bonds are straightforward. If bond is a double bond, create the structure
# twice with an offset, and add both structures to the list of bonds.

#Form of the molecule structure that this function constructs: [(co-ord1, color1) , ... ] e.g.:
#[ [(x1,y1,z1),(r1, g1, b1, radius1)] ,
#  [(x2,y2,z2),(r2, g2, b2, radius2)] , ...]

#Form of the bond structure that this function constructs: [(co-ord_a1,co-ord_b1), color1) , ... ] e.g.:
#[ [((xa1,ya1,za1),(xb1,yb1,zb1)),(r1, g1, b1, radius1)] ,
#  [((xa2,ya2,za2),(xb2,yb2,zb2)),(r2, g2, b2, radius2)] , ...]

mol_coords_and_colors = []
bond_coords_and_colors = []

for k in range(len(mol_positions)):
mol_coords_and_colors.append((mol_positions[k], mol_colors[k]))

for k in range(len(bond_positions)):
if bond_colors[k] == single_bond_color:
bond_coords_and_colors.append(([bond_positions[k][0], bond_positions[k][1]], bond_colors[k]))

if bond_colors[k] == double_bond_color:
new_coord_start1 = [a - b for a,b in zip(bond_positions[k][0],doublebond_offset)]
new_coord_end1   = [a - b for a,b in zip(bond_positions[k][1], doublebond_offset)]
new_coord_start2 = [a + b for a,b in zip(bond_positions[k][0], doublebond_offset)]
new_coord_end2   = [a + b for a,b in zip(bond_positions[k][1], doublebond_offset)]

bond_coords_and_colors.append(([new_coord_start1, new_coord_end1],bond_colors[k]))
bond_coords_and_colors.append(([new_coord_start2, new_coord_end2],bond_colors[k]))

return mol_coords_and_colors, bond_coords_and_colors

def generate_pov_mols_and_bonds(mol_coords_and_colors, bond_coords_and_colors):
# Converts our molecule and bond structures into PovRay object format.
molecules = [ Sphere((x,y,z), c4, \
Pigment(color=(c1,c2,c3)), \
Finish(ambient = 0.2, \
diffuse = 0.2, \
reflection = 0.2, \
specular = 0.7)) \
for ((x,y,z),(c1,c2,c3,c4)) in mol_coords_and_colors ]

bonds = [ Cylinder(x,y,c4,
Pigment(color=(c1,c2,c3)),
Finish(ambient = 0.2))
for ((x,y),(c1,c2,c3,c4)) in bond_coords_and_colors ]
return molecules, bonds

def write_to_pov_file(mols_and_bonds):
cam = Camera(
location=(-5,-10,-30),
look_at=(-5,-10,0),
angle=0 )
light = LightSource((-5,-10,30),(1,1,1))
light2 = LightSource((-10,-20,-30),(1,1,1))

file2 = File( "molecules.pov" )
print "writing file"
file2.write(cam, light, light2)
for each in mols_and_bonds:
file2.write(each[0],each[1])

#######################################################################################
#####------------------------  MAIN ROUTINE   ----------------------------------#######
#######################################################################################

#Parameter definitions
doublebond_offset = [0.07,0.07,0.07]

single_bond_color = (0,1,1,0.05)
double_bond_color = (1,0,1,0.05)
highlighted_mol_color = [1,1,0]

c_color = [1,1,1,0.5]  # Standard Colours of the molecules in [ R, G, B , r]
h_color = [1,0,0,0.2] # The fourth element, r, is the sphere radius which is linked to
n_color = [0,0,1,0.5]  # atom type and therefore tags along for the ride in the colour list
o_color = [0,1,0,0.5]

# 3 sets of molecule pairs to compare all possibilities
files = ["aspirin.mol", "nicotine.mol", \
"nicotine.mol", "acetaminophen.mol", \
"acetaminophen.mol", "aspirin.mol"]

# Here we just put our G1 and G2 data from the previous code snippet to define the MCS for the above
# molecule pairs. Note you'll need to run the previous code 3 times to generate all 3 pairs of graphs.
overlaps = [[0, 1, 3, 4, 5, 13, 14, 16],[1, 2, 4, 5, 0, 13, 14, 12],\
[0, 1, 2, 4, 5, 13, 14, 15], [2, 3, 4, 0, 1, 13, 14, 11],\
[2, 1, 0, 5, 4, 3, 12, 11, 13],[0, 1, 2, 3, 4, 5, 14, 15, 16]]

offsets =[[0,0,0],[-20,-15,0],[-10,-25,0],[-10,-10,0],[0,-20,0],[-10,-20,0]]
# Below are the rough centre positions of the molecules in the mol files for reference, which was used to
# create the offsets list. Note the centre positions aren't all the same!
# aspirin1_centre = [0,0,0], acetaminophen2_centre = [0,0,0], nicotine1_centre = [10,15,0]

# Create the mol and bond structures:

mols_and_bonds = []
for i in range(len(files)):
mol_positions, mol_colors, bond_positions, bond_colors \
highlighted_mol_color, c_color, h_color, n_color, o_color)

mol_coords_and_colors, bond_coords_and_colors \
= construct_mol_and_bond_objects(mol_positions, mol_colors, \
bond_positions, bond_colors, \
single_bond_color, double_bond_color)

mols, bonds = (generate_pov_mols_and_bonds(mol_coords_and_colors, bond_coords_and_colors))
mols_and_bonds.append((mols,bonds))

# Create the pov file:
write_to_pov_file(mols_and_bonds)


Figure 4. Output of the Povray visualization code, showing the maximal overlapping regions for pairwise comparison of (top) nicotine and aspirin; (middle) acetaminophen and nicotine; and (bottom) aspirin and acetaminophen.