pySTEP User Manual 1.0

Python Strongly Typed gEnetic Programming

3d rendered picture of an evolutionary lego man

Introduction

pySTEP is a light Genetic Programming API that allows the user to easily evolve populations of trees with precise grammatical and structural constraints.

Problem: Lack of simplicity and flexibility of existing Genetic Programming APIs when using Strongly-Typed and Grammar based structures.

The goal is: create a simple and flexible Strongly-Typed Genetic Programming API using a simple and flexible programming language : Python.

The goal is NOT : a super-optimized, hyper-performant marvel of software engineering that computes huge populations using obfuscated code. We will be happy to play with relatively small populations of let say 10000 individuals maximum. What we want is not raw power, but rather control on how we investigate the search space by establishing elaborate constraints and structures defining the shapes of the trees generated.

This would be a package that allows you to evolve any kind of weird graph, object, hierachical and complex composite structure in general, given this thing can be encoded inside a tree structure.

In Python, the following tree could be represented by a nested list:

tree data structure

Nested list representation (Assumption: First element of a list or a nested list is always a head node) :

individual = ['A', ['B', ['D', 'H', 'I'], 'E'], ['C','F', 'G']]

This simple representation is the starting point of the project.

In our package, we try to follow some consistent notation to identify tree nodes. A node object is a tuple composed of three elements e.g. (0,2,'root')

The first element represent the type of node:
- 0 Root Branch
- 1 Function Branch
- 2 ADF Defining Branch
- 3 Variable Leaf
- 4 Constant Leaf
- 5 ADF Leaf
The second element is the arity of the Node (number of children).
If the number of children >0 then it is a branch node. If arity = 0
then it is a leaf node... e.g. (0,2,'root') means the root node has two children.
The third element is the name or unique identifier of the Node.
e.g (1,2,'+') is a function branch node with two children and unique identifier '+'
while (1,2,'*') is a function branch node with two children and unique identifier '*'

So, following this way of coding nodes, a basic equation like x * (x^2 + x)could be represented by:

[(0, 1, 'root'), [(1, 2, '*'), (3, 0, 'x'), [(1, 2, '+'), [(1, 1, '^2'), (3, 0, 'x')], (3, 0, 'x')]]]

pySTEP how to

To evolve a population of trees in order to solve a particular problem, you need to:

Step 1 - create a new parameter file.

Step 2 - build trees

Step 3 - specify a fitness function

Step 4- evolve a population of trees

Step 1 - create a new parameter file.

You need to create a new .py file that will contain all the parameters. e.g. create a 'my_seetings.py' blank file. At the top of the file, import whatever modules you need. e.g.

import random

import math

psyco.profile() # will optimize the binary code using the psyco interpreter

Then, start writing down the 8 parameters.

Parameter 1: The function set

We need to define branch nodes and map functions to them to perform operations
such as arithmetic addition, subtraction, multiplication... Whatever operation is
needed in the function set of a tree should be created here.
e.g.

Functions like add return the result of the addition of two children nodes, or some exception if something goes wrong. e.g.

def add(listElem):

try:

return listElem[0]+listElem[1]

except:

raise WrongValues, "Wrong values sent to function node.\nCan't get result"

exit

(see complete function listing in the sourcecode of tutorial1)

Then we can define the mapping to the whole function set:

functions = {

'+':add,

'-':sub,

'neg':neg,

'*':multiply,

'^2':square,

'cos':cos,

'sin':sin,

'root':rootBranch }

Parameter 2: The terminal set

We define the terminal nodes, and map them with corresponding methods. e.g.

nb_eval=2

all_x=[]

for i in xrange(nb_eval):

all_x.append(random.random()*10)

terminals = { 'x':all_x}

Parameter 3: The number of data points evaluated

We define the number of examples we learn from. It has just been used to define the terminals in the previous piece of code:

nb_eval=2

Parameter 4: The learning data

The results we try to approach. They are defined by some 'ideal' function, or by genuine data taken from some experiment. e.g.

ideal_results=[]

def GetIdealResultsData():

for nb in xrange(nb_eval):

ideal_results.append([all_x[nb]**3+all_x[nb]**2+math.cos(all_x[nb])])

return ideal_results

Parameter 5: The mapping of function branches in case of crossover

We also define what function branches can be changed during crossover, to make an a fragment compliant with a new parent tree. e.g

crossover_mapping=[('+','adf2_+'),('adf2_+','+'),('*','adf2_*'),('adf2_*','*')]

This example means that if a branch of the fragment tree has a '+' branch, this branch can be replaced by an 'adf2_+' branch

to make the fragment compatible to the new parent tree.

In our case, we don't need this advanced feature at the moment, so we just write:

crossover_mapping=[]

Parameter 6: The degree of control on the tree constraints

To say if we we control the compliance of the offspring and trees built to a set a rules. Can be set to 1 if such control is needed. So far we will use the value 0.

Strongly_Typed_Crossover_degree=0

We also specify what happens after the system has tried 100 times to produced a rules-compliant tree using crossover but has failed.

Either we accept the unfit offsprings with Substitute_Mutation=0 or we substitute them with a mutated tree by setting Substitute_Mutation=1.

In our case, the rules are very simple, so we can set :

Substitute_Mutation=0

Parameter 7: The rules that determine the shape of a tree

Every child of every node in the tree has a list of possible function nodes and terminal nodes. A given node will be described as a list of tuples where each tuple contains the function and terminal set for a child. e.g. a '+' node will have two children. Each child will use a default function and terminal set.So a '+' node will be described by [(defaultFunctionSet,defaultTerminalSet),(defaultFunctionSet,defaultTerminalSet)].

One complete example of implementation:

defaultFunctionSet= [(1,2,'+'),(1,2,'*'),(1,1,'^2'),(1,2,'-'),(1,1,'cos'),(1,1,'sin'),(1,1,'neg')]
# default terminal set applicable by for branches:
defaultTerminalSet= [(3,0,'x')]
treeRules = {'root':[(defaultFunctionSet,defaultTerminalSet)],
'+':[(defaultFunctionSet,defaultTerminalSet),(defaultFunctionSet,defaultTerminalSet)],
'*':[(defaultFunctionSet,defaultTerminalSet),(defaultFunctionSet,defaultTerminalSet)],
'^2':[(defaultFunctionSet,defaultTerminalSet)],
'-':[(defaultFunctionSet,defaultTerminalSet),(defaultFunctionSet,defaultTerminalSet)],
'neg':[([(1,2,'+'),(1,2,'*'),(1,2,'-'),(1,1,'cos'),(1,1,'sin')],defaultTerminalSet)],
'cos':[([(1,2,'+'),(1,2,'*'),(1,2,'-'),(1,1,'sin'),(1,1,'neg')],defaultTerminalSet)],
'sin':[([(1,2,'+'),(1,2,'*'),(1,2,'-'),(1,1,'cos'),(1,1,'neg')],defaultTerminalSet)]
}

Parameter 8: If the order of the ADF branches matters

There are no ADF branches in this example, so it does not matter. e.g.

adfOrdered = False

Then you need to tell the system that it will have to use this file for settings. In order to do that, go to the module called 'settings.py'. Then you need to declare all the 10 different parameters as being located in 'my_seetings.py'. e.g.

# setup for running tutorial 1
functions = my_seetings.functions
terminals =my_seetings.terminals
crossover_mapping=my_seetings.crossover_mapping
nb_eval=my_seetings.nb_eval
ideal_results=my_seetings.GetIdealResultsData()
Strongly_Typed_Crossover_degree=my_seetings.Strongly_Typed_Crossover_degree
Substitute_Mutation=my_seetings.Substitute_Mutation
treeRules = my_seetings.treeRules
adfOrdered = my_seetings.adfOrdered
FitnessFunction = my_seetings.FitnessFunction

The settings.py module contains extensive comments explaining these parameters in greater details. If you feel that the following explanation is not sufficient, read the api documention for settings (it contains detailed information) .

Step 2 - build trees

You generally need to test if your grammar rules are correct, and if they allow you to produce trees without weird errors.

To do that little test, go to the buildtree.py module. There is a main method at the end that build a tree for you.

if __name__ == '__main__':

a=buildTree().AddHalfNode((0,3,'root'),0,2,8)

print a

Just run it several times to see if it works properly... Otherwise, make sure your set of rules is consistant...

Step 3- Define a fitness function

In the module my_seetings.py , we define the fitness function. It generally reuse functions from the module evalfitness.

e.g.

def FitnessFunction(my_tree):
return evalfitness.FinalFitness(evalfitness.EvalTreeForAllInputSets(my_tree,xrange(nb_eval)))

use two functions :

EvalTreeForAllInputSets: parse the tree by plugging input values and computing the result obtained at the top of the tree.

FinalFitness: returns the overall fitness of a tree over a set of datapoints using the result of the previous function as an input. It contains problem specific aspects of the fitness calculation.

Step 4- Start Evolution...

To execute the evolution run of the first tutorial, create a main method in Python (there is one already made for you in the package if you want...).
Inside, you need to:
1- import the module called "evolver"
2- define a name and path for you database.
3- and then call the EvolutionRun method from the evolver module.
e.g. of a main file:

import evolver

if __name__ == "__main__":

dbname=r'C:\pop_db'
evolver.EvolutionRun(2000,(0,5,'root'),3,8,'AddHalfNode',100, 0.00001 ,0.5,0.49,7,0.8,dbname,True)

This means that we define:

a population size of 2000 individuals,
a root node with 1 child
a minimum tree depth of 2
a maximum tree depth of 8
the use of a Ramped half and Half Koza tree building method
a maximum number of runs of 100 generations before we stop
a stoping fitness criteria of 0.1 (if the fitness<=0.1, solution found)
a crossover probability of 0.5
a mutation probability of 0.49
a reproduction probability of 0.01 (automatically deduced from the 2 previous values)
the size of the tournament selection
the probability of selecting the fitttest during tournament selection
a database of name and path dbname
a run done in verbose mode (printing the fittest element of each generation)

The run will create a database file containing tables of names tab0, tab1,... where every table contains the population for the corresponding generation.

To see the system in action with more interesting examples, look at the tutorials 1 to 5...

Enjoy playing with evolution !