## pySTEP |

A *light* **Genetic Programming API** that allows the user to easily evolve populations of trees with *precise grammatical and structural constraints*. In other worlds you can set up building blocks and rules that define individuals during the evolution.

Why using rules and constraints ? Let's take the example of the DNA of a human being which is 95% similar to the DNA of a chimp. This means that there is some strong similarity in the building blocks and the structure of the genetic code. A completely random genetic code would make no sense on a functional practical point of view because it would give some kind of unfit and disfunctional goo... I just think that, to be useful, artificial evolution has to be able to integrate basic rules that determine the shape of the individuals generated, so that we can get practical results. This does not mean we eliminate random search and we still admit poorly fit individuals. But at least we focus the search in the direction of the problem by only generating potentially useful solutions. |

pySTEP is presently functional, running and stable. Still in the process of adding extra tutorials, extra features for reading the populations from the database, and modify some of the crossover code (so far, only 1-point crossover is used).

There has been a lot of improvement going on since the first version:

- The Koza Strongly-Typed flavoured build methods have been simplified and optimized (around 10 times faster for a code length reduced by five!)
- It is now possible to use a specific function-terminal sets for each of the children node. And these children node are built in the order they appear in the tree constraints (this feature only existed for ADF in the previous version). This makes pySTEP very competitive and gives it a serious advantage over existing Strongly typed GP packages :)
- It is now possible to specify what happens after the system has tried 100 times to produced rules-compliant offsprings using crossover but has failed (this might happens if we use a lot of constraining rules). Either we accept the unfit offsprings with Substitute_Mutation=0 or we substitute them with a mutated tree by setting Substitute_Mutation=1.
- The parameters of the tournament selection are integrated in the main function that calls the evolutionary run.
- Code for crossover and mutation is now simplified and clearer.

You can download the last version here.

copyright: (c) 2009 Mehdi Khoury under the mit license.

http://www.opensource.org/licenses/mit-license.html

This distribution comes with several pieces of documentation:

- the API documentation
- the README file
- a User Manual

In order to be able to use this package, you need to have installed on your machine:

- psyco for python 2.6

- pysqlite for python 2.6 and the corresponding version of SQlite

- numpy for python 2.6

A detailed html documentation explains how to set the parameters in the settings module and describes each and every of the 5 different tutorials (see tutorial1 to tutorial5 in the api documentation).

- Tutorial 1 is about a simple polynomial regression with one variable (we evolve an equation with operations on one x variable).
- Tutorial 2 is about a simple polynomial regression with two variables and Integers Random Constants (we evolve an equation with operations on an x and a y variables)
- Tutorial 3 is about evolving a forest of trees in a certain order. Each tree being a different polynomial, able to reuse previous trees. This is the equivalent of Strongly Typed ADF. (we evolve a system of interdependant equations)
- Tutorial 4 is like tutorial 3, but shows how to modify the system to run an evolution through 1000 different data points without having to evaluate a tree 1000 times but simply one time! (we plug an list of data as input to the tree, rather than evaluate the tree multiple times)
- Tutorial 5 shows how to evolve a hybrid system mixing discretes and continuous values, arithmetic and logical operators. (we evolve a system of equations mixed with if then else type of rules.e.g. if A then equation 1 else equation 2)

If you want to build a tree with very detailed and specific grammar rules, you need to know what Pystep expects you to do.

A tree always starts with a Root node at the top. The root node has no specific operator in it, but it is recognised by Pystep as the top of the tree.

You must specify for each parent node what are the possible children nodes. You must specify how many children nodes this parent node can have, and for each child node you must specify a function and a terminal set (if a parent can only have function nodes as children, specify the function set and an empty terminal set)... This looks all a bit confusing at the moment, so let's get through it with simple examples.

*Example 1: A tree with simple arithmetic and one result.*

**# default function set applicable for nodes:**

defaultFunctionSet= [(1,2,'+'),(1,2,'*'),(1,1,'^2'),(1,2,'-'),(1,1,'cos'),(1,1,'sin'),(1,1,'neg')]

**# default terminal set applicable for nodes:**

defaultTerminalSet= [(3,0,'x')]

treeRules = {

**# The root node only has one child. This child can be either a function node or a terminal node. **

'root':[(defaultFunctionSet,defaultTerminalSet)],

**# The + node has two children. Each child can be either a function node or a terminal node. **

'+':[(defaultFunctionSet,defaultTerminalSet),(defaultFunctionSet,defaultTerminalSet)],

**# The * node has two children. Each child can be either a function node or a terminal node. **

'*':[(defaultFunctionSet,defaultTerminalSet),(defaultFunctionSet,defaultTerminalSet)],

**# The ^2 node only has one child. This child can be either a function node or a terminal node. **

'^2':[(defaultFunctionSet,defaultTerminalSet)],

**# The - node has two children. Each child can be either a function node or a terminal node. **

'-':[(defaultFunctionSet,defaultTerminalSet),(defaultFunctionSet,defaultTerminalSet)],

**# The neg node only has one child. This child can be either a function of the type +, *, -, cos, and sin or a terminal node. **

'neg':[([(1,2,'+'),(1,2,'*'),(1,2,'-'),(1,1,'cos'),(1,1,'sin')],defaultTerminalSet)],

**# The cos node only has one child. This child can be either a function of the type +, *, -, neg, and sin or a terminal node. **

'cos':[([(1,2,'+'),(1,2,'*'),(1,2,'-'),(1,1,'sin'),(1,1,'neg')],defaultTerminalSet)],

**# The sin node only has one child. This child can be either a function of the type +, *, -, neg, and cos or a terminal node. **

'sin':[([(1,2,'+'),(1,2,'*'),(1,2,'-'),(1,1,'cos'),(1,1,'neg')],defaultTerminalSet)]

}

*Example 2: A tree with hybrid systems: mixing discrete and contimuous representation (logic+arithmetic).*

**# default function set applicable by for branches:**

defaultFunctionSet= [(1,2,'+'),(1,2,'*'),(1,1,'^2'),(1,2,'-'),(1,1,'cos'),(1,1,'sin'),(1,1,'neg')]

**# default terminal set applicable by for branches:**

defaultTerminalSet= [(3,0,'x'),(3,0,'y')]

treeRules = {

**# The root node has three children in a specific order.**** The first child can only be a If function node. ****The second child can only be a Then function node. ****The third child can only be a Else function node.**

'root':[([(2,1,'if')],[]),([(2,1,'then')],[]),([(2,1,'else')],[])],

**# The If node has only one child. This child can be only a function node (either and, or, >, <, or =). **

'if':[([(1,2,'and'),(1,2,'or'),(1,2,'>'),(1,2,'<'),(1,2,'=')],[])],

**# The Then node has only one child. This child can be either a default function node or a default terminal node. **

'then':[(defaultFunctionSet,defaultTerminalSet)],

**# The Else node has only one child. This child can be either a default function node or a default terminal node. **

'else':[(defaultFunctionSet,defaultTerminalSet)],

**# The And node has two children. Each child can be only a function node (either and, or, >, <, or =). **

'and':[([(1,2,'and'),(1,2,'or'),(1,2,'>'),(1,2,'<'),(1,2,'=')],[]),([(1,2,'and'),(1,2,'or'),(1,2,'>'),(1,2,'<'),(1,2,'=')],[])],

**# The Or node has two children. Each child can be only a function node (either and, or, >, <, or =). **

'or':[([(1,2,'and'),(1,2,'or'),(1,2,'>'),(1,2,'<'),(1,2,'=')],[]),([(1,2,'and'),(1,2,'or'),(1,2,'>'),(1,2,'<'),(1,2,'=')],[])],

**# The = node has two children. Each child can be either a default function node or a default terminal node. **

'=':[(defaultFunctionSet,defaultTerminalSet),(defaultFunctionSet,defaultTerminalSet)],

**# The > node has two children. Each child can be either a default function node or a default terminal node. **

'>':[(defaultFunctionSet,defaultTerminalSet),(defaultFunctionSet,defaultTerminalSet)],

**# The < node has two children. Each child can be either a default function node or a default terminal node. **

'<':[(defaultFunctionSet,defaultTerminalSet),(defaultFunctionSet,defaultTerminalSet)],

**# The + node has two children. Each child can be either a default function node or a default terminal node. **

'+':[(defaultFunctionSet,defaultTerminalSet),(defaultFunctionSet,defaultTerminalSet)],

**# The * node has two children. Each child can be either a default function node or a default terminal node. **

'*':[(defaultFunctionSet,defaultTerminalSet),(defaultFunctionSet,defaultTerminalSet)],

**# The ^2 node has only one child. This child can be either a default function node or a default terminal node. **

'^2':[(defaultFunctionSet,defaultTerminalSet)],

**# The - node has two children. Each child can be either a default function node or a default terminal node. **

'-':[(defaultFunctionSet,defaultTerminalSet),(defaultFunctionSet,defaultTerminalSet)],

**# The neg node only has one child. This child can be either a function of the type +, *, -, cos, and sin or a default terminal node. **

'neg':[([(1,2,'+'),(1,2,'*'),(1,2,'-'),(1,1,'cos'),(1,1,'sin')],defaultTerminalSet)],

**# The cos node only has one child. This child can be either a function of the type +, *, -, neg, and sin or a default terminal node. **

'cos':[([(1,2,'+'),(1,2,'*'),(1,2,'-'),(1,1,'sin'),(1,1,'neg')],defaultTerminalSet)],

**# The sin node only has one child. This child can be either a function of the type +, *, -, neg, and cos or a default terminal node. **

'sin':[([(1,2,'+'),(1,2,'*'),(1,2,'-'),(1,1,'cos'),(1,1,'neg')],defaultTerminalSet)]

}

To execute the evolution run of the first tutorial, create a main method in Python (there is one main module already made for you in the package if you want...).

Inside, you need to:

1- import the module called "evolver"

2- define a name and path for you database.

3- and then call the EvolutionRun method from the evolver module.

e.g. of a main file:

import evolver

if __name__ == "__main__":

dbname=r'C:\pop_db'

evolver.EvolutionRun(2000,(0,1,'root'),2,8,'AddHalfNode',100, 0.001 ,0.5,0.49,dbname,True)

This means that we define:

- a population size of 2000 individuals,
- a root node with 1 child
- a minimum tree depth of 2
- a maximum tree depth of 8
- the use of a Ramped half and Half Koza tree building method
- a maximum number of runs of 100 generations before we stop
- a stoping fitness criteria of 0.001 (if the fitness<=0.001, solution found)
- a crossover probability of 0.5
- a mutation probability of 0.49
- a reproduction probability of 0.01
- a database of name and path dbname
- a run done in verbose mode (printing the fittest element of each generation)

The run will create a database file containing tables of names tab0, tab1,... where every table contains the population for the corresponding generation.

For those who want details of the populations, there are methods in the module writepop that allow to:

- get stats from a database table: GetPopStatFromDB(dbname,tablename)

- print the population from a database table to a file: PrintPopFromDB(dbname,tablename,filename)

On a more personal note, I enjoyed programming this API using the Eclipse IDE (there is a pyDev module that allows to use Eclipse to program in Python).

It's great and I recommend it !

Feedback and input are highly welcomed :)

ECJ : Java Genetic Programming Package

pyGP : The Python Genetic Programming Project implements a Genetic Programming System a la J Koza in Python.

Dione : Genetic Programming framework written in Python. Takes advantage of python\'s compiler to make things simple. Includes basic genetic operations (rank/roulette selection,crossover,mutation,steady state,elitistm ...)