Sunday, February 19, 2012

A Bayesian Approach To Kalman Filtering

INTRODUCTION
Ripped straight from Wikipedia...
The Kalman filter is an algorithm, commonly used since the 1960s for improving vehicle navigation (among other applications, although aerospace is typical), that yields an optimized estimate of the system's state (e.g. position and velocity). The algorithm works recursively in real time on streams of noisy input observation data (typically, sensor measurements) and filters out errors using a least-squares curve-fit optimized with a mathematical prediction of the future state generated through a modeling of the system's physical characteristics.


TL;DR
Rockets


There are both linear and non-linear forms of the Kalman filter, with the non-linear forms being the Extended Kalman Filter (EKF), the invariant Extended Kalman Filter, and the Unscented Kalman Filter (UKF). The non-linear versions of this algorithm represent the current "gold standard" in many application areas, including GPS and navigation.

Before jumping in the deep end of the pool, I decided to implement a simple example that shows the ideas and implementation of Kalman filtering, using a recursive Bayesian approach.


TL;DR
Homework


WEIGHTING FUNCTION FOR KALMAN UPDATING
The Kalman filter is often derived from a matrix equation standpoint. The math, at least to me, is long, involved, and fairly nasty to solve without the help of some MATLAB matrix witchery. However, there is a second, more "gut level" way to approach the Kalman filter - by treating it as a case of recursive Bayesian filtering.

The derivation for the following equations can be found on pg. 47 of Bayesian Data Analysis, Second Edition, by Gelman, Carlin, Stern, and Rubin.

Basically, it is possible to estimate the mean of a normal distribution by following a weighting equation for the mean and variance, which can be represented as follows.

$\Large{\mu_1=\frac{\frac{1}{\tau_0^2}\mu_0+\frac{1}{\sigma^2}y}{\frac{1}{\tau_0^2}+\frac{1}{\sigma^2}}}$

$\Large{\frac{1}{\tau_1^2}=\frac{1}{\tau_0^2}+\frac{1}{\sigma^2}}$

Here, $\mu_1$ represents the new estimate for the mean and $\tau_1^2$ is the estimate for the new variance. $\mu_0$ represents the old mean, $\tau_0^2$ represents the old variance, $y$ is the new measured value, and $\sigma^2$ is the measurement variance. I will do an example later on to demonstrate this, but first, let's try to decide what this equation is really saying.

If the old value for variance, $\tau_0^2$, is very large, then $\mu_0$ is multiplied by a very small number, effectively making the equation...

$\Large{\mu_1 = \frac{\frac{1}{\sigma^2}y}{\frac{1}{\sigma^2}}}$

which can be further reduced to simply...

$\Large{\mu_1 = y}$

What this means, is that if we are not very confident ($\tau_0^2$ is large) of our old mean, $\mu_0$, our best guess for $\mu_1$ is simply the value we measured, $y$. Our new variance also reduces to

$\Large{\frac{1}{\tau_1^2}=\frac{1}{\sigma^2}}$

Conversely, if we are not very confident in the new result $y$, our new mean $\mu_1$ is basically the same as $\mu_0$. The new mean is based largely on the weighting between the new variance and the old - so if we are equally confident in both the previous mean and the new measurement, our best estimate for $\mu_1$ will split the difference.


Using this update function, it is easy to wrap a new estimate back into the prior distribution, which is the core idea behind Bayesian analysis. But how does it come into play with Kalman filtering?
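These update equations drop straight into code. Here is a minimal Python sketch of the weighting functions (my own transcription for illustration; the worked example later in this post is in R):

```python
def update_mean(mu0, tau0, y, sigma):
    """Precision-weighted combination of the old mean and a new measurement."""
    num = mu0 / tau0**2 + y / sigma**2
    den = 1.0 / tau0**2 + 1.0 / sigma**2
    return num / den

def update_var(tau0, sigma):
    """New variance: the precisions (inverse variances) simply add."""
    return 1.0 / (1.0 / tau0**2 + 1.0 / sigma**2)

# Equally confident in prior and measurement: the estimate splits the difference.
print(update_mean(0.0, 0.5, 1.0, 0.5))  # -> 0.5
```

Note how a huge prior variance makes the result collapse onto the measurement $y$, exactly as derived above.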

APPLICATION
The Kalman filter can be boiled down to a few basic steps.
1. Predict a future value, based on previous information. (Predict)
2. Compare prediction and measured value, using results as the new previous information. (Update)
3. If you run out of data, or are satisfied with the estimate, exit. Otherwise, GOTO 1.

Pretty simple, right? This method can yield some very interesting results, cutting through measurement error and giving close approximation of the root signal.

From a high level, the Bayesian derivation is

$P(X_i|Y_{1:i}) \propto P(Y_i|X_i,Y_{1:i-1}) P(X_i|Y_{1:i-1})$

Because Bayesian analysis usually wraps its results back into the prior model,
this reduces to

$P(X_i|Y_{1:i}) \propto P(Y_i|X_i) P(X_i|Y_{1:i-1})$

What this means is that we have a predictive distribution, $P(X_i|Y_{1:i-1})$, which will be described a bit later, used in conjunction with a likelihood function, $P(Y_i|X_i)$, which enters through the weighting equations described above. Together these two functions form the core of the Kalman filtering operation: Predict, Measure, Update, Repeat.

SETUP
The key things we need to know are:
1. The standard deviation of our source. I chose .6
2. The scale factor of the source. This can be calculated from $1 = (\text{scale factor})^2 + (\text{source std dev})^2$
3. The standard deviation of the measurements. I chose .5, but this is a parameter you can play around with. The higher the measurement sigma gets, the worse our filter performs, due to signal-to-noise ratio effects. An initial deviation of .5 is fairly high, so dropping this number to something like .25 will give "prettier" results.
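As a quick aside, the scale-factor relation above is what pins the source signal's long-run variance at 1: for a process $X_i = a X_{i-1} + w$, the stationary variance is $\sigma^2/(1-a^2)$. A small Python check (my own illustration, using the .6 source sigma from above):

```python
import math

source_sigma = 0.6
source_scale = math.sqrt(1 - source_sigma**2)  # the relation from the text

# scale^2 + sigma^2 = 1 by construction...
print(source_scale**2 + source_sigma**2)
# ...so the stationary variance sigma^2 / (1 - scale^2) comes out to 1.
print(source_sigma**2 / (1 - source_scale**2))
```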

In a real system, you could reasonably expect to have tolerances for both your source equipment and your measurement equipment. Maybe you could make measurements of this, or it could be found in the manufacturer's datasheet.

I will be showing this method through an example, using R. We will create a test case by following a few generic steps:
1. Create a source signal with some noise variance $X$.
2. Create a measurement signal, which will introduce measurement variance $Y$.
3. Kalman filter the measurement signal $k = filt(Y)$.
4. Compare the filtered data $k$ with the original transmission $X$.

So let's begin.



Step 1.
Create a source signal. I chose to generate a random walk.

randomwalk <- function(num, source_scale, source_sigma) {
    randy <- c(0,1:num)
    for(i in 2:length(randy)) {
        randy[i] <- source_scale*randy[i-1]+rnorm(1, sd=source_sigma)
    }
    return(randy[2:length(randy)])
}




Step 2.
Now, we need a function to create a measurement signal, which will further distort our input.

received <- function(sent, meas_sigma) {
    measured <- sent+rnorm(length(sent), sd=meas_sigma)
    return(measured)
}



Step 3.
The signal is pretty badly distorted now... time to write a Kalman filter function.

pred_mean <- function(source_scale, prev_mean) {
    return(source_scale*prev_mean)
}

pred_sigma <- function(source_scale, prev_sigma, source_sigma) {
    return(sqrt((source_scale**2)*(prev_sigma**2)+source_sigma**2))
}

update_mean <- function(pred_mean, pred_sigma, meas_val, meas_sigma) {
    numerator <- (pred_mean/(pred_sigma**2))+(meas_val/(meas_sigma**2))
    denominator <- (1/(pred_sigma**2))+(1/(meas_sigma**2))
    return(numerator/denominator)
}

update_sigma <- function(pred_sigma, meas_sigma) {
    r =(1/(pred_sigma**2))+(1/(meas_sigma**2))
    return(1/sqrt(r))
}


filt <- function(y, source_scale, source_sigma, meas_sigma) {
   last_mean <- 0
   last_sigma <- source_sigma
   k <- 1:length(y)
   for(i in 1:length(y)) {
       est_mean <- pred_mean(source_scale, last_mean)
       est_sigma <- pred_sigma(source_scale, last_sigma, source_sigma)
       k[i] <- est_mean+rnorm(1, sd=est_sigma)
       last_mean <- update_mean(est_mean, est_sigma, y[i], meas_sigma)
       last_sigma <- update_sigma(est_sigma, meas_sigma)
   }
   return(k)
}
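For readers who prefer Python, the same pipeline can be sketched end-to-end. This is my own loose translation of the R functions above (plain standard library, fixed seed, and the same small source sigma and large measurement sigma used in runit below); an illustration, not a drop-in replacement:

```python
import math
import random

def randomwalk(num, scale, s_sigma):
    """Scaled random walk, mirroring the R randomwalk function."""
    x, out = 0.0, []
    for _ in range(num):
        x = scale * x + random.gauss(0, s_sigma)
        out.append(x)
    return out

def kalman_filt(y, scale, s_sigma, m_sigma):
    last_mean, last_sigma = 0.0, s_sigma
    k = []
    for meas in y:
        est_mean = scale * last_mean                                      # predict mean
        est_sigma = math.sqrt(scale**2 * last_sigma**2 + s_sigma**2)      # predict sigma
        k.append(est_mean + random.gauss(0, est_sigma))                   # sample, as in the R code
        # precision-weighted update against the new measurement
        den = 1 / est_sigma**2 + 1 / m_sigma**2
        last_mean = (est_mean / est_sigma**2 + meas / m_sigma**2) / den
        last_sigma = math.sqrt(1 / den)
    return k

random.seed(42)
s_sigma, m_sigma = 0.01, 0.4
scale = math.sqrt(1 - s_sigma**2)
x = randomwalk(1000, scale, s_sigma)                      # source signal
y = [v + random.gauss(0, m_sigma) for v in x]             # measured signal
k = kalman_filt(y, scale, s_sigma, m_sigma)               # filtered signal

rmse = lambda a, b: math.sqrt(sum((i - j)**2 for i, j in zip(a, b)) / len(a))
print("measured RMSE:", rmse(y, x))
print("filtered RMSE:", rmse(k, x))  # should be well below the measured RMSE
```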

A few quick derivations are required for this code to make sense. The update_mean and update_sigma functions were described at the start of the blog, but where on earth did the pred_mean and pred_sigma functions come from?
According to notation of the Kalman filter, we currently have
$X_i = \alpha_s X_{prev}+\omega_s$, where $\alpha_s$ is our source scale, and $\omega_s$ is zero-mean noise with the source standard deviation. In Kalman filter terms, this equation is the state model, and we got it from the knowledge we have about how the randomwalk values were generated.
The second equation, our observation model, is $Y_i = X_i + \omega_m$, where $\omega_m$ is zero-mean noise with the measurement standard deviation.
In order to get the equations for our pred_mean and pred_sigma we want to find the expected value and variance of the state model, which looks like this.
$E[X_i] = E[\alpha_s X_{prev}+\omega_s]$

using the rules of associativity for expected value...

$E[X_i] = \alpha_s E[X_{prev}]+E[\omega_s]$

The expected value of zero-mean noise is 0, so now we find

$E[X_i] = \alpha_s E[X_{prev}]$

This matches exactly with the code for pred_mean - our new guess is the old mean multiplied by the scale factor.
The pred_sigma function is a little trickier.
$Var(X_i) = Var(\alpha_s X_{prev}+\omega_s)$
Since $Var(Q)$ is the same as $E[(Q-E[Q])^2]$, we now see
$Var(X_i) =$
$E[(\alpha_s X_{prev}+\omega_s)^2]-E[\alpha_s X_{prev}+\omega_s]^2$
Expanding, this now becomes
$Var(X_i) =$
      $\alpha_s^2 E[X_{prev}^2]+2\alpha_s E[X_{prev}] E[\omega_s]+$
                  $E[\omega_s^2] - E[\alpha_s X_{prev} + \omega_s]^2$

Continuing, we see that

$ - E[\alpha_s X_{prev} + \omega_s]^2$ can be expanded to
$ - E[\alpha_s X_{prev}+\omega_s]E[\alpha_s X_{prev}+\omega_s]$

which becomes

$-(E[\alpha_s X_{prev}]+E[\omega_s])(E[\alpha_s X_{prev}]+E[\omega_s])$
 
FOIL, and we now see 


$-\alpha_s^2E[X_{prev}]^2-2\alpha_s E[\omega_s]E[X_{prev}]-E[\omega_s]^2$


One view of this expression is that $\omega_s$ is a normally distributed random variable with a mean of 0. $E[\omega_s]$ is 0, and squaring that is still 0, so both the $E[\omega_s]$ terms and the $E[\omega_s]^2$ term go to 0. It is important to note that $E[\omega_s^2]$ will NOT go to zero, as the mean of the squared variable is no longer centered about 0, and instead equals the noise variance, $\omega_s^2$.

However, another, more natural view is to note that $E[\omega_s^2]-E[\omega_s]^2$ is identical to $Var(\omega_s)$, which is the noise variance $\omega_s^2$. The $2\alpha_s E[X_{prev}]E[\omega_s]$ cross terms cancel, since $E[\omega_s]=0$.

Either way, the final result is the same.

$Var(X_i) = \alpha_s^2 Var(X_{prev}) + \omega_s^2$

This matches the code for pred_sigma - the new sigma is the square root of (the scale factor squared times the previous sigma squared, plus the source sigma squared).
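If the algebra feels slippery, the variance propagation result is easy to sanity-check numerically. A quick Monte Carlo sketch (my own illustration, with made-up values for the scale and the sigmas):

```python
import math
import random

random.seed(0)
alpha, tau_prev, w_sigma = 0.9, 0.5, 0.3

# Draw many realizations of X_i = alpha * X_prev + noise and compare the
# sample variance against alpha^2 * Var(X_prev) + (noise variance).
samples = [alpha * random.gauss(0, tau_prev) + random.gauss(0, w_sigma)
           for _ in range(200000)]
mean = sum(samples) / len(samples)
var = sum((s - mean)**2 for s in samples) / len(samples)

predicted = alpha**2 * tau_prev**2 + w_sigma**2
print(var, predicted)  # the two should agree to a couple of decimal places
```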

See http://mathworld.wolfram.com/Variance.html for an alternate derivation that leads to the same result.



Step 4. 
Now we can write a function to tie it all together.

runit <- function() {
     source_sigma <- .01
     source_scale <- sqrt(1-source_sigma**2)
     meas_sigma <- .4
     x <- randomwalk(1000, source_scale, source_sigma)
     y <- received(x, meas_sigma)
     k <- filt(y, source_scale, source_sigma, meas_sigma)
     plot(y, type="l", col="red")
     lines(k, col="green")
     lines(x, col="blue")

}

The blue is the source signal, the red is the signal at measurement, and green is the recovered signal. Even when measurement noise washes out the root signal, we can recover the original fairly well.


Try tuning each of the sigmas, and see how the results change - it's pretty interesting.



There you have it! A simple, logical derivation of the Kalman filter as a recursive Bayesian filter. In the future I plan to write about more complex statistical processing methods as I learn them, such as how to run this simulation with 0 known parameters, or implementation of one of the non-linear Kalman filter algorithms.

As always, critique is both welcome and appreciated. 
 
CODE
To run this code, simply copy and paste into a source file. Then open an R interpreter, do source("path/to/source/file"), then do runit().

randomwalk <- function(num, source_scale, source_sigma) {
    randy <- c(0,1:num)
    for(i in 2:length(randy)) {
        randy[i] <- source_scale*randy[i-1]+rnorm(1, sd=source_sigma) 
    }
    return(randy[2:length(randy)])
}

received <- function(sent, meas_sigma) {
    measured <- sent+rnorm(length(sent), sd=meas_sigma)
    return(measured)
}

pred_mean <- function(source_scale, prev_mean) {
    return(source_scale*prev_mean)
}

pred_sigma <- function(source_scale, prev_sigma, source_sigma) {
    return(sqrt((source_scale**2)*(prev_sigma**2)+source_sigma**2))
}

update_mean <- function(pred_mean, pred_sigma, meas_val, meas_sigma) {
    numerator <- (pred_mean/(pred_sigma**2))+(meas_val/(meas_sigma**2))
    denominator <- (1/(pred_sigma**2))+(1/(meas_sigma**2))
    return(numerator/denominator)
}

update_sigma <- function(pred_sigma, meas_sigma) {
    r =(1/(pred_sigma**2))+(1/(meas_sigma**2))
    return(1/sqrt(r))
}

filt <- function(y, source_scale, source_sigma, meas_sigma) {
   last_mean <- 0
   last_sigma <- source_sigma
   k <- 1:length(y)
   for(i in 1:length(y)) {
       est_mean <- pred_mean(source_scale, last_mean)
       est_sigma <- pred_sigma(source_scale, last_sigma, source_sigma)
       k[i] <- est_mean+rnorm(1, sd=est_sigma)
       last_mean <- update_mean(est_mean, est_sigma, y[i], meas_sigma)
       last_sigma <- update_sigma(est_sigma, meas_sigma)
   }
   return(k)
}

runit <- function() {
    source_sigma <- .01
    source_scale <- sqrt(1-source_sigma**2)
    meas_sigma <- .4
    x <- randomwalk(1000, source_scale, source_sigma)
    y <- received(x, meas_sigma)
    k <- filt(y, source_scale, source_sigma, meas_sigma) 
    plot(y, type="l", col="red")
    lines(k, col="green")
    lines(x, col="blue")

}

Sunday, February 12, 2012

Copy-free Data From C In Python

BACKGROUND
Lately, I found myself doing heavy computational work in C, storing large volumes of data in C arrays. However, once the computational work has been done, I also find the need to explore the data and make algorithmic decisions about various features of the dataset. What I needed was an embedded interpreter.

Luckily, there is a ton of information out there about embedding interpreters into C and C++ code. Languages such as Lua, Tcl, JavaScript, and Python are all  well suited to this, and bring the ability to explore algorithmic changes in scripted code, without having to recompile each time. Lua especially has a large following in the gaming community, with well documented examples for usage in C and C++ code.

ATTEMPT WITH LUA
Lua has a very simple, stack-based interface that works very well for simple tasks. However, it does require data to be copied when it is pushed onto the stack, so that the Lua environment has its own copy which can be garbage collected. Raw C pointers can be pushed in the form of lightuserdata, but these pointers didn't appear to be of much use.

Due to the size of my dataset, any kind of copying after initialization in my C code is a big performance hit. The data transition to Lua was fast, but not fast enough. I needed another solution.

ATTEMPT WITH PURE PYTHON
The Python-C API is very powerful, and like most Python, extensively documented. The basic concept is to instantiate a Python environment, create a PyObject* in C, then run a script that uses the initialized PyObject. The PyObject* can be instantiated using different functions which create any Python object.

However, there is a problem. The default Python API requires data to be copied into a list if the data size and values are not known at compile time. This put me in a similar position to the Lua attempt, and copying my data to Python took even longer than the copy to Lua.

Things were grim, but there was one, last ray of hope... NumPy!
 
THE ANSWER
Using the NumPy-C API, it is possible to create a PyArray from a pointer. What this means is a pointer to the start of a data array can be passed into a function, and a PyObject of type NumPy array will be created, where the data content of the NumPy array is actually the C array! With no data copied across the boundary, I have both the flexibility of a scripted interface and the raw computational power of C, with near zero overhead.

The only downside is that the Python code will not handle freeing of the memory pointed to by the PyArray. This means you may have to free() the memory in your C code after the Python code has run, if you allocated it using malloc. You should also be extra careful not to free the memory while trying to look at it from Python. Once again, trading complexity for speed has its benefits.
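The same zero-copy behavior can be demonstrated entirely from the Python side, using a ctypes array to stand in for the C buffer (a sketch of the idea, assuming NumPy is installed; it is not the C program below): np.frombuffer wraps existing memory instead of copying it.

```python
import ctypes
import numpy as np

# A C-style buffer allocated via ctypes stands in for the C array.
buf = (ctypes.c_int * 5)(0, 1, 2, 3, 4)

# np.frombuffer wraps the existing memory: no copy is made.
view = np.frombuffer(buf, dtype=np.intc)
print(view)  # [0 1 2 3 4]

# Writes through the "C" side are visible in the NumPy view...
buf[0] = 99
print(view[0])  # 99

# ...and, as in the post, the view can be made read-only from Python.
view.flags.writeable = False
```

Freeing the underlying buffer while the view is alive would be exactly the footgun described above, which is why the read-only flag is a nice safety rail.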

I have some (excessively?) strong beliefs that NumPy may be the future of data processing, but I think I'll keep that sermon to myself.

REQUIRED PACKAGES AND COMPILE ARGUMENTS
On Ubuntu 11.10, I needed the following packages: python-dev and python-numpy. I acquired them by doing sudo apt-get install python-dev python-numpy

To compile, I used:
gcc -O3 -std=c99 -I$NUMPY test.c -lpython2.7
    where NUMPY was previously declared using
    export NUMPY=/usr/lib/pymodules/python2.7/numpy/core/include/numpy/

Once compiled, I simply did ./a.out test.py
    where test.py was the name of my Python file

This should print out the numbers 0-999, which are the numbers we stored in the C code. We are accessing C code values, directly in Python, without copying the data. Nice!

THE PYTHON CODE 
for i in data:
    print i

THE C CODE 
#include <python2.7/Python.h>
#include <arrayobject.h>

int main(int argc, char** argv) {
    if (argc < 2) {
        printf("This program requires a path to a Python file as an argument!\n");
        return 1;
    }
    //Initialize general Python environment
    Py_Initialize();
    //Initialize NumPy 
    import_array();

    //Create array of values
    int size = 1000;
    int vals[size];
    for (int i=0; i<size; i++) {
        vals[i] = i;
    }
    
    //Indicate that the array is 1-D
    npy_intp dims[1] = {size}; 

    //Create NumPy array that shares data with vals
    PyObject* data = PyArray_SimpleNewFromData(1, dims, NPY_INT, &vals[0]);
    
    //Add data array to globally accessible space
    PyObject* m = PyImport_AddModule("__main__");
    PyModule_AddObject(m, "data", data);

    //Optional ability to make data read-only. Nice if you want to run multiple
    //python files over the same base data
    PyRun_SimpleString("data.setflags(write=False)");
    PyObject* py_f =PyFile_FromString(argv[1], "r");
    PyRun_SimpleFile(PyFile_AsFile(py_f), argv[1]);
    Py_Finalize();
    return 0;
}






Wednesday, October 26, 2011

Using Scipy To Do Inline FFTs

INSTALLING
You need to do a sudo apt-get install python2.x-dev to get inline to work (I did 2.7). When scipy.weave.inline() runs, there is a call to the C header 'Python.h' which is contained in the dev package.

If there are issues with building the C code, i.e. cryptic errors like "Error Code -1 returned", adding verbose=2 to the arguments of the inline() call will show every line output as it's being built - this is very useful for figuring out why the inline build is breaking!


BACKGROUND
This code pretty much builds on my previous blog post about FFTW. I am a big fan of the speed and simplicity of Python, especially for interacting with the web. However, after my data is fetched from the web, I also want to perform some pretty intense analytics, and even NumPy left something to be desired. After reading this page on heavy computation in Python (http://www.scipy.org/PerformancePython), I found out that inline C can get pretty close to pure C functions, and by interfacing the PyObjects directly, disk I/O can be avoided, which may also save some compute time. And it is also much, much simpler (and faster to code) in Python than in pure C.


CODES OF NOTE
There are a couple of nice things about inlining these functions. Notice the ['comp', 'num_fft', 'fft_len', 'val'] inside the scipy.weave.inline() call. This exposes those Python objects to the C code, allowing them to be used. I use num_fft, fft_len, and val to set loop sizes and initial values; then, after performing the FFTs, the values in out are written to comp using C array indexing. Because the numpy array is declared as complex, the real and imaginary components are combined and written using (re, imag) form.


FINAL THOUGHTS
Here are the statistics I get running two chunks of code against each other, each using dft_plan_many to do 100,000 length 32 FFTs, with FFTW_PLAN set to FFTW_ESTIMATE.

Python
$ time ./fftw_plan_many.py
real 0m0.738s
user 0m0.416s
sys 0m0.192s

C
$ time ./many_dft.o
real 0m0.188s
user 0m0.064s
sys 0m0.088s

Not bad to do 100,000 FFTs, albeit small ones (32 bins), in < 1 sec. However, C (like Sebulba) always wins in numerics.
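As an aside, modern NumPy can batch these transforms without any inline C at all: np.fft.fft applied along an axis of a 2D array performs every row's FFT in one call. A sketch (my own illustration, not the weave code below, and no timing claims):

```python
import numpy as np

num_fft, fft_len = 1000, 32

# One row per transform; np.fft.fft runs along the chosen axis, so this
# performs num_fft independent length-32 FFTs in a single call.
signals = np.arange(num_fft * fft_len, dtype=complex).reshape(num_fft, fft_len)
spectra = np.fft.fft(signals, axis=-1) / fft_len  # same 1/N scaling as the post

print(spectra.shape)  # (1000, 32)
```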

I am planning to continue this investigation thread at a later date. Once my nVidia development computer gets up and running, I will likely attempt to do CUDAFFT with this same interface, and compare that to the current findings.

SIMPLE CODE
#!/usr/bin/python
from scipy.weave import inline

inline( 'printf("Hello World");',
          headers=['<stdio.h>'],
          compiler='gcc')
print ""


PYTHON FFTW PLAN MANY CODE
#!/usr/bin/python
import numpy as np
from scipy.weave import inline
from numpy.distutils.system_info import get_info
fftw_plan_many_dft = \
"""
fftw_complex* in;
fftw_complex* out;
fftw_plan plan;

in = (fftw_complex *) fftw_malloc(sizeof(fftw_complex)*fft_len*num_fft);
out = (fftw_complex *) fftw_malloc(sizeof(fftw_complex)*fft_len*num_fft);

plan = fftw_plan_many_dft( 1,
                           &fft_len,
                           num_fft,
                           in,
                           NULL,
                           1,        /* istride: elements of one FFT are adjacent */
                           fft_len,  /* idist: FFTs start fft_len entries apart,
                                        matching the in[j+i*fft_len] fill loop */
                           out,
                           NULL,
                           1,
                           fft_len,
                           FFTW_FORWARD,
                           FFTW_ESTIMATE );

for (int i = 0; i < num_fft; i++){
     val+=1;
     for (int j = 0; j < fft_len; j++) {
          in[j+i*fft_len][0] = val;
          in[j+i*fft_len][1] = val ;
     }
}
fftw_execute(plan);       
for (int i =0; i < num_fft; i++){
    for (int j = 0; j < fft_len; j++){
        comp[j+i*fft_len] = std::complex<double>(out[j+i*fft_len][0]/fft_len,
                                                 out[j+i*fft_len][1]/fft_len);
    }
}
fftw_destroy_plan(plan);
"""

fftw_info = get_info( 'fftw3' )
num_fft = 100000 
fft_len = 32
val = 0.
comp = np.array(np.zeros(num_fft*fft_len), dtype=complex)
inline( fftw_plan_many_dft,
        ['comp', 'num_fft', 'fft_len', 'val'],
headers=['<fftw3.h>'],
libraries=['fftw3'],
        include_dirs=fftw_info['include_dirs'],
library_dirs=fftw_info['library_dirs'],
compiler='gcc' )
print comp

Saturday, October 8, 2011

A Genetic Solution to the N-queens Problem

GETTING IN THE QUEEN'S GENES
Lately, I have become interested in the techniques of genetic algorithms, shortened as GA. After speaking with a friend about an assignment in his algorithms course, I decided that my first GA project was going to be an expandable genetic n-queens solver. I have included the code below.


The N-queens problem is one of the classic problems of computer science. The premise is simple - for any chess board NxN, where N>3, there exists at least one arrangement such that N queens can be placed without conflicting in a row, column, or diagonal. More information and a better explanation can be found here - http://en.wikipedia.org/wiki/Nqueens . The reason this problem is popular in algorithms classes around the globe (besides the association with Dijkstra) is because for N > 8 or so, the problem has a solution space large enough that brute forcing becomes impractical. This requires the student to think about the parameters of the puzzle and find ways to eliminate "stupid" possibilities, paring down the number of possibilities to a reasonable volume.




SIMPLE EXAMPLE
Suppose we choose to solve the problem for N =4. This means that the board size is 4^2=16, and the number of queens we can fit inside the board is 4. Picturing an empty chess board, it would look something like this:
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]

If we were to place the queens randomly around the board:
[1 0 1 0]
[0 0 0 1]
[0 0 0 0]
[0 0 0 1]

Here we can clearly see that the two queens in the top row conflict by row, while the two queens in the rightmost column conflict by column. There is also a diagonal conflict in the upper right hand corner. Shifting things some more, we could eventually see:
[0 0 1 0]
[1 0 0 0]
[0 0 0 1]
[0 1 0 0]

There are no row, column, or diagonal conflicts. This is one valid solution to the N-queens problem.



CODE PROCESS
Upon searching for existing examples, I found samples of code from http://pyevolve.sourceforge.net/0_6rc1/examples.html , which actually has an example of an N-queens solver (second from the bottom of the page). I didn't like the logic in the sample code, so I decided to create my own flavor for the answer, borrowing from this existing source. I also figured it could be a good way to learn more about genetic programming.


I am a big fan of functional programming, so I have a strong tendency to use (and abuse) the pseudo-functional abilities available in python. Luckily, using map() and filter() tends to speed up a lot of python code, at least in my experience.


The example from sourceforge uses simpler python code, but the logic behind it seems unintuitive to me. When I first read about the problem, my first thought was that minimizing conflicts would eventually lead to the right answer. I also thought of the board exactly as it appears in the above simple example, whereas the sample code used the range function to create a list of numbers (0-board_size) to look for conflicts. I found it simpler to just use ones and zeroes. The sourceforge code also counts collisions, but then inverts the scoring for some reason, requiring the use of "maximize" in the .setMinimax method. Assuredly, I did do some wonky Python tricks to make my code work, but I will explain myself below.




PYTHON EXPLANATION 
The very first step that I took was downloading the PyEvolve package. This can be done either by doing sudo apt-get install python-pyevolve in a Debian-based Linux install, or by downloading the source from http://pyevolve.sourceforge.net/ . Once PyEvolve is installed, everything should be ready to go.


Line-by-Line of queens_func
if sum(genome) < BORDER: return BOARD_SIZE
This line is required to keep the GA from ranking bad solutions with high scores. In this case, the GA will create some solutions with no pieces in a row (or no pieces in any rows!). These solutions should be regarded as genetic failures, and given the worst possible score. Otherwise, a solution of all zeroes could be considered to meet the exit criteria, as there would be no conflicts.


array = map( lambda x: list(genome[x::BORDER]), range(BORDER))
The first abuse of map occurs here. map(a,b) applies a given function a to every element in list b. I can also do something like map(lambda x: x+1, b). By using lambda x:, I am defining an inline function that will apply x+1 to any input x. Basically x++. This is identical to writing def function(x): return x+1 elsewhere in my file, and doing map(function, b). By doing list(genome[x::BORDER]), I am indexing through the list genome in steps of size BORDER, starting at x (where x is the number we are currently on in range(BORDER)), and creating a list for each grouping. For example:
genome = [1, 2, 3, 4, 5, 6, 7, 8, 9]
BORDER = 3
range(BORDER) -> [0, 1, 2]
map( lambda x: genome[x::BORDER], range(BORDER)) -> [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
This can also be interpreted as a list of columns, where array[num] returns a list of the values in column num. Effectively, this creates a 2D list of lists out of a 1D list, without involving NumPy. I didn't want to include NumPy because I may have to use a similar program on a work computer that doesn't have access to NumPy.


cols = array
Nothing too special here - array is already a list of columns, so cols is just a friendlier name for it. The actual summing happens later, in the return statement; ideally, each column sums to 1, as that means there is only 1 piece in every column.

rows = map(list, zip(*array))

This is a fun one. zip(*array) uses the splat operator to "unlist" every column in array, then regroup based on position in each sublist. Applying this to the earlier example:
array = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
zip(*array) -> [(1, 2, 3), (4, 5, 6), (7, 8, 9)].
This can be generalized to [(all pos(0)), (all pos(1)), (all pos(2)), ...]. Doing map(list, zip(*array)) turns each tuple into a list, creating a 2D transpose of the earlier defined array. Summing these rows (again, done in the return statement) will give the total number of queens in each row. Once again the ideal result is all ones.
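Both tricks are easy to verify interactively. A quick Python 3 check (note that in Python 3, map and zip return iterators, hence the list comprehensions):

```python
genome = [1, 2, 3, 4, 5, 6, 7, 8, 9]
BORDER = 3

# Stride through the flat genome to pull out each column.
array = [list(genome[x::BORDER]) for x in range(BORDER)]
print(array)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

# zip(*array) regroups by position, recovering the rows (a transpose).
rows = [list(r) for r in zip(*array)]
print(rows)   # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```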


rows_ints = []
for row in rows:
    rows_ints.append(int(''.join(map(str, row)), 2))
rows_ints = map( lambda x: int(math.log(x,2)) if x !=0 else x, rows_ints)
For this part, I wanted to turn each row into its binary equivalent, so [1, 0, 0, 0] would become 1000. I did this because I thought an easy way to test for diagonals would be to see if row neighbors are also power of 2 neighbors, by checking the power of 2 exponent. If they are off by one, then a diagonal conflict is present. A brief example:
[1,0,0] -> 100 = 4 -> 2^2 -> 2
[0,1,0] -> 010 = 2 -> 2^1 -> 1
[0,0,1] -> 001 = 1 -> 2^0 -> 0
The lambda expression actually contains a pseudo if-else statement. int(math.log(x,2)) is returned if x isn't equal to zero - otherwise x (which must be zero) is returned unchanged. This is because log(x) will not work if x = 0. Gotta love log rules.


diags = filter( lambda x: x==1, map( lambda i,j: abs(i-j), rows_ints[1:], rows_ints[:-1]))
Working from the inside out, map( lambda i,j: abs(i-j), rows_ints[1:], rows_ints[:-1]) takes a shifted slice from rows_ints and point-by-point subtracts the values from the default slice. A brief example:
a = [1,2,3]
a[1:] -> [2,3]  
a[:-1] -> [1,2]
map( lambda i,j: abs(i-j), a[1:], a[:-1] ) -> [abs(2-1), abs(3-2)] ->[1,1]
Once this operation is complete, the next step is to filter out all the differences of 1 between rows. As discussed previously, a difference of 1 between neighbor rows indicates that there is a diagonal conflict between them.


return max(map(sum, rows)) + max(map(sum, cols)) - 2 + sum(diags)
Our return score will be the worst count per row, plus the worst count per column, plus the total number of diagonal conflicts. This result is scaled down by -2 because the best possible result for max(map(sum, rows)) is 1, and the same for columns. That would indicate that the largest number of queens in any row is 1, and the largest number in any column is 1, meaning there are no row/col conflicts. If sum(diags) was then 0, this would be a solution that meets the bestrawscore criteria. If genome.setParams was changed to genome.setParams(bestrawscore=2, rounddecimal=2), then the -2 could be eliminated from the return statement.
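Putting the whole scoring function together, it can be checked against the conflict-free N=4 board from the simple example earlier. This is my own Python 3 adaptation of queens_func (map and filter are materialized eagerly), not the PyEvolve-ready original:

```python
import math

BORDER = 4
BOARD_SIZE = BORDER ** 2

def queens_score(genome):
    # Genetic failures (too few queens on the board) get the worst score.
    if sum(genome) < BORDER:
        return BOARD_SIZE
    array = [genome[x::BORDER] for x in range(BORDER)]   # columns
    cols = array
    rows = [list(r) for r in zip(*array)]                # transpose -> rows
    # Each row as a binary number, then its power-of-2 exponent.
    rows_ints = [int(''.join(map(str, row)), 2) for row in rows]
    rows_ints = [int(math.log(x, 2)) if x != 0 else x for x in rows_ints]
    # Adjacent rows whose exponents differ by 1 are diagonal conflicts.
    diags = [d for d in (abs(i - j) for i, j in zip(rows_ints[1:], rows_ints[:-1]))
             if d == 1]
    return max(map(sum, rows)) + max(map(sum, cols)) - 2 + sum(diags)

# The conflict-free board from the simple example, flattened row by row.
solution = [0, 0, 1, 0,
            1, 0, 0, 0,
            0, 0, 0, 1,
            0, 1, 0, 0]
print(queens_score(solution))  # -> 0, i.e. it meets the bestrawscore exit criterion
```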


In the queens_init function
[1]*N1 + [0]*N2 will actually create one expanded list that is of length N1+N2, where the first N1 values are 1 and the rest are 0. Pretty self explanatory, but I thought it was neat that python understood using the * operator for expansion on a list, rather than multiplying each value in the list by N1 or N2. Of course, that is the same "feature" that annoyed me in the past after coming to python from MATLAB, but I am beginning to understand why this is the default behavior, and scientific pythonistas need to use NumPy to get closer to MATLAB functionality.




CHANGES IN MAIN
I only made a few other changes to the sample code. I changed the genome.setParams(bestrawscore=0), because my methodology was to minimize, rather than maximize - and once I found a score of 0, that was a solution to the problem. This also meant adjusting to ga.setMinimax(Consts.minimaxType["minimize"]), rather than maximize.


One of the more superfluous changes was to ga.setPopulationSize(sum(range(NQUEENS))). I felt that increasing or decreasing the size per generation based on the board size might help avoid stagnation of the gene pool - I never ran into this issue, but reading various sources about genetic algorithms I became worried that it might become an issue for large values of BORDER. I have no evidence to back this thought, but it didn't appear to hurt anything, either.


I also raised the generation limit with ga.setGenerations(1E7), so that the run would effectively continue until a solution was found. As far as I know, there is no way to set a truly "infinite" generation count in PyEvolve, but this is pretty close.




FINAL NOTES
Feel free to use, abuse, modify, or otherwise put to use the code below. Please let me know if you know a better way to set the population size, or if you find any other problems.




CODE
#!/usr/bin/python
from pyevolve import *
from random import shuffle
import math

BORDER = 8
NQUEENS = BORDER
BOARD_SIZE = pow(BORDER, 2)

def queens_func(genome):
    if sum(genome) < BORDER:
        return BOARD_SIZE
    array = map( lambda x: list(genome[x::BORDER]), range(BORDER))
    cols = array
    rows = map(list, zip(*array))
    rows_ints = []
   
    for row in rows:
        rows_ints.append(int(''.join(map(str, row)), 2))
    # Convert each row's integer to the queen's column index (log2 of a power of 2)
    rows_ints = map( lambda x: int(math.log(x, 2)) if x != 0 else x, rows_ints)
    # Adjacent rows whose queen columns differ by exactly 1 conflict diagonally
    diags = filter( lambda x: x == 1, map( lambda i, j: abs(i - j), rows_ints[1:], rows_ints[:-1]))
    return max(map(sum, rows)) + max(map(sum, cols)) - 2 + sum(diags)

def queens_init(genome, **args):
    genome.genomeList = [1]*NQUEENS + [0]*(BOARD_SIZE-NQUEENS)
    shuffle(genome.genomeList)

def main_func():
    genome = G1DList.G1DList(BOARD_SIZE)
    genome.setParams(bestrawscore=0, rounddecimal=2)
    genome.initializator.set(queens_init)
    genome.mutator.set(Mutators.G1DListMutatorSwap)
    genome.crossover.set(Crossovers.G1DListCrossoverCutCrossfill)
    genome.evaluator.set(queens_func)
    ga = GSimpleGA.GSimpleGA(genome)
    ga.terminationCriteria.set(GSimpleGA.RawScoreCriteria)
    ga.setMinimax(Consts.minimaxType["minimize"])
    ga.setPopulationSize(sum(range(NQUEENS)))
    ga.setGenerations(1E7)
    ga.setMutationRate(0.02)
    ga.setCrossoverRate(1.0)
    #vpython_adapter = DBAdapters.DBVPythonGraph(identify="queens", frequency=1)
    #ga.setDBAdapter(vpython_adapter)
    ga.evolve(freq_stats=50)
    best = ga.bestIndividual()
    print best
    for i in map(lambda x: list(best[x::BORDER]), range(BORDER)):
        print i
      
    print "Best individual score : %.2f\n" %(best.getRawScore())

if __name__ == "__main__":
    main_func()

Friday, September 16, 2011

Using FFTW plan_many_dft

CHASING FFTW3
The fftw3 library can be installed by running sudo apt-get install libfftw3-3, followed by
sudo ln -s /usr/lib/libfftw3.so.3.2.4 /usr/lib/libfftw3.so.


Apparently gcc can't figure out that you are linking in a library unless the filename ends explicitly in .so, so creating a symlink to the actual library is a quick workaround.


The final gcc line to compile these programs is gcc <file> -o <objectname> -std=c99 -lfftw3 -lm (the -std=c99 flag is needed because the code declares loop variables inside the for statements).


If libfftw3 gets broken during installation, bad things can happen. For me, this meant hours of trying to hack together a solution.


I finally ended up throwing in the towel, and doing sudo apt-get remove libfftw3-3, sudo apt-get purge libfftw3-3, sudo apt-get install libfftw3-3, sudo apt-get install libfftw3-3-dev, sudo apt-get install libfftw3-3-doc. Now everything works great.


The single-FFT code is pretty simple, and is explained well by the FFTW documentation. READ IT! I mainly wrote it to have a reference template for doing other fftw stuff. The plan_many interface was confusing to me, so I wrote some code demonstrating how it is used.




A SIMPLE SCRIPT
Write a (python!... or something less awesome) script to print system time, call the executable, then print system time. EDIT: Or... just do time <your executable> in Linux.
In the dftmany C code -
  #include <stdlib.h>
  Change num_fft to a large number - say 400, 1000, etc.
  Change fft_len to something more “usable”, say 8192
  Fill the in array like this
    in[j+i*fft_len][0] = (double)rand();
    in[j+i*fft_len][1] = (double)rand();
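The timing wrapper mentioned in the first step can be sketched in Python like this (the executable path passed on the command line is whatever you compiled the C code to, e.g. ./dftmany - a name I am assuming here):

```python
import subprocess
import sys
import time

def time_executable(cmd):
    """Run a command and return its wall-clock runtime in seconds,
    roughly what `time <your executable>` reports as 'real'."""
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python timer.py ./dftmany
    print("%.3f s" % time_executable(sys.argv[1:]))
```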



Now laugh, maniacally.


Even with basic FFTW flags like FFTW_ESTIMATE, 400 FFTs of length 8192 can be computed in 0.17 s on a 3.2 GHz Core i7 - other processors may be slightly slower, but this application is single threaded, so any computer with at least one idle 3+ GHz core should see similar results.

To go faster, try these gcc flags on for size - and remember “oh3” NOT “zero3”
CFLAGS = -O3 -ffast-math -mfpmath=sse -funroll-loops


And to go even EVEN faster, one could compile fftw from source using different build flags.
Have fun!


SIMPLE CODE
#include <stdio.h>
#include <fftw3.h>

int main (void) {
  int N = 64;
  double val = 1.;

  fftw_complex* in;
  fftw_complex* out;
  fftw_plan plan;

  in = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * N);
  out = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * N);

  plan = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);

  for (int i = 0; i < N; i++) {
    in[i][0] = val;
    in[i][1] = 0.;
  }

  fftw_execute(plan);

  //Results are non-normalized, need to scale by N. Should see DC=1, then all 0
  for (int j = 0; j < N; j++) {
    printf("%f+%fj\n", out[j][0]/N, out[j][1]/N);
  }

  fftw_destroy_plan(plan);
  fftw_free(in);
  fftw_free(out);

  return 0;
}




PLAN_MANY CODE
#include <stdio.h>
#include <fftw3.h>

int main (void) {
  int num_fft = 3;
  int fft_len = 64;
  double val = 0.;

  fftw_complex* in;
  fftw_complex* out;
  fftw_plan plan;

  //Alloc empty arrays the size of the data to be transformed
  in = (fftw_complex *) fftw_malloc(sizeof(fftw_complex)*fft_len*num_fft);
  out = (fftw_complex *) fftw_malloc(sizeof(fftw_complex)*fft_len*num_fft);

  //Rank is 1, because the operation is for 1d arrays.
  //
  //Pointer to fft_len denotes that each fft is fft_len wide.
  //
  //Stride parameters are 1, because the elements of each individual fft are
  //contiguous in memory; dist parameters are fft_len, because consecutive
  //ffts start fft_len elements apart in the same buffer.
  //
  //Embed parameters are NULL because these arrays are not subsets of a larger
  //n-dim array.
  plan = fftw_plan_many_dft( 1,            //rank
                             &fft_len,     //n
                             num_fft,      //howmany
                             in,
                             NULL,         //inembed
                             1,            //istride
                             fft_len,      //idist
                             out,
                             NULL,         //onembed
                             1,            //ostride
                             fft_len,      //odist
                             FFTW_FORWARD,
                             FFTW_ESTIMATE );

  //Create input array of the form [1----][2----][3----][4-----]... etc. Use increasing val to show
  //that the fft boundaries are correct. If two different values showed up in the same
  //fft chunk, frequency content would be non-zero.
  for (int i = 0; i < num_fft; i++){
    val+=1;
    for (int j = 0; j < fft_len; j++) {
      in[j+i*fft_len][0] = val;
      in[j+i*fft_len][1] = 0.;
    }
  }

  fftw_execute(plan);

  //Read back the output. Should be something like [1+0j 0+0j........... 1+0j 0+0j........] i.e.
  //only DC terms show up in the spectrum output.
  for (int q = 0; q < num_fft; q++) {
    for (int r = 0; r < fft_len; r++) {
      printf("%f+%fj\n", out[r+q*fft_len][0]/fft_len, out[r+q*fft_len][1]/fft_len);
    }
  }

  fftw_destroy_plan(plan);
  fftw_free(in);
  fftw_free(out);

  return 0;
}