Friday, September 16, 2011

Using FFTW plan_many_dft

CHASING FFTW3
The fftw3 library can be installed by doing sudo apt-get install libfftw3-3, and also
sudo ln -s /usr/lib/libfftw3.so.3.2.4 /usr/lib/libfftw3.so.


Apparently gcc can’t figure out that you are linking in a library unless it ends explicitly in .so, so by creating a symlink to the actual library a quick workaround is achieved.


The final gcc line to compile these programs is gcc <file> -o <objectname> -lfftw3 -lm


If libfftw3 gets broken during installation, bad things can happen. For me, this meant hours of trying to hack together a solution.


I finally ended up throwing in the towel, and doing sudo apt-get remove libfftw3-3, sudo apt-get purge libfftw3-3, sudo apt-get install libfftw3-3, sudo apt-get install libfftw3-3-dev, sudo apt-get install libfftw3-3-doc. Now everything works it great.


The single code is pretty simple, and is explained well by the FFTW documentation. READ IT! I mainly wrote the single code to have a reference template for doing other fftw stuff. The plan_many interface was confusing for me, so I wrote some code explaining it and how it is used.




A SIMPLE SCRIPT
Write a (python!... or something less awesome) script to print system time, call the executable, then print system time. EDIT: Or... just do time <your executable> in Linux.
In the dftmany C code -
  #include <stdlib.h>
  Change num_fft to a large number - say 400, 1000, etc.
  Change fft_len to something more “usable”, say 8192
  Fill the in array like this
    in[j+i*fft_len][0] = (double)rand();
    in[j+i*fft_len][1] = (double)rand();



Now laugh, maniacally.


Even with basic FFTW flags like FFTW_ESTIMATE, 400 8192 length FFTs can be  computed in .17s on a 3.2 GHz Core i7 - other processors may be slightly slower, but this application is single threaded so any computer with at least one idle 3+ GHz core could likely get similar results.

To go faster, try these gcc flags on for size - and remember “oh3” NOT “zero3”
CFLAGS = -O3 -ffast-math -mfpmath=sse -funroll-loops


And to go even EVEN faster, one could compile fftw from source using different build flags.
Have fun!


SIMPLE CODE
#include <stdio.h>
#include <fftw3.h>

int main (void) {
  int N = 64;
  double val = 1.;

  fftw_complex* in;
  fftw_complex* out;
  fftw_plan plan;

  in = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * N);
  out = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * N);

  plan = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);

  for (int i = 0; i < N; i++) {
    in[i][0] = val;
    in[i][1] = 0.;
  }

  fftw_execute(plan);

  #Results are non-normalized, need to scale by N. Should see DC=1, then all 0
  for (int j = 0; j < N; j++) {
    printf("%f+%fj", out[j][0]/N, out[j][1]/N);
  }

  fftw_destroy_plan(plan);

  return 0;
}




PLAN_MANY CODE
#include <stdio.h>
#include <fftw3.h>

int main (void) {
  int num_fft = 3;
  int fft_len = 64;
  double val = 0.;

  fftw_complex* in;
  fftw_complex* out;
  fftw_plan plan;

  //Alloc empty arrays the size of the data to be transformed
  in = (fftw_complex *) fftw_malloc(sizeof(fftw_complex)*fft_len*num_fft);
  out = (fftw_complex *) fftw_malloc(sizeof(fftw_complex)*fft_len*num_fft);

   //Rank is 1, because the operation is for 1d arrays.
   //
   //Pointer to fft_len denotes that each fft is fft_len wide.
  //
  //Stride parameters are num_fft, because there are num_fft arrays contiguous in memory.
  //
  //Embed parameters are null because these arrays are not subsets of a larger n-dim
  //array.
  plan = fftw_plan_many_dft( 1,
                                               &fft_len,
                                               num_fft,
                                               in,
                                               NULL,
                                               num_fft,
                                               1,
                                               out,
                                               NULL,
                                               num_fft,
                                               1,
                                               FFTW_FORWARD,
                                               FFTW_ESTIMATE );

  //Create input array of the form [1----][2----][3----][4-----]... etc. Use increasing val to show
  //that the fft boundaries are correct. If two different values showed up in the same
  //fft chunk, frequency content would be non-zero.
  for (int i = 0; i < num_fft; i++){
    val+=1;
    for (int j = 0; j < fft_len; j++) {
      in[j+i*fft_len][0] = val;
      in[j+i*fft_len][1] = 0.;
    }
  }

  fftw_execute(plan);

  //Read back the output. Should be something like [1+0j 0+0j........... 1+0j 0+0j........] i.e.
  //only DC terms show up in the spectrum output.
  for (int q = 0; q < num_fft; q++) {
    for (int r = 0; r < fft_len; r++) {
      printf("%f+%fj\n", out[r+q*fft_len][0]/fft_len, out[r+q*fft_len][1]/fft_len);
    }
  }

  fftw_destroy_plan(plan);

  return 0;
}