Sunday, February 12, 2012

Copy-free Data From C In Python

BACKGROUND
Lately, I have found myself doing heavy computational work in C, storing large volumes of data in C arrays. However, once the computational work is done, I also need to explore the data and decide how to make algorithmic decisions about various features of the dataset. What I needed was an embedded interpreter.

Luckily, there is a ton of information out there about embedding interpreters into C and C++ code. Languages such as Lua, Tcl, JavaScript, and Python are all well suited to this, and bring the ability to explore algorithmic changes in scripted code without having to recompile each time. Lua especially has a large following in the gaming community, with well-documented examples for usage in C and C++ code.

ATTEMPT WITH LUA
Lua has a very simple, stack-based interface that works very well for simple tasks. However, it does require data to be copied when it is pushed onto the stack, so that the Lua environment has its own copy which can be garbage collected. Raw C pointers can be pushed in the form of lightuserdata, but Lua treats them as opaque values, so they didn't appear to be of much use.

Due to the size of my dataset, any kind of copying after initialization in my C code is a big performance hit. The data transition to Lua was fast, but not fast enough. I needed another solution.

ATTEMPT WITH PURE PYTHON
The Python/C API is very powerful and, like most of Python, extensively documented. The basic concept is to instantiate a Python environment, create a PyObject* in C, then run a script that uses the initialized PyObject. The PyObject* can be instantiated using different functions, which between them can create any Python object.

However, there is a problem. With the default Python API, data must be copied into a list if its size and values are not known at compile time. This put me in a similar position to the Lua attempt, and copying my data to Python took even longer than the copy to Lua.
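To make the cost of that copy concrete, here is a minimal sketch written for Python 3 (the post itself targets Python 2.7). It uses a ctypes array as a stand-in for the C array, and the names c_vals and as_list are illustrative assumptions rather than part of the original program. A list holds a separate Python object for every element, so building one duplicates the data:

```python
import ctypes
import sys

# Stand-in for the C array; in the post this lives in the C program.
size = 1000
c_vals = (ctypes.c_int * size)(*range(size))

# Building a list creates one Python int object per element: a full copy.
as_list = list(c_vals)

# Changing the C memory afterwards does not affect the list.
c_vals[0] = 42
print(c_vals[0], as_list[0])

# The list's pointer table alone is larger than the raw C data,
# before even counting the int objects it points to.
print(ctypes.sizeof(c_vals), sys.getsizeof(as_list))
```

The exact sizes printed depend on the platform, but the list is always the larger of the two, and it has to be rebuilt whenever the C data changes.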

Things were grim, but there was one last ray of hope... NumPy!
 
THE ANSWER
Using the NumPy C API, it is possible to create a PyArray from a pointer (the function is PyArray_SimpleNewFromData). What this means is that a pointer to the start of a data array can be passed into a function, and a PyObject of type NumPy array will be created, where the data content of the NumPy array is actually the C array! With no data copied across the boundary, I have both the flexibility of a scripted interface and the raw computational power of C, with near-zero overhead.
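The sharing can be observed from the Python side as well. The sketch below is only an analogue of the C call, not the C API itself: it assumes Python 3 with NumPy installed, uses a ctypes array as a stand-in for the C array, and wraps it with np.frombuffer. The names c_vals and data are illustrative.

```python
import ctypes
import numpy as np

# Stand-in for the C array; in the post this lives in the C program.
size = 1000
c_vals = (ctypes.c_int * size)(*range(size))

# Wrap the existing memory in a NumPy array without copying it.
data = np.frombuffer(c_vals, dtype=np.intc)

# The array does not own its memory and starts at the same address.
print(data.flags.owndata)
print(data.ctypes.data == ctypes.addressof(c_vals))

# Writes on either side are visible on the other side.
c_vals[3] = -1
data[5] = 99
print(data[3], c_vals[5])
```

Because both names refer to the same bytes, a write through either one shows up in the other, which is the behavior the embedded program relies on.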

The only downside is that the Python code will not handle freeing the memory pointed to by the PyArray. This means that if you allocated the memory using malloc, you may have to free() it in your C code after the Python code has run. You should also be extra careful not to free the memory while Python is still looking at it. Once again, trading complexity for speed has its benefits.
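One consequence for the script side: if some values need to outlive the C buffer, they have to be copied out explicitly before the C code frees or reuses the memory. A minimal sketch of that pattern, assuming Python 3 with NumPy, with a ctypes array standing in for the C array and an overwrite standing in for the buffer being reused (all names here are illustrative):

```python
import ctypes
import numpy as np

# Stand-in for the C array that the host program owns.
size = 1000
c_vals = (ctypes.c_int * size)(*range(size))
data = np.frombuffer(c_vals, dtype=np.intc)

# Copy only the slice that must survive; the copy owns its own memory.
keep = data[:10].copy()

# Simulate the C side reusing the buffer once the script is done with it.
c_vals[:] = [0] * size

print(data[:10])  # the shared view now shows the overwritten memory
print(keep)       # the explicit copy still holds the original values
```

Copying a small slice keeps most of the zero-copy benefit while removing the dependency on the C buffer's lifetime for the values that matter.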

I have some (excessively?) strong beliefs that NumPy may be the future of data processing, but I think I'll keep that sermon to myself.

REQUIRED PACKAGES AND COMPILE ARGUMENTS
On Ubuntu 11.10, I needed the following packages: python-dev and python-numpy. I installed them with sudo apt-get install python-dev python-numpy

To compile, I used:
gcc -O3 -std=c99 -I$NUMPY test.c -lpython2.7
    where NUMPY was previously declared using
    export NUMPY=/usr/lib/pymodules/python2.7/numpy/core/include/numpy/

Once compiled, I simply did ./a.out test.py
    where test.py was the name of my Python file

This should print out the numbers 0-999, which are the numbers we stored in the C code. We are accessing C code values, directly in Python, without copying the data. Nice!

THE PYTHON CODE 
for i in data:
    print i

THE C CODE 
#include <python2.7/Python.h>
#include <arrayobject.h>

int main(int argc, char** argv) {
    if (argc < 2) {
        printf("This program requires a path to a Python file as an argument!\n");
        //Without a script path there is nothing to run, so exit early
        return 1;
    }
    //Initialize general Python environment
    Py_Initialize();
    //Initialize NumPy 
    import_array();

    //Create array of values
    int size = 1000;
    int vals[size];
    for (int i=0; i<size; i++) {
        vals[i] = i;
    }
    
    //Indicate that the array is 1-D
    npy_intp dims[1] = {size}; 

    //Create NumPy array that shares data with vals
    PyObject* data = PyArray_SimpleNewFromData(1,
                                               dims,
                                               NPY_INT,
                                               &vals[0]);
    
    //Add data array to globally accessible space
    PyObject* m = PyImport_AddModule("__main__");
    PyModule_AddObject(m, "data", data);

    //Optional ability to make data read-only. Nice if you want to run multiple
    //python files over the same base data
    PyRun_SimpleString("data.setflags(write=False)");
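
    //Open the script; PyFile_FromString returns NULL if the file cannot be opened
    PyObject* py_f = PyFile_FromString(argv[1], "r");
    if (py_f == NULL) {
        PyErr_Print();
        Py_Finalize();
        return 1;
    }
    //Run the script, then release the file object
    PyRun_SimpleFile(PyFile_AsFile(py_f), argv[1]);
    Py_DECREF(py_f);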
    Py_Finalize();
    return 0;
}
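
The effect of the read-only flag can be tried in isolation. This is a sketch assuming Python 3 with NumPy, again with a ctypes array standing in for the C array; the variable blocked is an illustrative name. Note that the flag only guards writes made through the NumPy array, and the code that owns the memory can still change it:

```python
import ctypes
import numpy as np

# Stand-in for the C array shared with the interpreter.
c_vals = (ctypes.c_int * 5)(*range(5))
data = np.frombuffer(c_vals, dtype=np.intc)

# Same call the C program issues through PyRun_SimpleString.
data.setflags(write=False)

# Writing through the array is now rejected with a ValueError.
try:
    data[0] = 100
    blocked = False
except ValueError:
    blocked = True
print(blocked)

# The owner of the memory is not restricted by the flag.
c_vals[0] = 7
print(data[0])
```

This makes the flag a guard against accidental writes from scripts, not a guarantee that the data never changes underneath them.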






6 comments:

  1. Very interesting, so what did you do with this data?

  2. Awesome post/tutorial! I'm in exactly the same boat as you but have never had the time to look into how to integrate my python scripts into my C code, so I've always run them separately and passed data as files (my data isn't as large as yours so the overhead isn't too bad).

  3. Did you consider either the LuaJIT FFI library or the Python ctypes library to call the C functions directly?

  4. You should definitely look at LuaJIT FFI, which allows you to cast a (light)userdata (a simple pointer opaque to Lua) to anything you defined using ffi.cdef. I use it to create a memory buffer (userdata) in C, call an external C++ module (some GPU processing using CUDA) to fill it with data, and return it back to Lua. Then in Lua I cast the userdata to a structure pointer and process it further in Lua.

    No copies are made using casting, and I would not have memory for such copies, as I am dealing with hundreds of megabytes of data.

  5. I didn't look into Ctypes, primarily because I am using the Python interface for algorithm prototyping. Once I get the algorithms down, I may reimplement in C++ if performance isn't up to par.

    I will definitely be looking into LuaJIT FFI - it sounds very powerful, and may be the bridge I need between algorithm development and performance. I am currently planning to use scipy.weave.inline for computationally intensive pieces of my tests.

  6. Looking forward to a post by a Python/R fan (you) on Lua.

    (And perhaps comparing Cython and Weave.)
