Batch operations support for django-nonrel
I have implemented a custom backend for Django that lets you use batch operations with Django and Google App Engine. Be aware that the API is not stable yet and that I have not yet received official approval from the django-nonrel team (although there is some progress).
Fetch all the required libraries:
    hg clone https://bitbucket.org/wkornewald/django-testapp batch-save-test
    hg clone https://bitbucket.org/wkornewald/django-nonrel django-nonrel-bs
    hg clone https://bitbucket.org/wkornewald/djangoappengine djangoappengine-bs
    hg clone https://bitbucket.org/wkornewald/djangotoolbox djangotoolbox-bs
    hg clone https://bitbucket.org/wkornewald/django-dbindexer django-dbindexer-bs
Then pull the batch operation changes from the vmihailenco forks:

    hg -R django-nonrel-bs pull -u https://bitbucket.org/vmihailenco/django-nonrel
    hg -R djangoappengine-bs pull -u https://bitbucket.org/vmihailenco/djangoappengine
    hg -R django-dbindexer-bs pull -u https://bitbucket.org/vmihailenco/django-dbindexer
Create symbolic links to the fetched libraries:
    cd batch-save-test
    ln -s ../django-nonrel-bs/django
    ln -s ../djangoappengine-bs djangoappengine
    ln -s ../djangotoolbox-bs/djangotoolbox
    ln -s ../django-dbindexer-bs/dbindexer
Try running the development server from the batch-save-test directory:

    python manage.py runserver
Now you should be able to use batch operations with Django.
The simplest example of using batch operations looks like this:
    from __future__ import with_statement
    from django.db.models import BatchOperation

    # Post is an ordinary model from your app with title and text fields.
    with BatchOperation() as op:
        for i in range(100):
            op.save(Post(title='Title %d' % i, text='Text %d' % i))
That’s it. Internally, the code above creates two pools: one for save operations and one for delete operations. When a pool fills up, its entries are flushed to the backend, which is responsible for writing them in batches. The available configuration options are:
- pool_size (save_pool_size, delete_pool_size) - the number of model instances held in a pool (500 by default). If you run into memory problems, try lowering this value.
- batch_size (save_batch_size, delete_batch_size) - the number of model instances flushed in a single batch operation (100 by default). If you get datastore timeout exceptions, try lowering this value.
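To make the interplay of the two options concrete, here is a minimal, self-contained sketch of a single save pool. It is not the django-nonrel code: `BatchPool` and the `put_batch` callback are hypothetical names, and flushing leftovers only when the `with` block exits without an exception is an assumption. The defaults mirror the documented ones (500 and 100).

```python
class BatchPool(object):
    """Hypothetical sketch of one save pool; not the actual backend code."""

    def __init__(self, put_batch, pool_size=500, batch_size=100):
        self.put_batch = put_batch  # callable that receives a list of instances
        self.pool_size = pool_size
        self.batch_size = batch_size
        self.pool = []

    def save(self, instance):
        self.pool.append(instance)
        # When the pool is full, hand everything to the backend.
        if len(self.pool) >= self.pool_size:
            self.flush()

    def flush(self):
        # Split the pooled instances into chunks of at most batch_size,
        # one backend call (e.g. one datastore RPC) per chunk.
        for start in range(0, len(self.pool), self.batch_size):
            self.put_batch(self.pool[start:start + self.batch_size])
        self.pool = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Assumption: leftovers are flushed only if the block succeeded.
        if exc_type is None:
            self.flush()
        return False


batches = []
with BatchPool(batches.append) as op:
    for i in range(1250):
        op.save(i)
print([len(b) for b in batches])  # twelve batches of 100, then one of 50
```

With 1250 saves the pool fills twice, each time producing five batches of 100, and the remaining 250 instances are flushed at exit as batches of 100, 100 and 50.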
pool_size and batch_size are separate settings because Django can, in principle, be configured to use several databases, so a single pool may contain instances destined for different databases.
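Here is a sketch of what that implies at flush time, assuming each pooled entry remembers its database alias: the flush first groups entries per database and then splits each group using that database's own batch size. `flush_by_database` and the `archive` alias are hypothetical.

```python
from collections import OrderedDict


def flush_by_database(pool, batch_sizes, put_batch, default_batch_size=100):
    """Hypothetical: flush (alias, instance) pairs per database."""
    # Group instances by database alias, keeping first-seen order.
    grouped = OrderedDict()
    for alias, instance in pool:
        grouped.setdefault(alias, []).append(instance)
    # Each database gets its own batch size.
    for alias, instances in grouped.items():
        size = batch_sizes.get(alias, default_batch_size)
        for start in range(0, len(instances), size):
            put_batch(alias, instances[start:start + size])


calls = []
pool = [('default', i) for i in range(120)] + [('archive', i) for i in range(30)]
flush_by_database(pool, {'default': 100, 'archive': 50},
                  lambda alias, batch: calls.append((alias, len(batch))))
print(calls)  # [('default', 100), ('default', 20), ('archive', 30)]
```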
You can configure BatchOperation() like this:
    config = dict(default=dict(pool_size=50, save_batch_size=50, delete_pool_size=50))
    with BatchOperation(config) as op:
        for p in Post.objects.all()[:100]:
            op.delete(p)
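How the generic and operation-specific keys combine is not spelled out above. The sketch below assumes that a `save_`/`delete_` prefixed key overrides the generic one, which in turn overrides the defaults; `resolve_options` is a hypothetical helper, not part of the backend.

```python
DEFAULTS = {'pool_size': 500, 'batch_size': 100}


def resolve_options(config, alias, kind):
    """Hypothetical: return (pool_size, batch_size) for kind 'save' or 'delete'.

    Assumed precedence: '<kind>_<option>' beats '<option>', which beats
    the documented defaults (500 and 100).
    """
    db_config = config.get(alias, {})
    resolved = []
    for option in ('pool_size', 'batch_size'):
        specific = '%s_%s' % (kind, option)
        if specific in db_config:
            resolved.append(db_config[specific])
        else:
            resolved.append(db_config.get(option, DEFAULTS[option]))
    return tuple(resolved)


config = dict(default=dict(pool_size=50, save_batch_size=50, delete_pool_size=50))
print(resolve_options(config, 'default', 'save'))    # (50, 50)
print(resolve_options(config, 'default', 'delete'))  # (50, 100)
```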
I tested batch saves with the following views:
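    from __future__ import with_statement
    from django import http
    from django.db.models import BatchOperation

    def plain_save(request):
        # One datastore put per instance.
        for i in range(100):
            Post.objects.create(title='Title %d' % i, text='Text %d' % i)
        return http.HttpResponse('Ok')

    def batch_save(request):
        # Instances are pooled and written in batches.
        with BatchOperation() as op:
            for i in range(100):
                op.save(Post(title='Title %d' % i, text='Text %d' % i))
        return http.HttpResponse('Ok')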
and got the following results from Appstats:
    "GET /plain_save/" 200 real=2734ms cpu=1720ms api=6583ms overhead=21ms (100 RPCs)
    "GET /batch_save/" 200 real=609ms cpu=193ms api=6516ms overhead=0ms (1 RPC)
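A quick calculation on those Appstats figures: real time drops by roughly 4.5x and CPU time by roughly 8.9x, while API time is almost unchanged, presumably because the datastore still writes 100 entities either way. The helper `batch_rpcs` is hypothetical and assumes all instances fit into a single pool flush.

```python
import math

# Figures copied from the Appstats lines above (milliseconds, RPC counts).
plain = {'real': 2734, 'cpu': 1720, 'api': 6583, 'rpcs': 100}
batch = {'real': 609, 'cpu': 193, 'api': 6516, 'rpcs': 1}

for key in ('real', 'cpu', 'api'):
    print('%s: %.1fx' % (key, plain[key] / float(batch[key])))
# real: 4.5x
# cpu: 8.9x
# api: 1.0x


def batch_rpcs(n, batch_size):
    """Hypothetical: batch RPCs needed to flush n instances in one pool flush."""
    return int(math.ceil(n / float(batch_size)))


print(batch_rpcs(100, 100))  # 1, matching the single RPC above
print(batch_rpcs(100, 50))   # 2
```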
Feel free to leave feedback or report bugs in the django-nonrel user group.