One of the first steps in scaling your web application (after investing in caching and streamlining your database queries) is offloading processes that take an inordinate amount of time and would otherwise disrupt your users’ experience. Things like updating Facebook profile caches—something typically done when users authenticate—or delivering e-mail. These are what are referred to as blocking operations and you pretty much want to avoid having your users wait for them at all costs.
Nifty Fact
Here at matchFWD we utilize a package called Celery to perform our background operations; the reasons for picking Celery over other solutions are beyond the scope of this short blog post, but rest assured it’s pretty awesome.
One of the interesting problems when using background tasks is how you pass data to them. Most solutions in Python use a serialization format native to Python called pickling, provided by the pickle or cPickle modules. Django models, by default, do some pretty unfortunate things when you try to pickle them.
Before I go into details let’s set up a simple test model:
    from pickle import dumps

    from django.db import models


    class User(models.Model):
        name = models.CharField(max_length=200)
        email = models.EmailField()


    # Create the record we'll be testing with.
    meep = User(name="Bob Dole", email="bdole@whitehouse.gov")
    meep.save()  # saved, so it has a primary key (id 1 in the output below)
Now that we have a model and sample record, let’s see what happens when we pickle it using Django’s default hooks:
    print dumps(meep)
    # cdjango.db.models.base\nmodel_unpickle\np0\n(csrc.testing.models\nUser\np1\n(lp2\ncdjango.db.models.base\nsimple_class_factory\np3\ntp4\nRp5\n(dp6\nS'email'\np7\nS'bdole@whitehouse.gov'\np8\nsS'_state'\np9\nccopy_reg\n_reconstructor\np10\n(cdjango.db.models.base\nModelState\np11\nc__builtin__\nobject\np12\nNtp13\nRp14\n(dp15\nS'adding'\np16\nI00\nsS'db'\np17\nS'default'\np18\nsbsS'id'\np19\nI1\nsS'name'\np20\nS'Bob Dole'\np21\nsb.
That’s 389 bytes that actually includes a complete copy of the record’s data! Surely the default hooks provided by Python for pickling objects would do a better job, so let’s try that next:
    # We replace Django's reduce method with the default.
    User.__reduce__ = object.__reduce__

    print dumps(meep)
    # ccopy_reg\n_reconstructor\np0\n(csrc.testing.models\nUser\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'email'\np6\nS'bdole@whitehouse.gov'\np7\nsS'_state'\np8\ng0\n(cdjango.db.models.base\nModelState\np9\ng2\nNtp10\nRp11\n(dp12\nS'adding'\np13\nI00\nsS'db'\np14\nS'default'\np15\nsbsS'id'\np16\nI1\nsS'name'\np17\nS'Bob Dole'\np18\nsb.
That is a little better at only 300 bytes, but let me explain what’s really going on here. Notice at the beginning of the (very long) dumps output what looks like a package path. This path references the function used to reconstitute the object when it is de-pickled. Sensibly, since models are fairly fancy objects, Django uses one of its own functions to do this. What do these functions actually do?
Django’s de-pickle function does things like check for lazily evaluated values (values originally excluded from the object) and a bunch of other important things related to passing around copies of real data and re-integrating it into an instance of your model. Why aren’t Django signals sent when de-pickling? Because when you de-pickle something only the class’ __new__ is called, not __init__, which is good.
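To see both points for yourself without Django in the way, here is a minimal sketch using a plain class. Greeter is a made-up stand-in, not a model, and the sketch is written for Python 3 (the snippets in this post are Python 2). It lists the importable names a protocol 0 pickle refers to, then confirms that loading the pickle does not run __init__ again.

```python
import pickle
import pickletools


class Greeter(object):
    """Plain stand-in class (hypothetical, not a Django model)."""

    init_calls = 0  # counts how many times __init__ has run

    def __init__(self, name):
        Greeter.init_calls += 1
        self.name = name


original = Greeter("Bob Dole")  # __init__ runs once here
payload = pickle.dumps(original, protocol=0)

# Every GLOBAL opcode names something the unpickler has to import:
# the reconstruction function, the class itself, and its base type.
referenced = [
    arg
    for opcode, arg, _position in pickletools.genops(payload)
    if opcode.name == "GLOBAL"
]

clone = pickle.loads(payload)  # __init__ does NOT run here

print(referenced)
print(Greeter.init_calls)  # 1
```

The first entry in referenced is the reconstruction function, the same role model_unpickle plays in the Django output above.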
Python’s default pickling mechanism basically just copies the instance’s __dict__ and a reference to the class that spawned the instance. Simple, but effective. Both of these are the Wrong Solution™ when dealing with data loaded from a database. The biggest reason why this is bad is simple and illustrated by the following scenario:
- Bob Dole gets elected president and signs up to your service with the e-mail address president@whitehouse.gov.
- At some point poor Mr. Dole isn’t president any more.
- Your application decides to send him an e-mail. It queues up your spammy_spam function for background execution, passing along Bob’s User instance.
- Bob Dole changes his e-mail address on your service to bdole1969@hotmail.com.
- The spammy_spam function is eventually executed, Bob’s User object de-pickled, and you send some delicious food-like products by e-mail… to the wrong person.
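The scenario can be reproduced in a few lines without Django or Celery. This is only a sketch under loud assumptions: DATABASE is a dictionary pretending to be a table, User is a plain class rather than a model, and spammy_spam is called directly instead of being queued. It is written for Python 3.

```python
import pickle

# Hypothetical in-memory "database": primary key -> row.
DATABASE = {1: {"name": "Bob Dole", "email": "president@whitehouse.gov"}}


class User(object):
    """Plain stand-in for a model; default pickling copies its __dict__."""

    def __init__(self, pk):
        self.pk = pk
        self.__dict__.update(DATABASE[pk])


def spammy_spam(pickled_user):
    """Pretend background task: returns the address it would mail."""
    user = pickle.loads(pickled_user)
    return user.email


# 1. The task is queued with a pickled copy of Bob's record.
queued = pickle.dumps(User(1))

# 2. Bob changes his address before the task runs.
DATABASE[1]["email"] = "bdole1969@hotmail.com"

# 3. The task finally runs and uses the address frozen in the pickle.
sent_to = spammy_spam(queued)
print(sent_to)  # president@whitehouse.gov (stale)
```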
That’s a problem! It gets worse when you realize that it’s a problem for any not-lazily-loaded database column stored this way. A scheduled task to remind someone of a past-due balance? Pickle their record and they’ll get e-mailed even if they paid before the task was scheduled to run. So how do we fix this and make de-pickling actually load the record out of the database for us, all fresh and accurate?
The first half of the problem was solved above by replacing the __reduce__ method on our model with Python’s default one. If we don’t do this then the following addition to our model will never be executed:
    class User(models.Model):
        name = models.CharField(max_length=200)
        email = models.EmailField()

        __reduce__ = object.__reduce__  # from above

        def __getstate__(self):
            # Only the primary key is pickled.
            return self.pk

        def __setstate__(self, pk):
            # Reload the record and adopt its attributes.
            self.__dict__ = self.__class__.objects.get(pk=pk).__dict__
This might look quite hairy, but works extremely well. What happens now is:
- When pickling an instance of the model the __getstate__ method is called and the returned value, whatever it is, is pickled.
- When de-pickling, a new instance is created (without calling __init__) and __setstate__ is called with the de-pickled value returned above as its only argument.
In this case we save the primary key, then attempt to get the object by its primary key. But since __setstate__ doesn’t return a new instance, we have to hot-swap the current instance for the one just loaded from the database. This is technically a mutation of the borg (monostate) pattern, the pattern most Pythonistas use instead of singleton. What does the result of pickling look like now?
    print dumps(meep)
    # ccopy_reg\n_reconstructor\np0\n(csrc.testing.models\nUser\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\nI1\nb.
Much better! 94 bytes, or roughly 24% of the original size. Now the only things being stored are the function used to de-pickle, a full reference to the class we’ve pickled, and the ID of 1.
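If you want to poke at the same trick without a Django project handy, here is a rough Python 3 sketch. Everything in it is a stand-in: DATABASE is a dictionary playing the part of a table, and the load classmethod is a hypothetical substitute for objects.get(pk=pk). A plain class has no custom __reduce__ to get in the way, so only the two state hooks are needed.

```python
import pickle

# Hypothetical in-memory "database": primary key -> row.
DATABASE = {1: {"name": "Bob Dole", "email": "president@whitehouse.gov"}}


class User(object):
    """Plain stand-in for a model that pickles as its primary key only."""

    def __init__(self, pk):
        self.pk = pk
        self.__dict__.update(DATABASE[pk])

    @classmethod
    def load(cls, pk):
        """Hypothetical lookup; a Django model would use cls.objects.get(pk=pk)."""
        return cls(pk)

    def __getstate__(self):
        # Only the primary key goes into the pickle.
        return self.pk

    def __setstate__(self, pk):
        # Adopt the attributes of a freshly loaded record.
        self.__dict__ = self.__class__.load(pk).__dict__


queued = pickle.dumps(User(1))

# The record changes while the pickle sits in a queue.
DATABASE[1]["email"] = "bdole1969@hotmail.com"

fresh = pickle.loads(queued)
print(fresh.email)  # bdole1969@hotmail.com
```

The pickle never contained the old address, so there is nothing stale to deliver.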
At matchFWD we use a parent class common to all of our models to ensure everything is loaded from the database when de-pickled, but you can use a mix-in class to selectively apply this behaviour if you wish.
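As a rough idea of what the mix-in route could look like, here is a Django-free Python 3 sketch. It is not our production code. The names FreshOnUnpickle, Record, fetch and TABLES are all invented for illustration; with real models the fetch call would be cls.objects.get(pk=pk), and the mix-in would need to come before models.Model in the list of base classes so that its __reduce__ wins.

```python
import pickle

# Hypothetical in-memory tables standing in for a real database.
TABLES = {
    "Invoice": {7: {"balance": 120}},
    "AuditEntry": {7: {"balance": 120}},
}


class Record(object):
    """Plain stand-in for a model base class."""

    def __init__(self, pk):
        self.pk = pk
        self.__dict__.update(TABLES[type(self).__name__][pk])

    @classmethod
    def fetch(cls, pk):
        """Hypothetical lookup; with Django this would be cls.objects.get(pk=pk)."""
        return cls(pk)


class FreshOnUnpickle(object):
    """Hypothetical mix-in: pickle only the key, reload when de-pickled."""

    __reduce__ = object.__reduce__  # sidestep any custom reduce further up

    def __getstate__(self):
        return self.pk

    def __setstate__(self, pk):
        self.__dict__ = self.__class__.fetch(pk).__dict__


class Invoice(FreshOnUnpickle, Record):
    """Opts in: reloaded from the tables when de-pickled."""


class AuditEntry(Record):
    """Opts out: default pickling keeps a frozen snapshot."""


invoice_blob = pickle.dumps(Invoice(7))
audit_blob = pickle.dumps(AuditEntry(7))

# The customer pays before either pickle is loaded.
TABLES["Invoice"][7]["balance"] = 0
TABLES["AuditEntry"][7]["balance"] = 0

invoice = pickle.loads(invoice_blob)
audit = pickle.loads(audit_blob)
print(invoice.balance, audit.balance)  # 0 120
```

Only the class that opted in sees the payment; the other still carries the balance from when it was pickled.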
Updated to add: In response to comments from several social sharing sites, here’s the second definition of promiscuous provided by Mac’s Dictionary app to help clarify the title:
pro·mis·cu·ous |prəˈmiskyo͞oəs|
2a. demonstrating or implying an undiscriminating or unselective approach; indiscriminate or casual: the city fathers were promiscuous with their honours
2b. consisting of a wide range of different things: Americans are free to pick and choose from a promiscuous array of values and behaviour.