How do I avoid import-time database access in Django?

My Django app has several categories for things, which I store in a Category model. I refer to them a lot in code, so I found it useful to have a module with references ("constants") to these categories and groups of them, so that typos fail fast. This also provides the advantage of caching. And finally, these are the actual model instances, so all the related functionality is available. It looks something like this:

def load_category(name):
  return Category.objects.get(name=name)

DOGS = load_category("dogs")
CATS = load_category("cats")

      

However, this causes import-time database access and leads to various problems. After adding a new category with a reference like this, I have to run a data migration before ./manage.py will work. I just ran into a new problem when switching to the Django test framework: these loads hit the default database (e.g. dev or prod) rather than the test one, as stated in this warning:

If your code attempts to access the database when its modules are compiled, this will occur before the test database is set up, with potentially unexpected results. For example, if you have a database query in module-level code and a real database exists, production data could pollute your tests. It is a bad idea to have such import-time database queries in your code anyway - rewrite your code so that it doesn't do this.

What's the best pattern for getting the benefits of these references while avoiding import-time database access?

One possible solution is a proxy pattern, which returns a pseudo-category that redirects all the functionality of the model, but does not access the database until it is needed. I would like to see how others have solved this problem with this approach or another solution.

(Related but different question: Django test. Looking for data from your production database while running tests? )

The final approach

@kevin-christopher-henry's approach worked well for me. However, in addition to fixing those declared references, I also had to defer access to the references from other code. I found two approaches helpful here.

I first discovered Python Lazy Object Proxy. This simple object takes a factory function as input, which is lazily executed to create the wrapped object.

MAP_OF_THINGS = Proxy(lambda: {
        DOG: ...,
        CAT: ...
})

      

A similar way of doing the same thing was to push the code into factory functions, decorated with memoize so that they only execute once.
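As a rough sketch of the memoized-factory variant, the example below uses functools.lru_cache from the standard library in place of a memoize decorator. The names map_of_things and lookups, and this stub load_category, are made up for illustration:

```python
from functools import lru_cache

lookups = []


def load_category(name):
    # Hypothetical stand-in for Category.objects.get(name=name).
    lookups.append(name)
    return {"name": name}


@lru_cache(maxsize=None)
def map_of_things():
    # The body runs on the first call only; later calls return the cached dict.
    return {
        "dog": load_category("dogs"),
        "cat": load_category("cats"),
    }
```

Callers then write map_of_things()["dog"] instead of MAP_OF_THINGS["dog"], and nothing is loaded until the first call.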

NOTE: I first tried using the Proxy object above as a direct solution to my problem of lazy access to model objects. However, despite it being a very good imitation, when querying and filtering on these objects, I got:

TypeError: 'Category' object is not callable

      

Sure enough, callable() returns True for a Proxy (even though the docs say this does not guarantee that the object is actually callable). It looks like Django queries are too smart and run into something incompatible with the fake model.

Proxy might be good enough for your application.



3 answers


I ran into the same problem myself and agree that it would be great to have some best practices here.

I ended up with an approach based on the descriptor protocol:

class LazyInstance:
    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs
        self.instance = None

    def __get__(self, obj, cls):
        if self.instance is None:
            self.instance, _ = cls.objects.get_or_create(*self.args, **self.kwargs)

        return self.instance

      



Then, in my model classes, I have special objects:

class Category(models.Model):
    name = models.CharField()

    DOGS = LazyInstance(name="dogs")
    CATS = LazyInstance(name="cats")

      

Thus, nothing happens during import. The first time one of these special objects is accessed, the corresponding instance is looked up (and created, if necessary) and cached.
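To see the laziness and caching without a database, here is a small self-contained sketch. FakeManager is a hypothetical in-memory stand-in for a Django manager, and Category here is a plain class rather than a model, so this only demonstrates the descriptor mechanics:

```python
class LazyInstance:
    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.instance = None

    def __get__(self, obj, cls):
        # Look the instance up on first access, then reuse it.
        if self.instance is None:
            self.instance, _ = cls.objects.get_or_create(**self.kwargs)
        return self.instance


class FakeManager:
    """Hypothetical in-memory stand-in for a Django model manager."""

    def __init__(self, model):
        self.model = model
        self.rows = {}
        self.queries = 0  # counts how often the "database" is hit

    def get_or_create(self, **kwargs):
        self.queries += 1
        key = tuple(sorted(kwargs.items()))
        created = key not in self.rows
        if created:
            self.rows[key] = self.model(**kwargs)
        return self.rows[key], created


class Category:
    DOGS = LazyInstance(name="dogs")

    def __init__(self, name):
        self.name = name


Category.objects = FakeManager(Category)
```

Defining the class performs no lookup. The first read of Category.DOGS calls get_or_create once, and later reads return the same cached object. The cached instance lives as long as the process, so it can go stale if the underlying row is changed or deleted.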



There is not much you can do with plain module-level variables, as you cannot override how they are accessed. However, you can do this for class and instance variables via __getattribute__. You can use this to load your categories lazily:

class Categories(object):
    _categories = {'DOGS': 'dogs', 'CATS': 'cats'}
    def __getattribute__(self, key):
        try:
            return super(Categories, self).__getattribute__(key)
        except AttributeError:
            pass
        try:
            value = load_category(self._categories[key])
        except KeyError:
            raise AttributeError(key)
        setattr(self, key, value)
        return value

Categories = Categories()  # Shadow class with singleton instance

      



Instead of module.DOGS, you would use module.Categories.DOGS. On first access, the category is loaded and stored for future lookups.



I have used lazy_object_proxy (which takes a function to call, but no arguments to pass to it) together with functools.partial, as shown below:

import lazy_object_proxy
from functools import partial

def load_category(name):
  # prepare an argument-less runnable function
  # (name must be a keyword argument: get() treats positional args as Q objects)
  loader = partial(Category.objects.get, name=name)

  # pass the function to the proxy
  return lazy_object_proxy.Proxy(loader)

DOGS = load_category("dogs")
CATS = load_category("cats")

      
