Django ORM equivalent to group_concat
We are again trying to create a web tool to help us manage our joinery software . We use Django , which, despite its age, is still the most widely used (and most documented) web programming framework for Python. It's been a few years since I came up with something, and I stumble over a few things.
For example, my data model looks like this (with a lot of irrelevant stuff):
class Person(models.Model):
'''Someone we know.'''
email = models.CharField(max_length=STR_LONG, unique=True, null=True)
class Event(models.Model):
'''A workshop or other event.'''
slug = models.CharField(max_length=STR_LONG, unique=True)
class Role(models.Model):
'''The kinds of things people can do at workshops.'''
name = models.CharField(max_length=STR_MED)
class Task(models.Model):
'''Someone did something at some workshop.'''
event = models.ForeignKey(Event)
person = models.ForeignKey(Person)
role = models.ForeignKey(Role)
One of the application pages displays information about a specific event. I want to add the names of all the people who were instructors at this event to the page. If I were to use SQL directly, I would write something like:
select Event.slug, group_contact(Person.email, ', ')
from Person join Event join Role join Task
on Person.id=Task.person and Event.id=Task.event and Role.id=Task.role
where Role.name='instructor'
group by Event.id;
How can I do this with Django ORM? According to this Stack Overflow question , I can use the 'regroup' tag in the view or create a custom aggregator . The former is complicated by the multi-stage nature of the association, and the latter feels ... difficult. My instinct is that I should be able to attach to this event all the objects Person
corresponding to the instructors in a particular Event
one and then loop through them in my view. If you know how to do this, I would appreciate a pointer.
source to share
How about the following (Django 1.7+):
from django.db.models import Prefetch
prefetch = Prefetch(
'task_set',
queryset=Task.objects.filter(role__name='instructor')
.select_related('person'), to_attr='instructor_tasks'
)
events = Event.objects.all().prefetch_related(prefetch)
for event in events:
print event
for task in event.instructor_tasks:
print task.person.email
Using prefetch_related this way should prevent O (n) DB queries (despite a loop).
Django DB query log output for the aforementioned query / loop (with presumably a tiny data set of 2 events):
DEBUG django.db.backends: (0.001) SELECT "myapp_event"."id", "myapp_event"."slug" FROM "myapp_event"; args=()
DEBUG django.db.backends: (0.001) SELECT "myapp_task"."id", "myapp_task"."event_id", "myapp_task"."person_id", "myapp_task"."role_id", "myapp_person"."id", "myapp_person"."email" FROM "myapp_task" INNER JOIN "myapp_role" ON ( "myapp_task"."role_id" = "myapp_role"."id" ) INNER JOIN "myapp_person" ON ( "myapp_task"."person_id" = "myapp_person"."id" ) WHERE ("myapp_role"."name" = 'instructor' AND "myapp_task"."event_id" IN (1, 2)); args=('instructor', 1, 2)
Thus, this query code should continue to issue only two DB queries even as the number of events grows.
source to share
I don't think you need such a complex query.
For any particular event, you can do this:
instructors = Task.objects.filter(event=my_event, role__name='instructor').select_related('person').values_list('person__email')
which will provide you with a list of emails for all persons related to the tasks for this event, with the role of instructor. Admittedly, this is one event request, not one massive request, but unless you plan on showing hundreds of events per page, the tradeoff in terms of clarity is probably worth it.
source to share
In the upcoming Django 1.8, you can simply implement a GroupConcat expression and then the request looks like this:
Event.objects.values('slug').annotate(emails=GroupConcat('task__person__email'))
Combination of .values ββ(). annotate () sets the GROUP BY to the slug, and of course the GroupConcat implementation does the actual aggregation.
To write an implementation of the GroupConcat implementation, https://docs.djangoproject.com/en/dev/ref/models/expressions/#writing-your-own-query-expressions
source to share
The usual solution here is not to try to optimize and just do one search per event:
for event in Event.objects.filter(**some_lookup):
instructors = Person.objects.filter(
task__role__name="instructor",
task__event=event
).values_list("name", flat=True)
print "Event: %s - instructors : %s" % (event.slug, ", ".join(instructors))
If and when this becomes a performance bottleneck, it's time to find a better solution (which may or may not include falling back to raw sql).
source to share