Why / how does iterating over a list and calling "pass" each time fix this function?
I wrote the following function:
def auto_update_ratings(amounts, assessment_entries_qs, lowest_rating=-1):
    start = 0
    rating = lowest_rating
    ids = assessment_entries_qs.values_list('id', flat=True)
    for i in ids:  # I have absolutely no idea why this seems to be required:
        pass       # without this loop, the last AssessmentEntries fail to update
                   # in the following for loop.
    for amount in amounts:
        end_mark = start + amount
        entries = ids[start:end_mark]
        a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
        start = end_mark
        rating += 1
It does what it is supposed to do, i.e. it updates the corresponding number of entries in assessment_entries_qs with each rating (starting from lowest_rating) as specified in amounts. Here's a simple example:
>>> assessment_entries = AssessmentEntry.objects.all()
>>> print [ae.rating for ae in assessment_entries]
[None, None, None, None, None, None, None, None, None, None]
>>>
>>> auto_update_ratings((2,4,3,1), assessment_entries, 1)
>>> print [ae.rating for ae in assessment_entries]
[1, 1, 2, 2, 2, 2, 3, 3, 3, 4]
However, if I don't iterate through ids before iterating through amounts, the function only updates a subset of the queryset: with my current test data (approximately 250 AssessmentEntries in the queryset), this always results in exactly 84 AssessmentEntries not being updated.
Interestingly, it is always the last iteration of the second for loop that does not result in any updates (although the rest of the code in that iteration executes correctly), along with part of the previous iteration. The querysets are ordered with order_by('?') before being passed to this function, and the intended results are achieved if I simply add the preceding "empty" for loop, so it does not seem to be a problem with my data.
A few more details, just in case they turn out to be relevant:
- AssessmentEntry.rating is a standard IntegerField(null=True, blank=True).
- I am only using this function for testing, so I have only run it from iPython.
- The test database is SQLite.
Question: Can someone explain why I seem to need to iterate through ids, despite not touching the data at all, and why without it the function still (sort of) executes correctly but always fails to update the last few items in the queryset, even though it apparently still iterates through them?
QuerySets, including the one returned by values_list(), are evaluated lazily. Iterating over ids executes the query and makes ids behave like a static list instead of a QuerySet. So when you loop through ids first, entries later becomes a fixed set of values; but if you do not loop through ids, then entries is just a subquery with a LIMIT clause added to represent the slicing you do.
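The difference between an unevaluated and an evaluated sequence can be imitated in plain Python. The sketch below is not Django code: LazyIds is a hypothetical toy class, and it simplifies things by running its "query" immediately on every slice, whereas a real QuerySet slice stays lazy until it is used. It only illustrates why slices taken before evaluation come from different executions, while slices taken after evaluation share one stored result.

```python
import random


class LazyIds(object):
    """Toy stand-in for a lazy QuerySet (hypothetical, not Django's API)."""

    def __init__(self, rows, rng):
        self._rows = list(rows)
        self._rng = rng
        self._cache = None  # filled on first iteration, like a result cache
        self.runs = 0       # how many times the "query" has been executed

    def _run_query(self):
        # Imitates SELECT id ... ORDER BY RANDOM(): a fresh shuffle per run.
        self.runs += 1
        shuffled = list(self._rows)
        self._rng.shuffle(shuffled)
        return shuffled

    def __iter__(self):
        # Iterating evaluates the query once and stores the results.
        if self._cache is None:
            self._cache = self._run_query()
        return iter(self._cache)

    def __getitem__(self, item):
        if self._cache is not None:
            return self._cache[item]    # evaluated: slice the stored results
        return self._run_query()[item]  # unevaluated: each slice re-runs the query


rng = random.Random(0)

# Without iterating first, the two slices come from two separate shuffles,
# so they may overlap and may miss some ids.
lazy = LazyIds(range(10), rng)
a, b = lazy[0:5], lazy[5:10]

# After a do-nothing iteration, both slices come from the same stored result.
cached = LazyIds(range(10), rng)
for _ in cached:
    pass
c, d = cached[0:5], cached[5:10]
```

Here lazy.runs ends up at 2 while cached.runs stays at 1, and only c and d are guaranteed to cover every id exactly once.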
Here's what's going on in detail:
def auto_update_ratings(amounts, assessment_entries_qs, lowest_rating=-1):
    # assessment_entries_qs is an unevaluated QuerySet.
    # Judging from your calling code, it would generate a query like this:
    #   SELECT * FROM assessments ORDER BY RANDOM()
    start = 0
    rating = lowest_rating
    ids = assessment_entries_qs.values_list('id', flat=True)
    # ids is another lazy QuerySet that narrows the query
    # assessment_entries_qs would generate down to "SELECT id".
    # So ids is now something like:
    #   SELECT id FROM assessments ORDER BY RANDOM()
    # (the do-nothing loop is omitted here)
    for amount in amounts:
        end_mark = start + amount
        entries = ids[start:end_mark]
        # entries is now another QuerySet with a LIMIT clause added:
        #   SELECT id FROM assessments ORDER BY RANDOM() LIMIT (end_mark - start) OFFSET start
        # When filter() gets a QuerySet, it adds a subquery
        a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
        # FINALLY, we now actually EXECUTE a query, which is something like this:
        #   UPDATE assessments SET rating=? WHERE id IN
        #   (SELECT id FROM assessments ORDER BY RANDOM() LIMIT (end_mark - start) OFFSET start)
        start = end_mark
        rating += 1
Since the subquery in entries is executed again every time it is used in an update, and it has a random order, the slicing you do is meaningless! Each slice is taken from a different random ordering, so slices can overlap and some rows are never selected. This function has no deterministic behavior.
However, when you iterate over ids, you actually execute the query, so your slicing has deterministic behavior again and the code does what you expect.
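The same effect can be reproduced without Django, using only the standard-library sqlite3 module. This is a rough sketch under assumptions: the table name assessments and the helper functions are made up for illustration, and the SQL is an approximation of what the ORM would send rather than a capture of it. One helper re-runs a randomly ordered subquery in every UPDATE, the other fetches the ids once and slices a plain list.

```python
import sqlite3


def make_db(n=10):
    # In-memory table standing in for the AssessmentEntry table (assumed schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE assessments (id INTEGER PRIMARY KEY, rating INTEGER)")
    conn.executemany("INSERT INTO assessments (id) VALUES (?)",
                     [(i,) for i in range(1, n + 1)])
    return conn


def update_with_subquery(conn, amounts, lowest_rating):
    # Each UPDATE re-evaluates the random ordering, so the LIMIT/OFFSET
    # windows are cut from different permutations of the table.
    start, rating = 0, lowest_rating
    for amount in amounts:
        conn.execute(
            "UPDATE assessments SET rating = ? WHERE id IN "
            "(SELECT id FROM assessments ORDER BY RANDOM() LIMIT ? OFFSET ?)",
            (rating, amount, start))
        start += amount
        rating += 1


def update_with_fixed_ids(conn, amounts, lowest_rating):
    # The random ordering is fetched exactly once; slices are plain lists.
    ids = [row[0] for row in
           conn.execute("SELECT id FROM assessments ORDER BY RANDOM()")]
    start, rating = 0, lowest_rating
    for amount in amounts:
        chunk = ids[start:start + amount]
        placeholders = ",".join("?" * len(chunk))
        conn.execute(
            "UPDATE assessments SET rating = ? WHERE id IN (%s)" % placeholders,
            [rating] + chunk)
        start += amount
        rating += 1


def rating_counts(conn):
    # Maps each rating (None for unrated rows) to the number of rows holding it.
    return dict(conn.execute(
        "SELECT rating, COUNT(*) FROM assessments GROUP BY rating"))


fixed = make_db()
update_with_fixed_ids(fixed, (2, 4, 3, 1), 1)
fixed_counts = rating_counts(fixed)

lazy = make_db()
update_with_subquery(lazy, (2, 4, 3, 1), 1)
lazy_counts = rating_counts(lazy)
```

With the fixed id list, fixed_counts always has 2, 4, 3 and 1 rows for ratings 1 to 4. The contents of lazy_counts vary from run to run, and some rows may keep a rating of None or be overwritten by a later update.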
Let's see what happens when you use a loop:
ids = assessment_entries_qs.values_list('id', flat=True)
# Iterating ids causes the query to actually be executed
# This query was sent to the DB:
#   SELECT id FROM assessments ORDER BY RANDOM()
for id in ids:
    pass
# ids has now been "realized" and contains the *results* of the query
# e.g., [5,1,2,3,4]
# Iterating again (or slicing) will now return values rather than modify the query
for amount in amounts:
    end_mark = start + amount
    entries = ids[start:end_mark]
    # because ids was executed, entries contains definite values
    # When filter() gets actual values, it adds a simple condition
    a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
    # The query executed is something like this:
    #   UPDATE assessments SET rating=? WHERE id IN (5,1)
    # "(5,1)" will change on each iteration, but it will always be a set of
    # scalar values rather than a subquery.
    start = end_mark
    rating += 1
If you ever need to eagerly evaluate a QuerySet to get all of its values at a point in time, instead of doing a do-nothing iteration, just convert it to a list:
ids = list(assessment_entries_qs.values_list('id', flat=True))
In addition, the Django docs explain in detail when a QuerySet is evaluated.
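Putting this together, the function from the question could be rewritten roughly as follows. This is a sketch based on the code shown above, not a tested drop-in: it assumes assessment_entries_qs is a Django QuerySet and only swaps the do-nothing loop for list().

```python
def auto_update_ratings(amounts, assessment_entries_qs, lowest_rating=-1):
    # list() runs the SELECT exactly once, so the random order is frozen here.
    ids = list(assessment_entries_qs.values_list('id', flat=True))
    start = 0
    rating = lowest_rating
    for amount in amounts:
        end_mark = start + amount
        # Slicing a plain list yields concrete ids, so filter() receives
        # scalar values instead of a subquery that is re-run on each update.
        assessment_entries_qs.filter(id__in=ids[start:end_mark]).update(rating=rating)
        start = end_mark
        rating += 1
```

Called as auto_update_ratings((2, 4, 3, 1), assessment_entries, 1), it should assign the ratings in the same way as the version with the extra loop, since every slice now comes from the same stored list of ids.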