Why / how does iterating over a list and calling "pass" each time fix this function?

I wrote the following function:

def auto_update_ratings(amounts, assessment_entries_qs, lowest_rating=-1):
    start = 0
    rating = lowest_rating
    ids = assessment_entries_qs.values_list('id', flat=True)

    for i in ids: # I have absolutely no idea why this seems to be required:
        pass      # without this loop, the last AssessmentEntries fail to update 
                  # in the following for loop.

    for amount in amounts:
        end_mark = start + amount
        entries = ids[start:end_mark]
        a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
        start = end_mark
        rating += 1

      

It does what it is supposed to do (i.e. update the corresponding number of entries in assessment_entries_qs with each rating (starting from lowest_rating) as specified in amounts). Here's a simple example:

>>> assessment_entries = AssessmentEntry.objects.all()
>>> print [ae.rating for ae in assessment_entries]
[None, None, None, None, None, None, None, None, None, None]
>>>
>>> auto_update_ratings((2,4,3,1), assessment_entries, 1)
>>> print [ae.rating for ae in assessment_entries]
[1, 1, 2, 2, 2, 2, 3, 3, 3, 4]

      

However, if I don't iterate through ids before iterating through amounts, the function only updates a subset of the queryset: with my current test data (approximately 250 AssessmentEntries in the queryset), this always results in exactly 84 AssessmentEntries not being updated.

Interestingly, it is always the last iteration of the second for loop that fails to produce any updates (although the rest of the code in that iteration runs correctly), along with part of the previous iteration. The querysets are ordered with order_by('?') before being passed to this function, and the expected results are achieved if I simply add the preceding "empty" for loop, so it does not appear to be an issue with my data.

A few more details, just in case they turn out to be relevant:

  • AssessmentEntry.rating is a standard IntegerField(null=True, blank=True).
  • I only use this function for testing, so I have only run it from IPython.
  • The test database is SQLite.

Question: Can someone explain why I seem to need to iterate through ids despite not touching the data at all, and why without that loop the function still (sort of) executes correctly, but always fails to update the last few items in the queryset despite apparently still iterating over them?

1 answer


QuerySets, including the one returned by values_list(), are evaluated lazily. Iterating over ids executes the query and makes ids behave like a static list instead of a QuerySet. So when you loop through ids first, entries later becomes a fixed set of values; but if you don't loop through ids, then entries is just a subquery with a LIMIT clause added to represent the slicing you are doing.
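To make the difference concrete, here is a minimal toy model of that behavior. LazyQuery and run are hypothetical names invented for this sketch; this is not Django's actual class, only an illustration of "slice before evaluation stays lazy, slice after evaluation is a plain list":

```python
class LazyQuery:
    """Toy model of a lazily evaluated query (hypothetical, not Django's class).

    Only supports slicing an unsliced query once, which is all the sketch needs.
    """

    def __init__(self, run, start=0, stop=None):
        self._run = run            # callable standing in for "hit the database"
        self._start, self._stop = start, stop
        self._cache = None         # filled the first time the query is iterated

    def __iter__(self):
        if self._cache is None:
            self._cache = self._run()[self._start:self._stop]
        return iter(self._cache)

    def __getitem__(self, s):
        if self._cache is not None:
            return self._cache[s]  # already evaluated: ordinary list slice
        # Not evaluated yet: return another lazy query carrying the limits.
        return LazyQuery(self._run, s.start, s.stop)


calls = []

def run():
    calls.append(1)                # count how often the "database" is hit
    return [5, 1, 2, 3, 4]

ids = LazyQuery(run)
lazy_slice = ids[0:2]              # still a query; nothing has been executed
for _ in ids:                      # the do-nothing loop evaluates ids
    pass
fixed_slice = ids[0:2]             # now a plain list of values
print(type(lazy_slice).__name__, fixed_slice, len(calls))
```

The slice taken before the loop is still a query that will run again when it is used; the slice taken after the loop is just a list of values.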

Here's what's going on in detail:

def auto_update_ratings(amounts, assessment_entries_qs, lowest_rating=-1):
    # assessment_entries_qs is an unevaluated QuerySet
    # from your calling code, it would probably generate a query like this:
    # SELECT * FROM assessments ORDER BY RANDOM()
    start = 0
    rating = lowest_rating
    ids = assessment_entries_qs.values_list('id', flat=True)
    # ids is a lazy QuerySet (from values_list) that adds "SELECT id"
    # to the query that assessment_entries_qs would generate.
    # So ids is now something like:
    # SELECT id FROM assessments ORDER BY RANDOM()

    # we omit the loop

    for amount in amounts:
        end_mark = start + amount
        entries = ids[start:end_mark]
        # entries is now another QuerySet with a LIMIT clause added:
        # SELECT id FROM assessments ORDER BY RANDOM() LIMIT amount OFFSET start
        # When filter() gets a QuerySet, it adds a subquery
        a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
        # FINALLY, we now actually EXECUTE a query which is something like this:
        # UPDATE assessments SET rating=? WHERE id IN 
        # (SELECT id FROM assessments ORDER BY RANDOM() LIMIT amount OFFSET start)
        start = end_mark
        rating += 1

      

Since the subquery in entries is re-executed every time you run an update, and it has a random ordering, the slicing you do is meaningless! This function has no deterministic behavior.

However, when you iterate over ids, you actually execute the query, so your slicing has deterministic behavior again and the code does what you expect.
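The effect can be reproduced without a database. The following is only a rough simulation under stated assumptions: assign_ratings, reshuffled_each_time and fixed_snapshot are hypothetical helpers, a dict stands in for the table, and a fresh shuffle per call stands in for ORDER BY RANDOM() being re-run inside each UPDATE's subquery.

```python
import random


def assign_ratings(fetch_ids, amounts, lowest_rating=1):
    """Run the slicing loop; fetch_ids() is called once per slice, like a subquery."""
    ratings = {}
    start, rating = 0, lowest_rating
    for amount in amounts:
        end_mark = start + amount
        for entry_id in fetch_ids()[start:end_mark]:
            ratings[entry_id] = rating  # stands in for UPDATE ... WHERE id IN (...)
        start = end_mark
        rating += 1
    return ratings


all_ids = list(range(1, 11))
amounts = (2, 4, 3, 1)
rng = random.Random(0)


def reshuffled_each_time():
    # Models the random ordering being recomputed for every update.
    return rng.sample(all_ids, len(all_ids))


snapshot = rng.sample(all_ids, len(all_ids))  # evaluated once, like an evaluated ids


def fixed_snapshot():
    return snapshot


lazy_result = assign_ratings(reshuffled_each_time, amounts)
eager_result = assign_ratings(fixed_snapshot, amounts)

print(len(lazy_result), "of", len(all_ids), "ids updated with re-shuffling")
print(len(eager_result), "of", len(all_ids), "ids updated with a snapshot")
```

With the snapshot, every id is updated exactly once. With re-shuffling, each slice is cut from a different ordering, so some ids can be hit more than once while others are never touched.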



Let's see what happens when you use a loop:

ids = assessment_entries_qs.values_list('id', flat=True)

# Iterating ids causes the query to actually be executed
# This query was sent to the DB:
# SELECT id FROM assessments ORDER BY RANDOM()
for id in ids:
    pass

# ids has now been "realized" and contains the *results* of the query
# e.g., [5,1,2,3,4]
# Iterating again (or slicing) will now return values rather than modify the query

for amount in amounts:
    end_mark = start + amount
    entries = ids[start:end_mark]
    # because ids was executed, entries contains definite values
    # When filter() gets actual values, it adds a simple condition
    a = assessment_entries_qs.filter(id__in=entries).update(rating=rating)
    # The query executed is something like this:
    # UPDATE assessments SET rating=? WHERE id IN (5,1)
    # "(5,1)" will change on each iteration, but it will always be a set of
    # scalar values rather than a subquery.
    start = end_mark
    rating += 1

      

If you ever need to eagerly evaluate a QuerySet to get all of its values at a point in time, instead of doing a do-nothing iteration, just convert it to a list:

    ids = list(assessment_entries_qs.values_list('id', flat=True))
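Put together, the function from the question might look like the sketch below once the do-nothing loop is replaced by list(). FakeEntries is a hypothetical in-memory stand-in that I added only so the example runs without a database; with real code you would pass your QuerySet instead.

```python
def auto_update_ratings(amounts, assessment_entries_qs, lowest_rating=-1):
    # Evaluate the id query exactly once; ids is now a plain list,
    # so every slice below is a fixed set of values.
    ids = list(assessment_entries_qs.values_list('id', flat=True))
    start = 0
    rating = lowest_rating
    for amount in amounts:
        end_mark = start + amount
        entries = ids[start:end_mark]
        assessment_entries_qs.filter(id__in=entries).update(rating=rating)
        start = end_mark
        rating += 1


class FakeEntries:
    """Hypothetical stand-in for a QuerySet (illustration only, not Django)."""

    def __init__(self, rows, wanted=None):
        self.rows, self.wanted = rows, wanted

    def values_list(self, field, flat=True):
        return (row[field] for row in self.rows)  # lazy, like a queryset

    def filter(self, id__in):
        return FakeEntries(self.rows, set(id__in))

    def update(self, **fields):
        matched = [row for row in self.rows if row["id"] in self.wanted]
        for row in matched:
            row.update(fields)
        return len(matched)


rows = [{"id": i, "rating": None} for i in (5, 1, 2, 3, 4, 9, 8, 7, 6, 10)]
auto_update_ratings((2, 4, 3, 1), FakeEntries(rows), 1)
print([row["rating"] for row in rows])  # [1, 1, 2, 2, 2, 2, 3, 3, 3, 4]
```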

      

In addition, the Django docs detail when a QuerySet is evaluated.
