Are redundant calls to ndb.Model.put_async() sent to the datastore only once?

I have an NDB model that provides several instance methods to manage its state. In some request handlers, I need to call several of these instance methods. To avoid calling put() more than once on the same object, the pattern I've used so far is similar to this one:

from google.appengine.ext import ndb

class Foo(ndb.Model):
    prop_a = ndb.StringProperty()
    prop_b = ndb.StringProperty()
    prop_c = ndb.StringProperty()

    def some_method_1(self):
        self.prop_a = "The result of some computation"
        return True

    def some_method_2(self):
        if some_condition:
            self.prop_b = "Some new value"
            return True
        return False

    def some_method_3(self):
        if some_condition:
            self.prop_b = "Some new value"
            return True
        if some_other_condition:
            self.prop_b = "Some new value"
            self.prop_c = "Some new value"
            return True
        return False

def manipulate_foo(f):
    updated = False
    updated = f.some_method_1() or updated
    updated = f.some_method_2() or updated
    updated = f.some_method_3() or updated
    if updated:
        f.put()

      

Basically, every method that can potentially update the object returns a bool to indicate whether the object has been updated and therefore needs to be saved. When calling these methods in sequence, I make sure to call put() if any of them returned True.

However, this pattern becomes difficult to implement when subroutines are involved. In that case, I need to bubble up the updated boolean returned by the subroutines to the top-level methods.

I am currently optimizing many request handlers, trying to limit as much as possible the waterfalls reported by Appstats by using as many async APIs as I can and converting many methods to tasklets.

This effort led me to read the NDB Async documentation, which mentions that NDB implements an autobatcher that combines multiple requests into a single RPC call to the datastore. I understand this applies to requests involving different keys, but does it also apply to redundant calls on the same entity?

In other words, my question is: can this code pattern be replaced by this one?

class FooAsync(ndb.Model):
    prop_a = ndb.StringProperty()
    prop_b = ndb.StringProperty()
    prop_c = ndb.StringProperty()

    @ndb.tasklet
    def some_method_1(self):
        self.prop_a = "The result of some computation"
        yield self.put_async()

    @ndb.tasklet
    def some_method_2(self):
        if some_condition:
            self.prop_b = "Some new value"
            yield self.put_async()

    @ndb.tasklet
    def some_method_3(self):
        if some_condition:
            self.prop_b = "Some new value"
            yield self.put_async()
        elif some_other_condition:
            self.prop_b = "Some new value"
            self.prop_c = "Some new value"
            yield self.put_async()

@ndb.tasklet
def manipulate_foo(f):
    yield f.some_method_1()
    yield f.some_method_2()
    yield f.some_method_3()

      

Will all the put_async() calls be merged into a single put call on the entity? If so, are there any caveats to using this approach rather than manually checking the updated return value and calling put once at the end of the call sequence?

+3




2 answers


Well, I bit the bullet and tested these three scenarios in a GAE test app with Appstats enabled to see which RPC calls were being made:

from datetime import datetime

import webapp2
from google.appengine.ext import ndb

class Foo(ndb.Model):
    prop_a = ndb.DateTimeProperty()
    prop_b = ndb.StringProperty()
    prop_c = ndb.IntegerProperty()

class ThreePutsHandler(webapp2.RequestHandler):
    def post(self):
        foo = Foo.get_or_insert('singleton')
        foo.prop_a = datetime.utcnow()
        foo.put()
        foo.prop_b = str(foo.prop_a)
        foo.put()
        foo.prop_c = foo.prop_a.microsecond
        foo.put()

class ThreePutsAsyncHandler(webapp2.RequestHandler):
    @ndb.toplevel
    def post(self):
        foo = Foo.get_or_insert('singleton')
        foo.prop_a = datetime.utcnow()
        foo.put_async()
        foo.prop_b = str(foo.prop_a)
        foo.put_async()
        foo.prop_c = foo.prop_a.microsecond
        foo.put_async()

class ThreePutsTaskletHandler(webapp2.RequestHandler):
    @ndb.tasklet
    def update_a(self, foo):
        foo.prop_a = datetime.utcnow()
        yield foo.put_async()

    @ndb.tasklet
    def update_b(self, foo):
        foo.prop_b = str(foo.prop_a)
        yield foo.put_async()

    @ndb.tasklet
    def update_c(self, foo):
        foo.prop_c = foo.prop_a.microsecond
        yield foo.put_async()

    @ndb.toplevel
    def post(self):
        foo = Foo.get_or_insert('singleton')
        self.update_a(foo)
        self.update_b(foo)
        self.update_c(foo)

app = webapp2.WSGIApplication([
    ('/ndb-batching/3-puts', ThreePutsHandler),
    ('/ndb-batching/3-puts-async', ThreePutsAsyncHandler),
    ('/ndb-batching/3-puts-tasklet', ThreePutsTaskletHandler),
], debug=True)

      

The first one, ThreePutsHandler, obviously triggers three Put calls.

[Image: ThreePutsHandler Appstats trace]

However, the other two tests, which call put_async(), end up with a single Put call:



[Image: ThreePutsAsyncHandler Appstats trace]

[Image: ThreePutsTaskletHandler Appstats trace]

So the answer to my question is yes: redundant calls to ndb.Model.put_async() are batched by NDB's autobatcher and end up as a single datastore_v3.Put call. It doesn't matter whether these put_async() calls are made inside a tasklet or not.

A note on the number of datastore write ops seen in the test results: as Shay pointed out in the comments, there are 4 writes for every indexed property value that is changed, plus 1 write for the entity. Thus, in the first test (3 consecutive Put calls) we observe (4 + 1) * 3 = 15 write ops. In the other two tests (async) we observe (4 * 3) + 1 = 13 write ops.
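The arithmetic above can be checked with a few lines of Python. This is only a sketch of the counting rule quoted from the comments (4 writes per changed indexed property value, 1 write per entity put); the function name `write_ops` and the constants are made up for illustration and are not part of any App Engine API.

```python
# Counting rule as stated above (an assumption taken from the comments,
# not an official billing formula).
WRITES_PER_INDEXED_CHANGE = 4
WRITES_PER_ENTITY_PUT = 1

def write_ops(changed_props_per_put):
    """Total write ops for a series of puts.

    Each item is the number of indexed property values changed in that put.
    """
    return sum(WRITES_PER_INDEXED_CHANGE * n + WRITES_PER_ENTITY_PUT
               for n in changed_props_per_put)

sequential = write_ops([1, 1, 1])  # three puts, one changed property each
batched = write_ops([3])           # one put carrying all three changes
print("sequential=%d batched=%d" % (sequential, batched))
```

Under that rule, the only saving from batching is the entity write itself: two fewer entity writes in this test.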

So the bottom line is that letting NDB batch multiple put_async calls on the same entity saves a lot of latency, because only one call is made to the datastore, and saves a few write ops, because the entity is written only once.

+5




Try marking the object itself as changed and check that flag before returning the response, like the _p_changed attribute in Zope. Another alternative would be to use a request/thread-local registry of modified objects, which must be written before returning. For an example of a thread-local in GAE, check google/appengine/runtime/request_environment.py
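To make both ideas concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: `DirtyTracking` is a stand-in class rather than an ndb.Model, its `put()` only counts calls, and `flush_dirty()` is a made-up helper. I have not tested how overriding `__setattr__` interacts with NDB properties, so treat it as an illustration of the pattern only.

```python
import threading

# Hypothetical per-thread registry of modified objects (not a GAE API).
_registry = threading.local()

def _dirty_objects():
    if not hasattr(_registry, "objects"):
        _registry.objects = []
    return _registry.objects

class DirtyTracking(object):
    """Stand-in for a model; put() only counts how often it is called."""

    def __init__(self):
        # Bypass __setattr__ so construction does not mark the object dirty.
        object.__setattr__(self, "_dirty", False)
        object.__setattr__(self, "put_calls", 0)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if not self._dirty:
            # First change since the last write: flag and register once.
            object.__setattr__(self, "_dirty", True)
            _dirty_objects().append(self)

    def put(self):
        object.__setattr__(self, "put_calls", self.put_calls + 1)
        object.__setattr__(self, "_dirty", False)

def flush_dirty():
    """Write each modified object once; call before returning the response."""
    objects = _dirty_objects()
    flushed = len(objects)
    for obj in objects:
        obj.put()
    del objects[:]
    return flushed

foo = DirtyTracking()
foo.prop_a = "first change"
foo.prop_b = "second change"
flushed = flush_dirty()  # foo is written once despite two changes
```

With this in place, methods can change properties freely without returning a boolean, and the handler calls `flush_dirty()` once at the end.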



+1








