Are redundant calls to ndb.Model.put_async() sent to the datastore only once?
I have an NDB model that exposes several instance methods to manage its state. In some request handlers, I need to call several of these instance methods. To avoid calling put() more than once on the same object, the pattern I've used so far is similar to this one:
class Foo(ndb.Model):
    prop_a = ndb.StringProperty()
    prop_b = ndb.StringProperty()
    prop_c = ndb.StringProperty()

    def some_method_1(self):
        self.prop_a = "The result of some computation"
        return True

    def some_method_2(self):
        if some_condition:
            self.prop_b = "Some new value"
            return True
        return False

    def some_method_3(self):
        if some_condition:
            self.prop_b = "Some new value"
            return True
        if some_other_condition:
            self.prop_b = "Some new value"
            self.prop_c = "Some new value"
            return True
        return False


def manipulate_foo(f):
    updated = False
    updated = f.some_method_1() or updated
    updated = f.some_method_2() or updated
    updated = f.some_method_3() or updated
    if updated:
        f.put()
Basically, every method that can potentially update an object returns a bool to indicate whether the object has been updated and therefore needs to be saved. When calling these methods in sequence, I make sure to call put() if any of them returned True.
However, this pattern can be difficult to implement when other subroutines are involved. In that case, I need to bubble up the updated boolean returned by the subroutines to the top-level methods.
I am currently optimizing many request handlers, trying to shorten the waterfalls reported by Appstats as much as possible by using as many async APIs as I can and converting many methods to tasklets.
This effort led me to read the NDB Async documentation, which mentions that NDB implements an autobatcher that combines multiple requests into a single RPC call to the datastore. I understand this applies to requests involving different keys, but does it also apply to redundant calls on the same entity?
In other words, my question is: can this code pattern be replaced by this one?
class FooAsync(ndb.Model):
    prop_a = ndb.StringProperty()
    prop_b = ndb.StringProperty()
    prop_c = ndb.StringProperty()

    @ndb.tasklet
    def some_method_1(self):
        self.prop_a = "The result of some computation"
        yield self.put_async()

    @ndb.tasklet
    def some_method_2(self):
        if some_condition:
            self.prop_b = "Some new value"
            yield self.put_async()

    @ndb.tasklet
    def some_method_3(self):
        if some_condition:
            self.prop_b = "Some new value"
            yield self.put_async()
        elif some_other_condition:
            self.prop_b = "Some new value"
            self.prop_c = "Some new value"
            yield self.put_async()


@ndb.tasklet
def manipulate_foo(f):
    yield f.some_method_1()
    yield f.some_method_2()
    yield f.some_method_3()
Will all the put_async() calls be merged into a single Put call per entity? If so, are there any caveats to using this approach rather than manually checking the updated return value and calling put() once at the end of the call sequence?
Well, I bit the bullet and tested these 3 handlers in a GAE test app with Appstats enabled to see which RPC calls were being made:
from datetime import datetime

import webapp2
from google.appengine.ext import ndb


class Foo(ndb.Model):
    prop_a = ndb.DateTimeProperty()
    prop_b = ndb.StringProperty()
    prop_c = ndb.IntegerProperty()


class ThreePutsHandler(webapp2.RequestHandler):
    # Three synchronous puts on the same entity.
    def post(self):
        foo = Foo.get_or_insert('singleton')
        foo.prop_a = datetime.utcnow()
        foo.put()
        foo.prop_b = str(foo.prop_a)
        foo.put()
        foo.prop_c = foo.prop_a.microsecond
        foo.put()


class ThreePutsAsyncHandler(webapp2.RequestHandler):
    # Three put_async() calls; @ndb.toplevel waits for them before returning.
    @ndb.toplevel
    def post(self):
        foo = Foo.get_or_insert('singleton')
        foo.prop_a = datetime.utcnow()
        foo.put_async()
        foo.prop_b = str(foo.prop_a)
        foo.put_async()
        foo.prop_c = foo.prop_a.microsecond
        foo.put_async()


class ThreePutsTaskletHandler(webapp2.RequestHandler):
    # Same three put_async() calls, each made from inside its own tasklet.
    @ndb.tasklet
    def update_a(self, foo):
        foo.prop_a = datetime.utcnow()
        yield foo.put_async()

    @ndb.tasklet
    def update_b(self, foo):
        foo.prop_b = str(foo.prop_a)
        yield foo.put_async()

    @ndb.tasklet
    def update_c(self, foo):
        foo.prop_c = foo.prop_a.microsecond
        yield foo.put_async()

    @ndb.toplevel
    def post(self):
        foo = Foo.get_or_insert('singleton')
        self.update_a(foo)
        self.update_b(foo)
        self.update_c(foo)


app = webapp2.WSGIApplication([
    ('/ndb-batching/3-puts', ThreePutsHandler),
    ('/ndb-batching/3-puts-async', ThreePutsAsyncHandler),
    ('/ndb-batching/3-puts-tasklet', ThreePutsTaskletHandler),
], debug=True)
The first one, ThreePutsHandler, obviously triggers the Put call 3 times.
However, the other two tests, which call put_async(), each end up with a single Put call.
So the answer to my question is yes: redundant calls to ndb.Model.put_async() are batched by NDB's autobatcher and end up as a single datastore_v3.Put call. And it doesn't matter whether the put_async() calls are made inside a tasklet or not.
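To make the observed behaviour easier to picture, here is a toy model in plain Python. It is only a sketch: ToyPutBatcher, its methods and its counters are hypothetical names invented for this illustration, it does not reproduce NDB's internals, and it simply assumes that queued writes for the same key are coalesced before a single flush.

```python
class ToyPutBatcher(object):
    """Toy stand-in for a put batcher. NOT NDB's actual implementation."""

    def __init__(self):
        self._pending = {}            # key -> latest queued entity state
        self.rpc_count = 0            # number of simulated Put RPCs sent
        self.last_rpc_payload = None  # what the last simulated RPC carried

    def put_async(self, key, entity):
        # Only queue the write. A later call for the same key replaces the
        # earlier queued state, so each key is written at most once per flush.
        self._pending[key] = dict(entity)

    def flush(self):
        # Send everything queued so far as one simulated RPC.
        if not self._pending:
            return
        self.rpc_count += 1
        self.last_rpc_payload = self._pending
        self._pending = {}
```

In this model, three put_async() calls on the same key followed by one flush() produce a single simulated RPC that carries the final state of the entity, which matches what Appstats reported for the two async handlers.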
A note on the number of datastore writes seen in the test results: as Shay pointed out in the comments, every indexed property value that changes costs 4 write ops, plus 1 write op for the entity itself. Thus, in the first test (3 consecutive Put calls) we observe (4 + 1) * 3 = 15 write ops. In the other two tests (async) we observe (4 * 3) + 1 = 13 write ops.
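The arithmetic above can be checked with a few lines of Python. This is a sketch of the cost model as described in this answer (4 write ops per changed indexed property value plus 1 per entity write), not an official billing formula; the function and constant names are made up for the example.

```python
# Cost model taken from the answer text (an assumption of this sketch).
OPS_PER_CHANGED_INDEXED_VALUE = 4
OPS_PER_ENTITY_WRITE = 1


def write_ops(changed_values_per_put):
    """Total write ops for a sequence of Put calls.

    changed_values_per_put has one entry per Put call: the number of
    indexed property values changed by that call.
    """
    return sum(
        OPS_PER_CHANGED_INDEXED_VALUE * n + OPS_PER_ENTITY_WRITE
        for n in changed_values_per_put
    )


# Three separate Puts, each changing one indexed property value.
three_sync_puts = write_ops([1, 1, 1])

# One batched Put that carries all three changes at once.
one_batched_put = write_ops([3])
```

Under this model the only saving in write ops comes from writing the entity itself once instead of three times; the per-property cost is the same in both cases.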
So the bottom line is that when NDB receives multiple put_async() calls for the same entity, we save a lot of latency by making a single call to the datastore, and we save a few write ops by writing the entity only once.
Try marking the object itself as changed and checking that flag before returning the response, like the _p_changed attribute in Zope. Another alternative would be to keep a global or thread-local registry of modified objects, which must be written before returning. For an example of thread-locals in GAE, see google/appengine/runtime/request_environment.py.
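A minimal sketch of the registry idea in plain Python follows. Everything in it is hypothetical: DirtyTracking, FakeFoo and flush_dirty are names invented for this illustration, FakeFoo stands in for a model class, and how a __setattr__ override would interact with ndb.Model is not tested here.

```python
import threading

_local = threading.local()


def _registry():
    # One list of modified objects per thread.
    if not hasattr(_local, "dirty"):
        _local.dirty = []
    return _local.dirty


class DirtyTracking(object):
    """Hypothetical mixin: registers the object when a public attribute is set."""

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if not name.startswith("_"):
            registry = _registry()
            # Compare by identity so each object is registered only once.
            if not any(obj is self for obj in registry):
                registry.append(self)


class FakeFoo(DirtyTracking):
    """Stand-in for a model; put() only counts how often it was called."""

    def __init__(self):
        self._put_count = 0

    def put(self):
        self._put_count += 1


def flush_dirty():
    """Write every modified object once and return how many were written."""
    registry = _registry()
    flushed = len(registry)
    for obj in registry:
        obj.put()
    del registry[:]
    return flushed
```

A request handler would call flush_dirty() once, just before returning the response, so that an object modified by several methods is still written a single time.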