Best practice for releasing memory after getting url on appengine (python)

Question

What is the best way to free memory after handling the responses to asynchronous URL fetch requests on App Engine? This is basically what I do in Python:

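from google.appengine.api import urlfetch

rpcs = []

# Start one asynchronous fetch per event; nothing blocks here.
for event in event_list:
    url = 'http://someurl.com'
    rpc = urlfetch.create_rpc()
    # create_callback (not shown) is assumed to return a closure
    # that calls event_callback(rpc).
    rpc.callback = create_callback(rpc)
    urlfetch.make_fetch_call(rpc, url)
    rpcs.append(rpc)

# Wait for every fetch to finish; the callbacks run during wait().
for rpc in rpcs:
    rpc.wait()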

In my test scenario, it does this for 1500 requests. But I need an architecture that can handle many more in a short amount of time.

Then there is a callback function that adds a task to a queue to process the result:

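import json

from google.appengine.api import taskqueue

def event_callback(rpc):
    # Parse the fetched response and hand it to a task for processing.
    result = rpc.get_result()
    data = json.loads(result.content)
    taskqueue.add(queue_name='name', url='url', params={'data': data})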

My problem is that I make so many concurrent RPC calls that my instance runs out of memory and crashes: "Exceeded soft private memory limit with 159.234 MB after servicing 975 requests total"

I've already tried three things:

del result
del data

and

result = None
data = None

and I also ran the garbage collector manually after the callback function:

gc.collect()

But nothing seems to free the memory right after the callback functions have added their tasks to the queue, and so the instance crashes. Is there any other way to do this?

python google-app-engine

Sebastian Küpers Jan 29 '13 at 12:21

2 answers

T. Steinrücken · Answer 1 · 2013-01-29T13:22:59+0000

Wrong approach. Put these URLs into a (push) queue, increase its rate to the desired value (default: 5/sec), and let each task handle one URL fetch (or a group of them). Note that there is a safety limit of 3000 URL Fetch API calls per minute, and a single URL fetch can use more than one API call.
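The queue-based approach above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names, the queue name 'fetch-queue', the handler path '/tasks/fetch' and the batch size are all made up, and `enqueue` is a stand-in for `taskqueue.add` so that the batching logic can be read (and tried out) without the App Engine SDK.

```python
import json


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def enqueue_fetch_tasks(urls, enqueue, batch_size=10):
    """Enqueue one task per batch of URLs and return the number of tasks.

    `enqueue` is a stand-in for taskqueue.add; the queue name and the
    handler URL below are hypothetical placeholders.
    """
    count = 0
    for batch in chunked(urls, batch_size):
        enqueue(
            queue_name='fetch-queue',           # hypothetical queue name
            url='/tasks/fetch',                 # hypothetical task handler
            params={'urls': json.dumps(batch)}  # the batch travels as JSON
        )
        count += 1
    return count
```

On App Engine this would presumably be called as `enqueue_fetch_tasks(urls, taskqueue.add)`, with the task handler fetching only its own small batch, so no single request holds all responses in memory. As a rough check against the limit mentioned above: 3000 calls per minute is 50 calls per second, so the queue rate multiplied by the batch size should stay comfortably below that.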

tesdal · Answer 2 · 2013-01-29T13:23:34+0000

Use the task queue for the urlfetch calls as well: fan out to avoid exhausting memory, register named tasks, and pass the event_list cursor on to the next task. In a scenario like this you might want to fetch and process in the same task instead of registering a new task for every processing step, especially if the processing also writes datastore records.
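A minimal sketch of the chaining idea above, under assumptions: a plain integer offset into event_list stands in for the cursor, `fetch_and_process` and `enqueue` are injected stand-ins for the real fetch-plus-processing code and for `taskqueue.add`, and the task name pattern and handler path are hypothetical.

```python
def process_slice(events, cursor, batch_size, fetch_and_process, enqueue,
                  job_id='job'):
    """Handle one slice of events, then chain a named task for the rest.

    Returns the cursor handed to the next task, or None when all events
    have been handled. `enqueue` is a stand-in for taskqueue.add.
    """
    batch = events[cursor:cursor + batch_size]
    for event in batch:
        # Fetch and process in the same task; nothing from this event
        # needs to stay in memory once the call returns.
        fetch_and_process(event)

    next_cursor = cursor + len(batch)
    if next_cursor >= len(events):
        return None  # nothing left, so the chain ends here

    enqueue(
        name='%s-%d' % (job_id, next_cursor),  # named task, one per cursor
        url='/tasks/process',                  # hypothetical task handler
        params={'cursor': next_cursor},
    )
    return next_cursor
```

Each task only ever touches `batch_size` events, and because the task name is derived from the cursor, a retried task that tries to register the same follow-up again should be rejected by the queue rather than start a duplicate chain.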

I also find that ndb makes these async solutions more elegant.

Check out Brett Slatkin's talk on scalable apps, and possibly pipelines.
