How to store worker-local variables in dask / distributed

Using dask 0.15.0, distributed 1.17.1.

I want to cache some things per worker, such as a client for accessing Google Cloud Storage, because instantiation is expensive. I would rather keep this in some kind of worker attribute. What is the canonical way to do this? Or are globals the way to go?



2 answers


On the worker

You can access the local worker with the get_worker function. A slightly cleaner approach than globals is to attach state to the worker:

from dask.distributed import get_worker

def my_function(...):
    worker = get_worker()
    worker.my_personal_state = ...

future = client.submit(my_function, ...)

      

We should probably add a generic namespace variable on workers to serve as a common place for such information, but we have not done so yet.
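As a minimal sketch of the worker-attribute approach, the helper below builds an expensive object once per worker and reuses it on later calls. The names get_or_create, make_expensive_client, and my_expensive_client are hypothetical and only for illustration; the sketch assumes, as described above, that arbitrary attributes can be set on the worker object.

```python
def get_or_create(worker, attr_name, factory):
    """Return the object stored on `worker` under `attr_name`, building it on first use."""
    if not hasattr(worker, attr_name):
        setattr(worker, attr_name, factory())
    return getattr(worker, attr_name)


def make_expensive_client():
    # Hypothetical stand-in for an expensive constructor (e.g. a storage client).
    return object()


def my_function(x):
    # Imported inside the task so the helper above can be used without distributed.
    from dask.distributed import get_worker

    client = get_or_create(get_worker(), "my_expensive_client", make_expensive_client)
    # ... use `client` here ...
    return x
```

Submitting my_function with client.submit runs it on a worker, where the first call creates the object and later calls on the same worker reuse it. On a multithreaded worker, two tasks may occasionally race and both build the object; if that matters, guard the creation with a lock.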

As globals

That said, for things like connections to external services, globals are not entirely evil. Many systems, such as Tornado, use global singletons.
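A global singleton of this kind might look like the sketch below. The names _client, _build_client, and get_client are hypothetical. It assumes the code lives in a module that is importable on every worker, so that tasks refer to the module rather than carrying the lock along when they are serialized.

```python
import threading

_client = None                    # one instance per worker process
_client_lock = threading.Lock()   # guards first-time construction


def _build_client():
    # Hypothetical stand-in for an expensive constructor.
    return object()


def get_client():
    """Return the process-wide client, creating it on first use."""
    global _client
    if _client is None:
        with _client_lock:
            # Re-check inside the lock in case another thread got here first.
            if _client is None:
                _client = _build_client()
    return _client
```

Tasks then call get_client() instead of constructing the client themselves. Each worker process ends up with its own instance.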

If you care about thread safety

Note that workers are often multithreaded. If your connection object is not thread safe, you may need to cache a different object for each thread. For this, I recommend using a threading.local object. Dask uses one in:

from distributed.worker import thread_state
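For your own per-thread cache, a plain threading.local from the standard library is enough. The sketch below uses the hypothetical names _thread_cache, make_connection, and get_connection, and again assumes the code sits in a module importable on the workers.

```python
import threading

# Each thread sees its own independent set of attributes on this object.
_thread_cache = threading.local()


def make_connection():
    # Hypothetical stand-in for a connection that is not thread safe.
    return object()


def get_connection():
    """Return this thread's connection, creating it on first use."""
    conn = getattr(_thread_cache, "connection", None)
    if conn is None:
        conn = make_connection()
        _thread_cache.connection = conn
    return conn
```

Every worker thread that calls get_connection() gets its own object, so a connection is never shared between threads.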

      



Dask Actors

For simpler use cases, other solutions may be preferable; however, actors are worth considering. Actors are currently an experimental feature in Dask that enables stateful computations.



