How to store worker-local variables in dask / distributed
Using dask 0.15.0, distributed 1.17.1.
I want to cache some things per worker, such as a client for accessing Google Cloud Storage, because instantiation is expensive. I would rather keep this in some kind of worker attribute. What is the canonical way to do this? Or are globals the way to go?
On the worker
You can access the local worker with the get_worker function. A slightly cleaner approach than globals is to attach state to the worker:
from dask.distributed import get_worker

def my_function(...):
    worker = get_worker()
    worker.my_personal_state = ...

future = client.submit(my_function, ...)
We should probably add a generic namespace variable on workers to serve as a common place for information like this, but we have not done so yet.
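A minimal sketch of how the worker-attribute approach can be turned into a create-once cache. The helper get_or_create, the attribute name my_storage_client, and the factory make_storage_client are hypothetical names chosen for illustration, not part of dask. The check is not synchronized, so on a multithreaded worker two tasks could each build a client the first time.

```python
from types import SimpleNamespace


def get_or_create(holder, name, factory):
    """Return holder.<name>, building it with factory() on first access."""
    try:
        return getattr(holder, name)
    except AttributeError:
        value = factory()
        setattr(holder, name, value)
        return value


def make_storage_client():
    # Hypothetical stand-in for an expensive constructor,
    # such as a cloud storage client.
    return object()


def my_function(x):
    # Imported inside the task so this module still loads
    # on machines where distributed is not installed.
    from dask.distributed import get_worker

    worker = get_worker()
    storage = get_or_create(worker, "my_storage_client", make_storage_client)
    # ... use `storage` here ...
    return x


# Local demonstration, with a plain object standing in for the worker.
fake_worker = SimpleNamespace()
first = get_or_create(fake_worker, "my_storage_client", make_storage_client)
second = get_or_create(fake_worker, "my_storage_client", make_storage_client)
print(first is second)  # True
```

On a cluster, my_function would be submitted with client.submit as in the snippet above, and each worker process would build its own client the first time a task runs there.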
As globals
That said, for things like connections to external services, globals are not entirely evil. Many systems, such as Tornado, use global singletons.
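For comparison, a global singleton can be sketched as a module-level variable that is filled in lazily. ExpensiveClient and get_client are hypothetical names, and the sketch assumes the code lives in a module that is importable on every worker, so that the module-level variable persists between tasks in the same process. The lock is there only so that concurrent first calls do not build the client twice.

```python
import threading

_client = None
_client_lock = threading.Lock()


class ExpensiveClient:
    """Hypothetical stand-in for a client that is costly to construct."""

    instances = 0

    def __init__(self):
        type(self).instances += 1


def get_client():
    """Return the process-wide client, creating it on first use."""
    global _client
    if _client is None:
        with _client_lock:
            # Re-check inside the lock: another thread may have won the race.
            if _client is None:
                _client = ExpensiveClient()
    return _client


a = get_client()
b = get_client()
print(a is b)  # True
```

A task would simply call get_client() wherever it needs the connection.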
If you care about thread safety
Note that workers are often multithreaded. If your connection object is not thread-safe, you may need to cache a separate object for each thread. For this, I recommend using a threading.local object. Dask uses one at:
from distributed.worker import thread_state
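A minimal sketch of per-thread caching with the standard library's threading.local, independent of Dask's own thread_state object. make_connection and get_connection are hypothetical names for illustration; each thread that calls get_connection builds its own object once and reuses it afterwards.

```python
import threading

_thread_local = threading.local()


def make_connection():
    # Hypothetical stand-in for a connection object that is not thread-safe.
    return object()


def get_connection():
    """Return this thread's connection, creating it on first use."""
    try:
        return _thread_local.connection
    except AttributeError:
        _thread_local.connection = make_connection()
        return _thread_local.connection


# The main thread gets one connection and keeps reusing it.
main_conn = get_connection()

# A second thread gets its own, separate connection.
results = []
t = threading.Thread(target=lambda: results.append(get_connection()))
t.start()
t.join()
other_conn = results[0]
```

Here main_conn and other_conn are different objects, while repeated calls from the same thread return the same one.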