Julia `distribute` function: specifying the distributed dimension
I want to distribute an MxN integer array across p workers. Is there a way to choose which dimension is split? Specifically, I want to keep all M rows together and split along the N columns. In my case M > N (I have a term-document matrix with a dictionary of size M and N documents).
By default Julia seems to distribute along the largest dimension, which doesn't work for my application (I want to distribute the documents, not the dictionary). Is there a way to control which dimension is distributed?
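A minimal sketch, assuming the question refers to `distribute` from the DistributedArrays.jl package (installed separately) and that two local workers are added just for illustration. Its `dist` keyword gives the number of chunks along each dimension, so `dist=[1, nworkers()]` keeps all M rows in one chunk and splits only the N columns. The sizes and the random matrix are hypothetical stand-ins for the term-document matrix.

```julia
using Distributed
addprocs(2)                         # assumption: two local worker processes
@everywhere using DistributedArrays

M, N = 6, 4                         # hypothetical: M terms (rows), N documents (columns)
A = rand(0:5, M, N)                 # stand-in integer term-document matrix

# dist = [1, nworkers()]: one chunk along rows, nworkers() chunks along columns,
# so each worker owns whole columns (documents) with all M rows.
dA = distribute(A; procs=workers(), dist=[1, nworkers()])

# Ask each worker which (row range, column range) it owns.
for p in workers()
    println(p, " => ", remotecall_fetch(localindices, p, dA))
end
```

With two workers, each one should report the full row range `1:6` and half of the columns.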
`SharedArray` has an optional parameter `pids` that maps items to processes (see the documentation).
So the MxN matrix can be initialized with the following code:
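```julia
using Distributed, SharedArrays

# a helper function which might be useful in other contexts:
# spreads n slots over the first b entries of v as evenly as possible
function balancedfill(v, n, b)
    d, r = divrem(n, b)
    return v[[repeat(1:r, inner=d+1); repeat(r+1:b, inner=d)]]
end

# M, N = size(mat)   # M rows (terms), N columns (documents)
pidvec = repeat(balancedfill(1:nprocs(), N, nprocs()), inner=M)  # one pid per column, repeated for its M rows
sharedmat = SharedArray{Float64}((M, N); pids=pidvec)
```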
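To see what the `pidvec` construction encodes, here is a dependency-free sketch with hypothetical sizes (5 rows, 7 columns, 3 processes), treating the matrix as M rows by N columns as in the question. Because Julia arrays are column-major, reshaping `pidvec` to M x N shows which process id is attached to each element; `colowner` and `owners` are illustrative names, not part of the answer.

```julia
# Same helper as in the answer: spread n slots over the first b entries of v,
# giving the first r entries one extra slot when b does not divide n.
function balancedfill(v, n, b)
    d, r = divrem(n, b)
    return v[[repeat(1:r, inner=d+1); repeat(r+1:b, inner=d)]]
end

M, N, p = 5, 7, 3                    # hypothetical: 5 rows, 7 columns, 3 processes

colowner = balancedfill(1:p, N, p)   # owner of each column: [1, 1, 1, 2, 2, 3, 3]
pidvec   = repeat(colowner, inner=M) # each column's owner repeated for its M rows

# Column-major layout: consecutive blocks of M entries form one column.
owners = reshape(pidvec, M, N)
display(owners)
```

Every column of `owners` holds a single process id, which is the per-column balancing the answer describes.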
This creates a Float64 array whose columns are balanced across processes. Float64 can be replaced with the required element type (e.g. Int for the integer matrix in the question). With a small change to `pidvec` (switching `inner` to `outer` and swapping `N` and `M`, i.e. `repeat(balancedfill(1:nprocs(), M, nprocs()), outer=N)`), an array distributed across rows can be created instead.
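The row-wise variant can be checked the same way. This sketch again uses hypothetical sizes and illustrative names (`rowowner`, `owners`); with `outer=N` the whole per-row owner vector is repeated once per column, so each row, rather than each column, ends up with a single process id.

```julia
# Same helper as in the answer.
function balancedfill(v, n, b)
    d, r = divrem(n, b)
    return v[[repeat(1:r, inner=d+1); repeat(r+1:b, inner=d)]]
end

M, N, p = 5, 7, 3                    # hypothetical: 5 rows, 7 columns, 3 processes

rowowner = balancedfill(1:p, M, p)   # owner of each row: [1, 1, 2, 2, 3]
pidvec   = repeat(rowowner, outer=N) # inner -> outer, N <-> M

# Each column of the M x N layout is a copy of rowowner.
owners = reshape(pidvec, M, N)
display(owners)
```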