Fast tensor initialization in Julia
I would like to initialize a 3d tensor (multidimensional array) with "diagonal gaussian" values
exp(-32*(u^2 + 16*(v^2 + w^2)))
where u = 1/sqrt(3)*(x+y+z)
and v,w
are any two coordinates orthogonal to k u
, discretized on a uniform grid on [-1,1]^3
. The following code achieves this:
function gaussian3d(n)
q = qr(ones(3,1), thin=false)[1]
x = linspace(-1.,1., n)
p = Array(Float64,(n,n,n))
square(x) = x*x
Base.@nloops 3 i p begin
@inbounds p[i_1,i_2,i_3] =
exp(
-32*(
square(q[1,1]*x[i_1] + q[2,1]*x[i_2] + q[3,1]*x[i_3])
+ 16*(
square(q[1,2]*x[i_1] + q[2,2]*x[i_2] + q[3,2]*x[i_3]) +
square(q[1,3]*x[i_1] + q[2,3]*x[i_2] + q[3,3]*x[i_3])
)
)
)
end
return p
end
However, it seems to be quite slow. For example, if I replace the defining function with exp(x*y*z)
, the code runs 50 times faster. Also, the macro @time
reports ~ 20% GC time for the above code, which I don't understand where they are from. (These numerical values were obtained using n = 128
.) My questions are therefore
- How can I speed up this piece of code?
- Where is the memory allocation hidden that causes the GC overhead?
source to share
Not knowing anything about 3D tensors with "diagonal gauss" values, using a comment square
from the original post, "typing" q
( @code_warntype
helps here : big leap in performance!), And further specializing in @nloops
, this works much faster on platforms I've tried it on.
julia> square(x::Float64) = x * x
square (generic function with 1 method)
julia> function my_gaussian3d(n)
q::Array{Float64,2} = qr(ones(3,1), thin=false)[1]
x = linspace(-1.,1., n)
p = Array(Float64,(n,n,n))
Base.@nloops 3 i p d->x_d=x[i_d] begin
@inbounds p[i_1,i_2,i_3] =
exp(
-32*(
square(q[1,1]*x_1 + q[2,1]*x_2 + q[3,1]*x_3)
+ 16*(
square(q[1,2]*x_1 + q[2,2]*x_2 + q[3,2]*x_3) +
square(q[1,3]*x_1 + q[2,3]*x_2 + q[3,3]*x_3)
)
)
)
end
return p
end
my_gaussian3d (generic function with 1 method)
julia> @time gaussian3d(128);
elapsed time: 3.952389337 seconds (1264 MB allocated, 4.50% gc time in 57 pauses with 0 full sweep)
julia> @time gaussian3d(128);
elapsed time: 3.527316699 seconds (1264 MB allocated, 4.42% gc time in 58 pauses with 0 full sweep)
julia> @time my_gaussian3d(128);
elapsed time: 0.285837566 seconds (16 MB allocated)
julia> @time my_gaussian3d(128);
elapsed time: 0.28476448 seconds (16 MB allocated, 1.22% gc time in 0 pauses with 0 full sweep)
julia> my_gaussian3d(128) == gaussian3d(128)
true
source to share