Julia: vectorized or devectorized code
From what I understood, Julia should run for loops as fast as, or faster than, vectorized operations. I wrote three versions of a simple function that finds points within a given distance of (a, b): one using for loops, one vectorized, and one vectorized using DataFrames:
x = rand(500)
y = rand(500)
a = rand()
b = rand()
function devect()
    dist = Array(Float64, 0)
    twins = Array(Float64, 0, 2)
    for i in 1:500
        dist = [dist; sqrt((x[i] - a)^2 + (y[i] - b)^2)]
        if dist[end] < 0.05
            twins = [twins; [x y][i,:]]
        end
    end
    return twins
end
function vect()
    d = sqrt((x-a).^2 + (y-b).^2)
    return [x y][d .< 0.05,:]
end
using DataFrames
function df_vect()
    df = DataFrame(x=x, y=y)
    dist = sqrt((df[:x]-a).^2 + (df[:y]-b).^2)
    return df[dist .< 0.05,:]
end
n = 10^3
@time for i in [1:n] devect() end
@time for i in [1:n] vect() end
@time for i in [1:n] df_vect() end
Output:
elapsed time: 4.308049576 seconds (1977455752 bytes allocated, 24.77% gc time)
elapsed time: 0.046759167 seconds (37295768 bytes allocated, 54.36% gc time)
elapsed time: 0.052463997 seconds (30359752 bytes allocated, 49.44% gc time)
Why is the vectorized version so much faster?
Following up on my comment about how the result is built in devect, here's my code:
julia> x, y, a, b = rand(500), rand(500), rand(), rand()
julia> function devect{T}(x::Vector{T}, y::Vector{T}, a::T, b::T)
           res = Array(T, 0)
           dim1 = 0
           for i = 1:size(x, 1)
               if sqrt((x[i]-a)^2 + (y[i]-b)^2) < 0.05
                   push!(res, x[i])
                   push!(res, y[i])
                   dim1 += 1
               end
           end
           reshape(res, (2, dim1))'
       end
devect (generic function with 1 method)
julia> function vect{T}(x::Vector{T}, y::Vector{T}, a::T, b::T)
           d = sqrt((x-a).^2 + (y-b).^2)
           [x y][d.<0.05, :]
       end
vect (generic function with 1 method)
julia> @time vect(x, y, a, b)
elapsed time: 3.7118e-5 seconds (37216 bytes allocated)
2x2 Array{Float64,2}:
0.978099 0.0405639
0.94757 0.0224974
julia> @time vect(x, y, a, b)
elapsed time: 7.1977e-5 seconds (37216 bytes allocated)
2x2 Array{Float64,2}:
0.978099 0.0405639
0.94757 0.0224974
julia> @time devect(x, y, a, b)
elapsed time: 1.7146e-5 seconds (376 bytes allocated)
2x2 Array{Float64,2}:
0.978099 0.0405639
0.94757 0.0224974
julia> @time devect(x, y, a, b)
elapsed time: 1.3065e-5 seconds (376 bytes allocated)
2x2 Array{Float64,2}:
0.978099 0.0405639
0.94757 0.0224974
julia> @time devect(x, y, a, b)
elapsed time: 1.8059e-5 seconds (376 bytes allocated)
2x2 Array{Float64,2}:
0.978099 0.0405639
0.94757 0.0224974
There might be faster devect solutions, but note the difference in allocated bytes. If the devectorized solution allocates more memory than the vectorized one, you are probably doing something wrong (at least in Julia).
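To make the allocation point concrete, here is the same devectorized filter written as a sketch in current Julia syntax (the thread above uses older 0.3-era syntax); the function name `devect_hint` and the use of `sizehint!` are my additions, not from the answer:

```julia
# Sketch (not from the answer above): the devectorized filter in current
# Julia syntax, using sizehint! to limit reallocation while push!-ing.
function devect_hint(x::Vector{T}, y::Vector{T}, a::T, b::T) where {T}
    res = T[]                      # grows only as matches are found
    sizehint!(res, 2 * length(x))  # upper bound: every point could match
    n = 0
    for i in eachindex(x, y)
        if sqrt((x[i] - a)^2 + (y[i] - b)^2) < 0.05
            push!(res, x[i])
            push!(res, y[i])
            n += 1
        end
    end
    return permutedims(reshape(res, 2, n))  # n×2 matrix, one row per match
end
```

Growing a small result vector with `push!` is cheap; the expensive pattern in the question was rebuilding the whole array with `dist = [dist; ...]` on every iteration.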
http://julia.readthedocs.org/en/latest/manual/performance-tips/#avoid-global-variables
Your code uses non-constant globals everywhere, which means you effectively fall back to the performance of an interpreted language, since the compiler cannot make any guarantees about their types. For a quick speedup, simply declare all of your global variables const.
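A minimal sketch of the penalty this answer describes (the function names are mine, not from the thread): the same loop is slow when it reads a non-const global and fast when the data is passed as an argument, because only the latter has an inferable concrete type.

```julia
# g is a non-const global: its type can change at any time, so the
# compiler cannot specialize code that reads it.
g = rand(1000)

function sum_global()   # reads the untyped global directly (slow path)
    s = 0.0
    for v in g
        s += v
    end
    return s
end

function sum_arg(x::Vector{Float64})   # same loop, argument instead of global
    s = 0.0
    for v in x
        s += v
    end
    return s
end
```

Timing both with `@time` should show `sum_global` allocating on every iteration while `sum_arg(g)` runs allocation-free after compilation; declaring `const g = rand(1000)` closes the gap the same way.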
Your devectorized code is not very efficient.
I made the following changes:
- Made all global variables const
- Preallocated the output vector instead of growing it on every iteration
- Showed two different ways to devectorize the computation more cleanly
const x = rand(500)
const y = rand(500)
const a = rand()
const b = rand()

function devect()
    dist = Array(Float64, 500)
    for i in 1:500
        dist[i] = sqrt((x[i] - a)^2 + (y[i] - b)^2)
    end
    return [x y][dist .< 0.05,:]
end

function devect2()
    pairs = Array(Float64, 500, 2)
    k = 0
    for i in 1:500
        dist = sqrt((x[i] - a)^2 + (y[i] - b)^2)
        if dist < 0.05
            k += 1
            pairs[k,:] = [x[i], y[i]]
        end
    end
    return pairs[1:k,:]
end

function vect()
    d = sqrt((x-a).^2 + (y-b).^2)
    return [x y][d .< 0.05,:]
end

using DataFrames
function df_vect()
    df = DataFrame(x=x, y=y)
    dist = sqrt((df[:x]-a).^2 + (df[:y]-b).^2)
    return df[dist .< 0.05,:]
end

const n = 10^3
@time for i in [1:n] devect() end
@time for i in [1:n] devect2() end
@time for i in [1:n] vect() end
@time for i in [1:n] df_vect() end
Output:
elapsed time: 0.009283872 seconds (16760064 bytes allocated)
elapsed time: 0.003116157 seconds (8456064 bytes allocated)
elapsed time: 0.050070483 seconds (37248064 bytes allocated, 44.50% gc time)
elapsed time: 0.0566218 seconds (30432064 bytes allocated, 40.35% gc time)
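For readers on current Julia versions, note that the vectorized style in this thread relies on old automatic vectorization of `sqrt` over arrays; today the same function needs explicit broadcasting dots. A sketch (the name `vect_broadcast` and the use of `hypot` are my choices, not from the thread):

```julia
# Sketch: the vectorized version in current Julia syntax.
# hypot.(dx, dy) computes the elementwise Euclidean distance to (a, b).
function vect_broadcast(x, y, a, b)
    d = hypot.(x .- a, y .- b)     # broadcast over both vectors
    return [x y][d .< 0.05, :]     # keep rows closer than 0.05
end
```

The devectorized loop versions above continue to work with only minor syntax updates, and in current Julia the loop remains the lower-allocation option.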