Memory allocation in a fixed point algorithm

203 Views Asked by At

I need to find the fixed point of a function f. The algorithm is very simple:

  1. Given X, compute f(X)
  2. If ||X-f(X)|| is lower than a certain tolerance, exit and return X, otherwise set X equal to f(X) and go back to 1.

I'd like to be sure I'm not allocating memory for a new object at every iteration

For now, the algorithm looks like this:

iter1 = function(x::Vector{Float64})
    for iter in 1:max_it
        oldx = copy(x)
        g1(x)
        delta = vnormdiff(x, oldx, 2)
        if delta < tolerance
            break
        end
    end
end

Here g1(x) is a function that sets x to f(x)

But it seems this loop allocates a new vector at every loop (see below).

Another way to write the algorithm is the following:

iter2 = function(x::Vector{Float64})
    oldx = similar(x)
    for iter in 1:max_it
        (oldx, x) = (x, oldx)
        g2(x, oldx)
        delta = vnormdiff(oldx, x, 2)
        if delta < tolerance
            break
        end
    end
end

where g2(x1, x2) is a function that sets x1 to f(x2).

Is thi the most efficient and natural way to write this kind of iteration problem?

Edit1: timing shows that the second code is faster:

using NumericExtensions
max_it = 1000
tolerance = 1e-8
max_it = 100

g1 = function(x::Vector{Float64}) 
    for i in 1:length(x)
        x[i] = x[i]/2
    end
end

g2 = function(newx::Vector{Float64}, x::Vector{Float64}) 
    for i in 1:length(x)
        newx[i] = x[i]/2
    end
end

x = fill(1e7, int(1e7))
@time iter1(x)
# elapsed time: 4.688103075 seconds (4960117840 bytes allocated, 29.72% gc time)
x = fill(1e7, int(1e7))
@time iter2(x)
# elapsed time: 2.187916177 seconds (80199676 bytes allocated, 0.74% gc time)

Edit2: using copy!

iter3 = function(x::Vector{Float64})
    oldx = similar(x)
    for iter in 1:max_it
        copy!(oldx, x)
        g1(x)
        delta = vnormdiff(x, oldx, 2)
        if delta < tolerance
            break
        end
    end
end
x = fill(1e7, int(1e7))
@time iter3(x)
# elapsed time: 2.745350176 seconds (80008088 bytes allocated, 1.11% gc time)
1

There are 1 best solutions below

4
On BEST ANSWER

I think replacing the following lines in the first code

for iter = 1:max_it
    oldx = copy( x )
    ...

by

oldx = zeros( N )
for iter = 1:max_it
    oldx[:] = x    # or copy!( oldx, x )
    ...

will be more efficient because no array is allocated. Also, the code can be made more efficient by writing for-loops explicitly. This can be seen, for example, from the following comparison

function test()
    N = 1000000

    a = zeros( N )
    b = zeros( N )

    @time c = copy( a )

    @time b[:] = a

    @time copy!( b, a )

    @time \
    for i = 1:length(a)
        b[i] = a[i]
    end

    @time \
    for i in eachindex(a)
        b[i] = a[i]
    end
end

test()

The result obtained with Julia0.4.0 on Linux(x86_64) is

elapsed time: 0.003955609 seconds (7 MB allocated)
elapsed time: 0.001279142 seconds (0 bytes allocated)
elapsed time: 0.000836167 seconds (0 bytes allocated)
elapsed time: 1.19e-7 seconds (0 bytes allocated)
elapsed time: 1.28e-7 seconds (0 bytes allocated)

It seems that copy!() is faster than using [:] in the left-hand side, though the difference becomes marginal in repeated calculations (there seems to be some overhead for the first [:] calculation). Btw, the last example using eachindex() is very convenient for looping over multi-dimensional arrays.

Similar comparison can be made for vnormdiff(), where use of norm( x - oldx ) etc is slower than an explicit loop for vector norm, because the former allocates one temporary array for x - oldx.