RNN fails with CUDA but runs on CPU


Problem with RNN and CUDA.

I want to run an RNN (https://fluxml.ai/Flux.jl/stable/models/recurrence/) on the GPU, using explicit gradients (https://fluxml.ai/Flux.jl/stable/training/training/#Implicit-or-Explicit?).
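For context, this is a minimal CPU-only sketch of the difference between the two styles as I understand them (the names m_demo, x_demo, y_demo are just placeholders, not my actual model):

using Flux

m_demo = Dense(2 => 1)
x_demo = rand(Float32, 2)
y_demo = rand(Float32, 1)

# Implicit ("old") style: track the trainable arrays with Flux.params
ps = Flux.params(m_demo)
gs = Flux.gradient(() -> Flux.Losses.mse(m_demo(x_demo), y_demo), ps)

# Explicit ("new") style: differentiate with respect to the model itself
grads = Flux.gradient(m -> Flux.Losses.mse(m(x_demo), y_demo), m_demo)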

Everything seems to work fine on the CPU, but on the GPU it fails, with the error pointing to the last statement in the loss function, called from gradient:

ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
 [1] throw_api_error(res::CUDA.cudaError_enum)
   @ CUDA C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\libcuda.jl:27
 [2] isdone
   @ C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\stream.jl:111 [inlined]
 [3] spinning_synchronization(f::typeof(CUDA.isdone), obj::CuStream)
   @ CUDA C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\synchronization.jl:79
 [4] device_synchronize(; blocking::Bool, spin::Bool)
   @ CUDA C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\synchronization.jl:171
 [5] device_synchronize()
   @ CUDA C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\synchronization.jl:169
 [6] top-level scope
   @ C:\Users\XXX\.julia\packages\CUDA\nIZkq\src\initialization.jl:210

caused by: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA C:\Users\XXX\.julia\packages\CUDA\nIZkq\lib\cudadrv\libcuda.jl:27

Since I am dealing with an RNN, the history must be available for the gradient (https://en.wikipedia.org/wiki/Backpropagation_through_time). The recurrence documentation for Flux specifies that the input should be structured as a vector (over time steps) of vectors (over features).
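To make that layout concrete, here is a small sketch of the structure I understand the docs to mean (sizes chosen only for illustration):

using Flux

# A sequence of 3 time steps, each holding 2 features:
seq = [rand(Float32, 2) for t = 1:3]

# Feeding the time steps in order lets the hidden state carry the history:
rnn = RNN(2 => 5)
outs = [rnn(xt) for xt in seq]  # one output per time step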

The code below is simplified to expose the problem; the training loop has been stripped away.

using Flux
using ChainRulesCore
using CUDA

dev = gpu # cpu is working fine

m = Chain(RNN(2 => 5), Dense(5 => 1)) |> dev

x = [rand(Float32, 2) for i = 1:3] |> dev;  # sequence of 3 time steps, 2 features each
y = [rand(Float32, 1) for i = 1:1] |> dev   # single target for the last time step

[m(xi) for xi in x]  # plain forward pass over the sequence

using Flux.Losses: mse

function loss(m, x, y)
    @ignore_derivatives Flux.reset!(m)  # reset the hidden state without differentiating through it
    m(x[1]) # output ignored, but the hidden state is updated
    m(x[2]) # output ignored, hidden state updated again
    mse(m(x[3]), y[1]) # loss on the prediction for the final time step
end
  
loss(m, x, y)

grads = Flux.gradient(m, x, y) do m,x,y
    loss(m, x, y)
end

optim = Flux.setup(Flux.Adam(), m) 
Flux.update!(optim, m, grads[1])

I am wondering: are RNNs fully supported on CUDA in the newer Flux versions when using the explicit gradient style?

Versions (in a clean environment):

Julia 1.9.3
CUDA v5.1.0
ChainRulesCore v1.18.0
Flux v0.14.6