OpenAI Gradient Checkpointing with Tensorflow Eager Execution

I have recently switched to TensorFlow Eager (currently working with TF 1.8.0) and like it a lot. However, I now have quite a large model which does not fit into my GPU memory (GTX 1080 Ti, 11 GB VRAM) when run under a GradientTape, which is needed to calculate the gradients in eager mode. The forward pass (i.e. without a GradientTape) works fine.

I thought about using the gradient checkpointing code from OpenAI in the hope that this would help. However, simply using it as described in their GitHub repository does not seem to help under eager execution, i.e.

import tensorflow as tf
import tensorflow.contrib.eager as tfe
import memory_saving_gradients
tf.__dict__["gradients"] = memory_saving_gradients.gradients_memory
# using gradients_memory or gradients_speed does not change anything
# tf.__dict__["gradients"] = memory_saving_gradients.gradients_speed

[...]
with tfe.GradientTape() as g:
    output = run_large_model()
    loss = calculate_loss_on_output(output)
grads = g.gradient(loss, model.variables)
optimizer.apply_gradients(zip(grads, model.variables))

runs out of memory, regardless of whether gradient checkpointing is used.

My guess is that the gradient tape still stores all intermediate variables and the information required for the backward pass, and that the gradient checkpointing has no effect because TF in eager mode doesn't actually construct a graph (from what I understand, or at least it is a different kind of graph).

Does anyone have experience with this, or an idea of how it could be solved, or of what I need to do to use gradient checkpointing in TF eager mode?

1 Answer

The gradient checkpointing code from OpenAI is based on graph rewriting, so it does not support eager execution.

The tensorflow.contrib.layers library provides a recompute_grad decorator which is equivalent and is supported in both graph and eager execution.
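
For reference, here is a minimal sketch of how the decorator might be used under eager execution. It assumes TF 1.8 with eager enabled; the layer sizes, the block structure, and the loss are placeholders rather than the asker's actual model:

import tensorflow as tf
import tensorflow.contrib.eager as tfe

tf.enable_eager_execution()

# Placeholder stand-in for the expensive middle part of the model.
dense_a = tf.layers.Dense(4096, activation=tf.nn.relu)
dense_b = tf.layers.Dense(4096, activation=tf.nn.relu)

# recompute_grad discards the intermediate activations of the wrapped
# function after the forward pass and recomputes them during the
# backward pass, trading extra compute for lower peak memory.
@tf.contrib.layers.recompute_grad
def expensive_block(x):
    return dense_b(dense_a(x))

optimizer = tf.train.AdamOptimizer(1e-4)

x = tf.random_normal([8, 4096])
with tfe.GradientTape() as tape:
    output = expensive_block(x)
    loss = tf.reduce_mean(tf.square(output))  # placeholder loss

variables = dense_a.variables + dense_b.variables
grads = tape.gradient(loss, variables)
optimizer.apply_gradients(zip(grads, variables))

Only what happens inside the wrapped function is recomputed, so the decorator is typically applied to the largest repeated blocks of the network rather than to the whole model, and the memory savings come at the cost of roughly one extra forward pass per wrapped block.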