I know that T5 has K, Q and V vectors in each layer, and it also has a feedforward network. I would like to freeze the K, Q and V vectors and only train the feedforward layers in each layer of T5. I use the PyTorch library. The model could be a wrapper around the Hugging Face T5 model or a modified version of it. I know how to freeze all parameters using the following code:
```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained(underlying_model_name)
model = T5ForConditionalGeneration.from_pretrained(underlying_model_name)

for p in model.parameters():
    p.requires_grad = False  # freezing
```
Could you please guide me on how I can do this?
This GitHub project could probably be helpful, but it's for RoBERTa and GPT; could I adapt it for T5?
I've adapted a solution based on this discussion from the Hugging Face forums. Basically, you have to specify the names of the modules/PyTorch layers that you want to freeze.
In your particular case of T5, I started by looking at the model summary:
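A minimal way to inspect the structure is to simply print the model object from the snippet in the question:

```python
# Print the full module hierarchy of the loaded T5 model
print(model)
```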
This gives the following (abbreviated output):
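For a checkpoint like t5-small, the relevant part of the hierarchy looks roughly as follows (heavily abbreviated here; the exact sizes and number of blocks depend on the checkpoint and the transformers version):

```
T5ForConditionalGeneration(
  (shared): Embedding(32128, 512)
  (encoder): T5Stack(
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=512, out_features=512, bias=False)
              (k): Linear(in_features=512, out_features=512, bias=False)
              (v): Linear(in_features=512, out_features=512, bias=False)
              (o): Linear(in_features=512, out_features=512, bias=False)
              ...
          (1): T5LayerFF(...)
      ...
  (decoder): T5Stack(
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(...)
          (1): T5LayerCrossAttention(...)
          (2): T5LayerFF(...)
      ...
)
```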
With this, we can then generate a list of modules that we want to freeze. In particular, I decided to freeze the entire T5LayerSelfAttention block for the encoder (and, additionally, the T5LayerCrossAttention block for the decoder), and then simply freeze all the parameters in the respective modules:
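A sketch of that, reusing the model object from the question (the indices follow the layer ordering in the summary above: layer[0] is the self-attention block in both stacks, and layer[1] in the decoder is the cross-attention block):

```python
# Collect the attention sub-modules of every encoder and decoder block
modules_to_freeze = [model.encoder.block[i].layer[0] for i in range(len(model.encoder.block))]            # encoder self-attention
modules_to_freeze.extend([model.decoder.block[i].layer[0] for i in range(len(model.decoder.block))])      # decoder self-attention
modules_to_freeze.extend([model.decoder.block[i].layer[1] for i in range(len(model.decoder.block))])      # decoder cross-attention

# Disable gradients for every parameter inside the selected modules
for module in modules_to_freeze:
    for param in module.parameters():
        param.requires_grad = False
```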
You can verify that these are actually frozen in your model by running the following:
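For example, by iterating over all named parameters:

```python
# Inspect which parameters will still receive gradient updates
for name, param in model.named_parameters():
    print(name, param.requires_grad)
```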
which should print quite a few False values (alongside True for the parameters that remain trainable). If you really only want to freeze K, Q and V, you can adapt the above process to just sub-select the modules you want, as sketched below.
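One rough way to do that is to match parameter names instead of whole modules; this assumes the .q / .k / .v naming used inside T5Attention in current transformers versions:

```python
# Freeze only the query/key/value projections, leaving the output
# projection (o) and the feed-forward layers trainable
for name, param in model.named_parameters():
    if any(fragment in name for fragment in (".q.", ".k.", ".v.")):
        param.requires_grad = False
```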