OOM issues with AllenNLP coreference resolution training and substituting models


I have a few questions about training and evaluating AllenNLP's coreference resolution model.

  1. Are there any constraints/specifications on what GPUs should be used for training? I hit an out-of-memory (OOM) error midway through training on a Titan RTX GPU with 24,220 MiB of memory. Are there any parameters I can change that might help? (Note: I am using the BERT version instead of the SpanBERT version.)

  2. I noticed that the model usage examples use an already trained and stored model. Can we instead specify a model path from a model we have trained?

  3. Can we substitute roberta-base with bert-base-uncased in the coref_bert-lstm.jsonnet file, or are other modifications necessary to make this change?


There is 1 answer below.

  1. This model needs a lot of memory. The max_length parameter makes the biggest difference to memory usage, since it caps the number of wordpieces the transformer processes at once and memory grows quickly with sequence length. If you can get away with a max_length shorter than 512, try that first (see the config sketch after this list).
  2. Yes, wherever the usage examples take a URL to a trained model, you can substitute a local path to the model.tar.gz you trained yourself (see the Python sketch below).
  3. Yes, you can, but you'll have to train the model from scratch (see the last sketch below).
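
For point 1, here is a minimal sketch of where max_length lives, assuming your config mirrors the stock AllenNLP coref training configs, where a single max_length local is threaded through both the token indexer and the text field embedder. The value 384 is purely illustrative, and the model name is a placeholder:

```jsonnet
// Sketch only: assumes the stock AllenNLP coref config layout.
local transformer_model = "bert-base-cased";  // placeholder model name
local max_length = 384;  // reduced from 512 to cut memory; illustrative value

{
  "dataset_reader": {
    "type": "coref",
    "token_indexers": {
      "tokens": {
        "type": "pretrained_transformer_mismatched",
        "model_name": transformer_model,
        "max_length": max_length  // must match the embedder's value below
      }
    }
  },
  "model": {
    "type": "coref",
    "text_field_embedder": {
      "token_embedders": {
        "tokens": {
          "type": "pretrained_transformer_mismatched",
          "model_name": transformer_model,
          "max_length": max_length
        }
      }
    }
  }
  // ... rest of the config unchanged
}
```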
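For point 2, a short usage sketch: Predictor.from_path accepts a local archive path exactly as it accepts a URL. The path below is a placeholder for the model.tar.gz in your own serialization directory:

```python
from allennlp.predictors.predictor import Predictor
import allennlp_models.coref  # noqa: F401 -- registers the coref model and predictor

# Point at your own training output instead of the published URL.
# "/path/to/serialization_dir/model.tar.gz" is a placeholder path.
predictor = Predictor.from_path("/path/to/serialization_dir/model.tar.gz")

result = predictor.predict(
    document="The woman reading a newspaper sat on the bench with her dog."
)
print(result["clusters"])  # predicted coreference clusters as token-span indices
```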
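For point 3, the substitution is typically a one-line change. This is a hedged sketch assuming coref_bert-lstm.jsonnet defines the transformer name once as a local variable (I haven't verified the exact file contents); if the name is instead written out literally in several places, change every occurrence:

```jsonnet
// Before (assumed): local transformer_model = "roberta-base";
// After: every "model_name" field that references the variable picks this up.
local transformer_model = "bert-base-uncased";
```

Because bert-base-uncased and roberta-base use different tokenizers and vocabularies, the pretrained weights saved with one cannot be loaded for the other, which is why retraining from scratch is unavoidable.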