Running SQuAD script using ALBERT (huggingface-transformers)

Question

786 Views Asked by Cobollero At 28 July 2025 at 04:14

I have a question regarding the usage of ALBERT with the SQuAD 2.0 huggingface-transformers script.

On the GitHub page, there are no specific instructions on how to run the script with ALBERT, so I used the same settings that are used to run the script with BERT. However, the final results (exact_match = 30.632527583593028, f1 = 36.36948708435092) are far from the ones achieved by BERT and reported on the GitHub page (f1 = 88.52, exact_match = 81.22). So I think that I may be doing something wrong.

This is the code that I ran in the command line:

python run_squad.py \
   --model_type albert \
   --model_name_or_path albert-base-v2 \
   --do_train   --do_eval \
   --train_file train-v2.0.json \
   --predict_file dev-v2.0.json \
   --per_gpu_train_batch_size 5 \
   --learning_rate 3e-5 \
   --num_train_epochs 2.0 \
   --max_seq_length 384 \
   --doc_stride 128 \
   --output_dir /aneves/teste2/output/

The only differences between this command and the one on the transformers page are the model name, for which they use 'bert-base-uncased', and per_gpu_train_batch_size, which is 12 there but which I had to reduce to 5 due to memory constraints on my GPU.

Am I forgetting some option when I run the script or are the results achieved because of the per_gpu_train_batch_size being set to 5 instead of 12?

Thanks!

There are 2 best solutions below

NRJ_Varshney On 20 April 2020 at 06:39

You can use gradient accumulation steps to compensate for the small batch size. Essentially, the gradient accumulation step parameter is this:

Let's say you want a batch_size of 64, but your GPU can only fit a batch of size 32.

So you make two passes with a batch of 32 each, run the backward pass on each one so that the gradients accumulate, and only then do a single optimizer update after the 2 batches.
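To make the idea concrete, here is a minimal sketch in plain Python (no deep learning framework). It assumes a toy one-parameter model y_hat = w * x with a mean squared error loss; the data, the helper `mse_grad`, and the learning rate are made up for illustration. It checks that two accumulated micro-batches of 32 give the same gradient as one batch of 64, provided each micro-batch gradient is divided by the number of accumulation steps.

```python
import random


def mse_grad(w, batch):
    """Gradient of the mean squared error w.r.t. w for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


random.seed(0)
# Toy data set of 64 examples generated from y = 3 * x (illustrative only).
data = [(x, 3.0 * x) for x in (random.uniform(-1, 1) for _ in range(64))]

w = 0.0
lr = 0.1
micro_batch_size = 32    # the largest batch that fits in memory
accumulation_steps = 2   # 2 * 32 = effective batch size of 64

# Reference: gradient computed on the full batch of 64 at once.
full_grad = mse_grad(w, data)

# Accumulation: compute the gradient of each micro-batch, scale it by
# 1 / accumulation_steps, and add it up. No parameter update happens here.
accumulated = 0.0
for step in range(accumulation_steps):
    start = step * micro_batch_size
    micro_batch = data[start:start + micro_batch_size]
    accumulated += mse_grad(w, micro_batch) / accumulation_steps

# A single parameter update after all micro-batches have been processed.
w_new = w - lr * accumulated

print(abs(accumulated - full_grad) < 1e-9)
```

As far as I know, run_squad.py exposes this through the --gradient_accumulation_steps option, so something like a per-GPU batch size of 4 with 3 accumulation steps should give an effective batch size of 12.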

Secondly, hyperparameters play a humongous role in deep learning models. You will have to try a few sets of parameters to get better accuracy. I think reducing the learning rate to the order of 1e-6 might help here, though that is just speculation.

David Chiang On 08 May 2020 at 00:51 BEST ANSWER

Did you set the flag

--version_2_with_negative

to True? Since SQuAD 2.0 contains some questions that do not have an answer, you need to set it to True.
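A note on what "set to True" means in practice: in the versions of run_squad.py I am aware of, this option is declared as a boolean switch (argparse `store_true`), so it takes no value and simply adding it to the command turns it on. The sketch below is a hypothetical, trimmed-down parser written for illustration, not the script's actual argument list. It only shows how such a switch behaves with and without the flag.

```python
import argparse

# Hypothetical, trimmed-down parser: only three options are declared here,
# assuming the real script defines --version_2_with_negative as a store_true switch.
parser = argparse.ArgumentParser()
parser.add_argument("--model_type", type=str, required=True)
parser.add_argument("--model_name_or_path", type=str, required=True)
parser.add_argument(
    "--version_2_with_negative",
    action="store_true",
    help="Enable when the SQuAD examples include questions with no answer.",
)

base_args = ["--model_type", "albert", "--model_name_or_path", "albert-base-v2"]

# Command as in the question: the switch is absent, so it defaults to False.
without_flag = parser.parse_args(base_args)

# Same command with the switch appended: no value is passed after it.
with_flag = parser.parse_args(base_args + ["--version_2_with_negative"])

print(without_flag.version_2_with_negative)  # False
print(with_flag.version_2_with_negative)     # True
```

Under that assumption, the command from the question only needs one extra line, `--version_2_with_negative \`, among its options.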
