I'm trying to fine-tune a Transformer model listed on Hugging Face. My code is identical to the guide linked below: https://medium.com/@pazuzzu/in-depth-llm-fine-tuning-guide-efficiently-fine-tune-and-use-zephyr-7b-beta-assistant-using-lora-e23d8151e067
Whenever I run this code on my local GPU server, I get the following error:
```
Traceback (most recent call last):
  File "Main.py", line 163, in <module>
    trainer.train()
  File "/home/jhkcool97/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1561, in train
    return inner_training_loop(
  File "/home/jhkcool97/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1895, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/jhkcool97/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2830, in training_step
    self.accelerator.backward(loss)
  File "/home/jhkcool97/.local/lib/python3.8/site-packages/accelerate/accelerator.py", line 1964, in backward
    self.scaler.scale(loss).backward(**kwargs)
  File "/home/jhkcool97/.local/lib/python3.8/site-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/home/jhkcool97/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/jhkcool97/.local/lib/python3.8/site-packages/torch/autograd/function.py", line 289, in apply
    return user_fn(self, *args)
  File "/home/jhkcool97/.local/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 275, in backward
    tensors = ctx.saved_tensors
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [6, 1, 201, 201]] is at version 16; expected version 14 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
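From the traceback, the backward pass fails inside `torch/utils/checkpoint.py`, so the in-place modification seems to interact with gradient checkpointing (the `[6, 1, 201, 201]` tensor looks like a saved attention mask). A workaround I've seen suggested for this class of error is to switch to the non-reentrant checkpoint implementation. A minimal sketch, assuming `transformers` >= 4.35 (which added `gradient_checkpointing_kwargs`); the output directory and batch size here are placeholders, not taken from the guide:

```python
from transformers import TrainingArguments

# Sketch only: all arguments except the checkpointing ones are placeholders.
training_args = TrainingArguments(
    output_dir="zephyr-7b-lora-out",   # hypothetical path
    per_device_train_batch_size=6,
    gradient_checkpointing=True,
    # Use the non-reentrant checkpoint implementation, which recomputes
    # activations differently and often avoids "modified by an inplace
    # operation" errors raised from ctx.saved_tensors during backward.
    gradient_checkpointing_kwargs={"use_reentrant": False},
)

# Alternatively, setting gradient_checkpointing=False should confirm
# whether checkpointing is what triggers the error.
```

Is this the right direction, or is something else in the LoRA setup modifying a saved tensor in place?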