I am trying to fine-tune a Llama 2 7B model using QLoRA with multiple GPUs in Databricks while following along with this example. I am using my own dataset, but I think my problems begin with adding special tokens.
from transformers import AutoTokenizer, AutoModelForCausalLM, LlamaConfig, LlamaForCausalLM

model_path = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
tokenizer.add_special_tokens({'eos_token': '</s>', 'bos_token': '<s>', 'pad_token': '<pad>', 'sep_token': '<|body|>'})
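For what it's worth, a quick sanity check right after this (not part of the example, just something I'd add to confirm the tokenizer side is sane) shows whether the new tokens get ordinary integer ids; '<pad>' and '<|body|>' should land just past Llama 2's original 32,000-entry vocabulary.

# Hypothetical check: all of these should be plain Python ints,
# with the two brand-new tokens expected to get ids 32000 and 32001.
print(tokenizer.bos_token_id, tokenizer.eos_token_id)
print(tokenizer.pad_token_id, tokenizer.sep_token_id)
print(len(tokenizer))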
This is what my config code looks like. Unfortunately, we are already diverging from the example quite a bit here, but I believe I do need to specify these special tokens: from previous experience, the model performs a lot better when I have them explicitly defined.
config = LlamaConfig(model_name,
                     bos_token_id=tokenizer.bos_token_id,
                     eos_token_id=tokenizer.eos_token_id,
                     pad_token_id=tokenizer.pad_token_id,
                     sep_token_id=tokenizer.sep_token_id,
                     output_hidden_states=False)
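(For completeness, bnb_config, which appears in the second attempt below, is just the usual 4-bit QLoRA quantization setup; roughly the following, though my exact dtype settings may differ from the example.)

import torch
from transformers import BitsAndBytesConfig

# Rough sketch of the 4-bit quantization config; the exact values are not
# important to the error, but this is approximately what gets passed in.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)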
All the code up to this point "works". But when I try to instantiate the model, I get a very strange error. I have tried two different ways of creating the model:
1.
model = LlamaForCausalLM.from_pretrained(
    model_name,
    config=config
)
2.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    config=config,
    trust_remote_code=True
)
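(If the model ever loaded, the next step would be resizing the embedding layer so it covers the two newly added tokens, something along these lines; I never actually reach this point because of the error below.)

# Sketch only, never reached: the embedding matrix has to grow to cover
# the new <pad> and <|body|> tokens added to the tokenizer above.
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id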
Both of these ways return the same error:
TypeError: '<' not supported between instances of 'int' and 'str'
Here are the last three blocks of the traceback:
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-569a19c6-ee18-4d2c-b8fa-74dc24547bca/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:732, in LlamaForCausalLM.__init__(self, config)
730 def __init__(self, config):
731 super().__init__(config)
--> 732 self.model = LlamaModel(config)
733 self.pretraining_tp = config.pretraining_tp
734 self.vocab_size = config.vocab_size
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-569a19c6-ee18-4d2c-b8fa-74dc24547bca/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:560, in LlamaModel.__init__(self, config)
557 self.padding_idx = config.pad_token_id
558 self.vocab_size = config.vocab_size
--> 560 self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
561 self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
562 self.norm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
File /databricks/python/lib/python3.10/site-packages/torch/nn/modules/sparse.py:133, in Embedding.__init__(self, num_embeddings, embedding_dim, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse, _weight, _freeze, device, dtype)
131 if padding_idx is not None:
132 if padding_idx > 0:
--> 133 assert padding_idx < self.num_embeddings, 'Padding_idx must be within num_embeddings'
134 elif padding_idx < 0:
135 assert padding_idx >= -self.num_embeddings, 'Padding_idx must be within num_embeddings'
Somehow either padding_idx or self.num_embeddings has become a string, and I'm really not sure how or why. I get the feeling that it's just not currently possible to use custom special tokens with Llama 2, but if anyone has figured out how to do it, please let me know. I still need to try fine-tuning without the special tokens defined and see how the final performance compares, but for all I know I'll hit the same error.
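For reference, the no-special-tokens fallback I still need to test would just drop the custom config and reuse the existing eos token for padding, roughly like this:

# Fallback sketch: no custom config object, reuse </s> as the pad token.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    trust_remote_code=True
)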