'MPTConfig' object has no attribute 'hidden_size'

I am attempting to finetune an MPT model with DeepSpeed on Databricks, but I am running into this AttributeError. Here is an MRE of my code:

import transformers
from transformers import AutoConfig

model_path = 'mosaicml/mpt-7b'
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
model_hidden_size = config.hidden_size

AttributeError: 'MPTConfig' object has no attribute 'hidden_size'
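
For now I am working around it by falling back to d_model, which I believe is the attribute MPTConfig actually uses for the model width (I am not certain it is the intended equivalent of hidden_size):

from transformers import AutoConfig

model_path = 'mosaicml/mpt-7b'
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)

# MPTConfig does not define hidden_size; d_model appears to hold the same value
model_hidden_size = getattr(config, 'hidden_size', None) or config.d_model
print(model_hidden_size)  # expecting 4096 for mpt-7b if d_model really is the hidden width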

I need this model_hidden_size variable so I can use it in this code:

deepspeed_config["hidden_size"] = model_hidden_size
deepspeed_config["zero_optimization"]["reduce_bucket_size"] = model_hidden_size*model_hidden_size
deepspeed_config["zero_optimization"]["stage3_prefetch_bucket_size"] = 0.9 * model_hidden_size * model_hidden_size
deepspeed_config["zero_optimization"]["stage3_param_persistence_threshold"] = 10 * model_hidden_size

Do I need to open a feature request on the MPT GitHub repo? Should I try to use MosaicML's LLM Foundry instead of Hugging Face Transformers? Or is this deepspeed_config code unnecessary for the actual finetuning process? I am using ZeRO stage 3.
