Can't run fine-tuning for Llama 7B with LoRA (OOM)


I am trying to fine-tune Llama 7B using LoRA + DeepSpeed and I hit OOM every time (I have an RTX 3090 with 24 GB VRAM, 32 GB RAM, and 80 GB swap). It looks like nothing is being offloaded to CPU/RAM. I have tried different parameters with different values, but it seems it simply doesn't offload and tries to run everything on the GPU alone. I have spent a couple of days trying to get it to run, please help.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacity of 23.47 GiB of which 100.38 MiB is free. Including non-PyTorch memory, this process has 21.66 GiB memory in use. Of the allocated memory 21.40 GiB is allocated by PyTorch, and 11.11 MiB is reserved by PyTorch but unallocated.

from transformers import AutoModelForCausalLM, AutoTokenizer

def load_model_and_tokenizer(model_name, project_tag):
    # loads the full-precision weights; nothing is quantized or offloaded here
    model = AutoModelForCausalLM.from_pretrained(model_name, cache_dir="./llama")
    tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="./llama")
....
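As far as I understand, prepare_model_for_int8_training (used further down) expects the model to already be loaded in 8-bit, whereas the loader above keeps the full-precision weights on the GPU. A rough sketch of what 8-bit loading would look like (assuming bitsandbytes is installed and my transformers version still accepts load_in_8bit directly):

from transformers import AutoModelForCausalLM, AutoTokenizer

def load_model_and_tokenizer_8bit(model_name, cache_dir="./llama"):
    # load the base weights quantized to int8 so the 7B model needs far less VRAM
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        cache_dir=cache_dir,
        load_in_8bit=True,    # requires bitsandbytes
        device_map="auto",    # let accelerate place layers across GPU/CPU
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
    return model, tokenizer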

from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

loraConfig = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # target_modules=lora_target_modules,
    bias="none",
    task_type="CAUSAL_LM",
)

project_tag = "<projectX>"
model_name = "meta-llama/Llama-2-7b-chat-hf"
model, tokenizer = load_model_and_tokenizer(model_name, project_tag)
model = prepare_model_for_int8_training(model)
model = get_peft_model(model, loraConfig)
file_path = "encodedProjectData.txt"
unlabeled_texts = read_data_from_file(file_path)
unlabeled_dataset = CustomUnlabeledDataset(unlabeled_texts, tokenizer)

from transformers import TrainingArguments, Trainer

training_arguments = TrainingArguments(
    auto_find_batch_size=True,
    optim="adafactor",
    gradient_checkpointing=True,
    gradient_accumulation_steps=4,
    warmup_steps=2,
    max_steps=60,
    learning_rate=2e-4,
    evaluation_strategy="steps",
    eval_accumulation_steps=1,
    eval_steps=10,
    seed=42,
    # report_to="wandb",
    fp16=True,
    logging_steps=1,
    output_dir="outputs",
)

# Wrap the model with DeepSpeed (deepspeed_config is the dict shown further down)
import deepspeed

model, optimizer, _, _ = deepspeed.initialize(model=model, args=training_arguments, config=deepspeed_config)

trainer = Trainer(
    model=model,
    train_dataset=unlabeled_dataset,
    args=training_arguments,
)
trainer.train()
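From what I have read, the Transformers Trainer can also drive DeepSpeed itself when the config is passed through TrainingArguments, instead of wrapping the model with deepspeed.initialize manually and then handing the wrapped model to Trainer. A minimal sketch of that wiring (using the deepspeed_config dict shown below; untested on my exact setup):

training_arguments = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    max_steps=60,
    deepspeed=deepspeed_config,  # Trainer creates and manages the DeepSpeed engine
    # note: fp16/bf16 flags here have to agree with the "fp16"/"bf16" sections of the config
)

trainer = Trainer(
    model=model,                     # plain PEFT model, not wrapped by deepspeed.initialize
    train_dataset=unlabeled_dataset,
    args=training_arguments,
)
trainer.train()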

This is the deepspeed_config I pass in:

deepspeed_config = {
    "fp16": {
        "enabled": False
    },
    "bf16": {
        "enabled": False
    },
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "cpu",
            "pin_memory": True
        },
        "overlap_comm": True,
        "contiguous_gradients": True,
        "reduce_bucket_size": "auto"
    },
    "steps_per_print": 2000,
    "train_batch_size": 1,
    "train_micro_batch_size_per_gpu": 1,
    "wall_clock_breakdown": False
}
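If I read the ZeRO docs correctly, the config above only offloads parameters; optimizer state stays on the GPU unless an offload_optimizer block is added as well. A sketch of the extended zero_optimization section (untested):

    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        # additionally push optimizer state and the update step to CPU
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
        "reduce_bucket_size": "auto"
    },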

I have tried different TrainingArguments and different setups to run the fine-tuning, but I always end up with the same OOM error.
