ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate`


I'm trying to fine-tune llama2-13b-chat-hf with an open-source dataset.

I've always used this template, but now I'm getting this error:

    ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`

I installed all the packages required and these are the versions:

    accelerate @ git+https://github.com/huggingface/accelerate.git@97d2168e5953fe7373a06c69c02c5a00a84d5344
    bitsandbytes==0.42.0
    datasets==2.17.1
    huggingface-hub==0.20.3
    peft==0.8.2
    tokenizers==0.13.3
    torch==2.1.0+cu118
    torchaudio==2.1.0+cu118
    torchvision==0.16.0+cu118
    transformers==4.30.0
    trl==0.7.11

Does anyone know if this is a version issue? How did you fix it?

I tried installing other versions, but nothing worked.
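Before digging into versions, a quick diagnostic can confirm whether Python can actually find the packages the error complains about, since the `ImportError` can fire even when `pip` lists them as installed. This is a minimal sketch using only the standard library; the package list is just the pinned set from above:

```python
# Report which of the relevant packages Python can find, and their versions.
import importlib.util
from importlib import metadata

def report(pkg):
    """Return the installed version of pkg, 'MISSING' if it cannot be
    imported at all, or 'unknown' if importable but without metadata."""
    if importlib.util.find_spec(pkg) is None:
        return "MISSING"
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return "unknown"

for pkg in ("accelerate", "bitsandbytes", "transformers", "torch"):
    print(f"{pkg}: {report(pkg)}")
```

If any line prints `MISSING`, the package is installed into a different environment than the one running the fine-tuning script, which is a common cause of this exact error.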


2 Answers

Answered by Yaoming Xuan

Have you tried running `accelerate test` in your terminal? If your installation is successful, this command should output a series of messages ending with a "test successful" message. If the command fails, something is wrong with your PyTorch + Accelerate environment, and you should reinstall both following the official tutorials. If the command succeeds and you still can't do multi-GPU fine-tuning, you should report the issue on the bitsandbytes GitHub repo.

Here are some other potential causes.

  • Your CUDA version is too old. Most tools are built against CUDA 12.0+ nowadays. You should update CUDA with this link.

  • Your Python version should be 3.10+; otherwise you won't be able to install the latest tools with pip.

  • Why do you want to train a quantized model? Quantization is meant to shrink a model for deployment, not for training, so this tool is not designed for your purpose. If you fine-tune a model with quantized parameters, the gradient updates won't have any effect, because they are simply too small to represent with only 8 bits. If you want to fine-tune an LLM with limited GPU memory, try LoRA or SFT instead. Both of them can freeze some layers to reduce VRAM usage.
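To make the point about vanishing 8-bit updates concrete, here is a toy round-to-nearest quantizer (deliberately simplified; the real bitsandbytes kernels use block-wise scaling): a gradient step much smaller than one quantization level rounds away entirely.

```python
# Toy illustration: an 8-bit grid cannot represent a tiny gradient update.
def quantize_8bit(x, scale):
    """Snap x to one of 256 levels spaced `scale` apart (symmetric int8)."""
    q = round(x / scale)
    q = max(-128, min(127, q))  # clamp to the int8 range
    return q * scale

scale = 0.01       # spacing between representable values
w = 0.25           # a weight that sits exactly on the grid
grad_step = 1e-4   # learning_rate * gradient, far below one grid step

# The update is swallowed by rounding: the quantized weight never moves.
print(quantize_8bit(w + grad_step, scale) == quantize_8bit(w, scale))  # True
```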

Answered by Utopion
  1. Go to https://pytorch.org/
  2. Select your config
  3. In your environment, run the given command

For example, I chose Stable / Windows / Python / CUDA 11.8, and the website gave me this:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
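After the install finishes, you can sanity-check that you actually got a CUDA wheel rather than a CPU-only one: CUDA builds of torch carry a local version tag such as `+cu118` in `torch.__version__`. A small helper (hypothetical name) to pull that tag out of a version string:

```python
# Extract the CUDA tag from a PEP 440 version string like "2.1.0+cu118".
def cuda_tag(version):
    """Return the local version tag (e.g. 'cu118'), or None if absent."""
    _, sep, local = version.partition("+")
    return local if sep else None

print(cuda_tag("2.1.0+cu118"))  # cu118
print(cuda_tag("2.1.0"))        # None (a CPU-only wheel)
```

If the tag is missing, you likely installed a CPU-only build and should rerun the pip command above with the correct `--index-url`.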

(Thanks to Niraj Pahari; I just wanted to make this an official answer rather than only a comment.)