Bus error on Mixtral-8x7B-Instruct-v0.1-AWQ with vLLM

I am getting a bus error when trying to initialize the "TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ" model from Hugging Face with vLLM:

    from vllm import LLM

    # Initialize the AWQ-quantized Mixtral model across the available GPUs.
    self.model = LLM(
        model="TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ",
        quantization="awq",
        dtype="auto",
        tensor_parallel_size=tensor_parallel_size,
    )

The error I get is:

ERROR 2024-02-16T21:46:33.751635551Z *** SIGBUS received at time=1708119993 on cpu 67 ***
ERROR 2024-02-16T21:46:33.754929304Z PC: @ 0x7e9291287a37 (unknown) ncclShmOpen()
ERROR 2024-02-16T21:46:33.755156517Z @ 0x7e9456f02520 3456 (unknown)
ERROR 2024-02-16T21:46:33.756768941Z @ 0x74352d6c63636e2f (unknown) (unknown)
ERROR 2024-02-16T21:46:33.756790876Z [2024-02-16 21:46:33,756 E 1 6357] logging.cc:361: *** SIGBUS received at time=1708119993 on cpu 67 ***
ERROR 2024-02-16T21:46:33.756800651Z [2024-02-16 21:46:33,756 E 1 6357] logging.cc:361: PC: @ 0x7e9291287a37 (unknown) ncclShmOpen()
ERROR 2024-02-16T21:46:33.758085489Z [2024-02-16 21:46:33,758 E 1 6357] logging.cc:361: @ 0x7e9456f02520 3456 (unknown)
ERROR 2024-02-16T21:46:33.759702920Z [2024-02-16 21:46:33,759 E 1 6357] logging.cc:361: @ 0x74352d6c63636e2f (unknown) (unknown)
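
The backtrace points at ncclShmOpen(), which belongs to NCCL's shared-memory transport, so I suspect the container's /dev/shm may be too small. A quick sanity check I plan to run inside the container (just a sketch, using only the Python standard library) is:

    import shutil

    # /dev/shm is the shared-memory filesystem that NCCL's SHM transport maps buffers into.
    total, used, free = shutil.disk_usage("/dev/shm")
    print(f"/dev/shm total: {total / 1024**3:.2f} GiB, free: {free / 1024**3:.2f} GiB")

If this reports only the Docker default of 64 MB (i.e. --shm-size was never raised), that could explain the SIGBUS, but I have not confirmed it yet.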

I have tried with 2, 4, and 8 NVIDIA L4 GPUs.

Dockerfile

FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04 as builder
...
# install deps/poetry/etc..

# Install project dependencies and other dependencies I don't need locally:

RUN poetry add vllm \
    accelerate \
    deepspeed \
    auto-gptq \
    optimum \
    peft \
    transformers \
    flax==0.8.0 \
    torch==2.1.2 \
    tensorflow \
    bitsandbytes \
    autoawq

Also, this engine-initialization log line might be important for understanding the setup:

Initializing an LLM engine with config: model='TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ', tokenizer='TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=8, disable_custom_all_reduce=False, quantization=awq, enforce_eager=False, kv_cache_dtype=auto, seed=0)
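
One isolation test I am considering (just a sketch; I am not certain these variables reach the Ray workers that vLLM spawns for tensor parallelism, so they may need to be set in the container environment instead) is to enable NCCL debug output and disable NCCL's shared-memory transport before constructing the engine:

    import os

    # Both are standard NCCL environment variables; they must be set before
    # NCCL is initialized by the tensor-parallel workers.
    os.environ["NCCL_DEBUG"] = "INFO"      # verbose NCCL logs
    os.environ["NCCL_SHM_DISABLE"] = "1"   # skip the SHM transport that crashes in ncclShmOpen()

    from vllm import LLM

    llm = LLM(
        model="TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ",
        quantization="awq",
        dtype="auto",
        tensor_parallel_size=8,
    )

If anyone knows whether this is the right way to rule out a /dev/shm problem here, that would also help.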

Thanks!
