Why does my device_map="auto" in transformers.pipeline use the CPU only even though GPUs are available?

I came across this problem when trying out LLaMA 2 (13B version) on an 8×32GB-GPU server. The pipeline setup is as follows:

import torch
import transformers

pipeline = transformers.pipeline(
    "text-generation",
    model=model,            # model and tokenizer are set up beforehand (not shown)
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    prompt,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=512,
)
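
For context, model in the snippet above refers to the LLaMA 2 checkpoint and tokenizer to the matching AutoTokenizer, roughly along these lines (the checkpoint id is shown purely for illustration):

from transformers import AutoTokenizer

# Checkpoint id shown for illustration only.
model = "meta-llama/Llama-2-13b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model)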

And when I run the script, the nvidia-smi output looks like the below and never changes, which indicates that the GPUs are being ignored:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-32GB           Off | 00000000:06:00.0 Off |                    0 |
| N/A   35C    P0              45W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-32GB           Off | 00000000:07:00.0 Off |                    0 |
| N/A   35C    P0              43W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-32GB           Off | 00000000:0A:00.0 Off |                    0 |
| N/A   36C    P0              45W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-32GB           Off | 00000000:0B:00.0 Off |                    0 |
| N/A   34C    P0              41W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2-32GB           Off | 00000000:85:00.0 Off |                    0 |
| N/A   34C    P0              46W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2-32GB           Off | 00000000:86:00.0 Off |                    0 |
| N/A   35C    P0              44W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2-32GB           Off | 00000000:89:00.0 Off |                    0 |
| N/A   37C    P0              43W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2-32GB           Off | 00000000:8A:00.0 Off |                    0 |
| N/A   34C    P0              44W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

How can I set the pipeline to work with multiple GPUs instead of the CPU?

Many thanks.

I tried specifying an exact CUDA device with the argument device="cuda:0" in transformers.pipeline, and this did force the pipeline to use cuda:0 instead of the CPU.
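
For completeness, PyTorch itself can see the GPUs; the standard checks confirm this:

import torch

# Standard torch.cuda calls; on this server I expect True and 8 respectively
# (unless CUDA_VISIBLE_DEVICES restricts visibility).
print(torch.cuda.is_available())
print(torch.cuda.device_count())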

But LLaMA-2-13b needs more than 32GB of memory to run on a single GPU, which is exactly the capacity of one of my Tesla V100s. So I guess I need a way to spread the workload across multiple GPUs in order to get LLaMA-2-13b running.
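
For a rough sense of scale, the fp16 weights alone already take up most of a 32GB card, before counting activations and the KV cache:

# Back-of-the-envelope estimate of the fp16 weight footprint for a 13B model.
n_params = 13e9           # 13 billion parameters
bytes_per_param = 2       # float16
print(f"{n_params * bytes_per_param / 1024**3:.1f} GiB")  # roughly 24 GiB of weights alone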

All the solutions I found by googling around say that device_map="auto" automatically spreads the model across the available GPUs, which is not what happens in my environment, as described above.
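
For reference, the pattern those solutions describe is to shard the model itself across GPUs at load time with device_map="auto" (which needs the accelerate package installed) and then hand the already-dispatched model to the pipeline. A minimal sketch of that pattern, with the checkpoint id again shown only for illustration:

import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-chat-hf"  # illustration only

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" at load time asks accelerate to split the layers across
# whatever GPUs are visible; the float16 weights then spread over the 8 cards.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

Whether this behaves differently from passing device_map="auto" to the pipeline directly is exactly what I am unsure about.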
