OpenAI Whisper segfaulting when running inside Docker container on M1 Mac


I am trying to run Whisper in a Docker container on my M1 MacBook Air. When I run it inside the container, it segfaults. Any ideas on how to debug this?

The Dockerfile is pretty simple. Relevant excerpt:

FROM ubuntu:22.04

# Update base image
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get autoremove -y 

# Set up Python and Whisper
RUN apt-get install -y \
    jq \
    git \
    curl \
    gnupg \
    ffmpeg \
    findutils \
    python3 \
    python3-pip

RUN pip3 install git+https://github.com/openai/whisper.git

Whisper is installed as recommended in the repo readme:

pip install git+https://github.com/openai/whisper.git 
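
For context, the arm64 image is built and run in the obvious way; the image tag and bind mount below are just illustrative:

    docker build -t whisper -f ./Dockerfile .
    docker run --rm -it -v "$PWD:/data" -w /data whisper bash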

I am testing transcription in each environment with a short WAV file that says "Hello world".
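
(For reproducibility, an equivalent test clip can be generated on macOS with something like the following; the exact voice and sample rate shouldn't matter:)

    say -o hello_world.aiff "Hello world"
    ffmpeg -i hello_world.aiff -ar 16000 hello_world.wav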

  1. When I run Whisper on my Mac directly, outside of Docker, it runs fine:
>> time whisper --task transcribe --output_format json --model tiny hello_world.wav
/opt/homebrew/Cellar/openai-whisper/20231106/libexec/lib/python3.11/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:00.840]  Hello world.
whisper --task transcribe --output_format json --model tiny hello_world.wav  7.49s user 0.80s system 297% cpu 2.780 total
  2. When I run inside the Docker container, it segfaults (debugging ideas sketched after the list):
# time whisper --task transcribe --output_format json --model tiny hello_world.wav
/usr/local/lib/python3.10/dist-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Segmentation fault

real    0m2.233s
user    0m2.507s
sys 0m0.746s
  3. If I cross-build the Docker image for the linux/amd64 architecture and run it under Rosetta, it works, but it is ridiculously slow (7.5 s natively vs. 5 m 41 s emulated):

Build command:

    docker buildx build \
        --platform=linux/amd64 \
        -t whisper \
        -f ./Dockerfile .
# time whisper --task transcribe --output_format json --model tiny hello_world.wav
/usr/local/lib/python3.10/dist-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:00.840]  Hello world.

real    5m40.946s
user    5m40.920s
sys 0m1.897s
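
So: native macOS is fast, the native arm64 container segfaults, and the emulated amd64 container works but is unusably slow.

To get more than a bare "Segmentation fault" out of case 2, the next things I can think of trying are CPython's built-in fault handler and a native backtrace from gdb (gdb is not in the image above, so it would have to be installed first). Roughly:

    # Python-level traceback at the point of the crash (stock CPython faulthandler)
    PYTHONFAULTHANDLER=1 whisper --task transcribe --output_format json --model tiny hello_world.wav

    # Native backtrace, assuming gdb can be installed in the container
    # (the `whisper` entry point installed by pip is itself a Python script)
    apt-get install -y gdb
    gdb -ex run -ex bt --args python3 "$(which whisper)" \
        --task transcribe --output_format json --model tiny hello_world.wav

And to confirm which architecture each container is actually running as (standard tools, nothing Whisper-specific):

    # inside each container
    uname -m    # expect aarch64 for the native build, x86_64 under Rosetta
    python3 -c 'import platform, torch; print(platform.machine(), torch.__version__)'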