I am trying to run Whisper in a Docker container on my M1 MacBook Air, but it segfaults when I run it. Any ideas on how to debug this?
The Dockerfile is pretty simple. Relevant excerpt:
FROM ubuntu:22.04
# Update base image
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get autoremove -y 
# Set up Python and Whisper
RUN apt-get install -y \
    jq \
    git \
    curl \
    gnupg \
    ffmpeg \
    findutils \
    python3 \
    python3-pip 
RUN pip3 install git+https://github.com/openai/whisper.git
Whisper is installed as recommended in the repo readme:
pip install git+https://github.com/openai/whisper.git 
I am testing transcription in each environment with the same WAV file, which says "Hello world".
- When I run Whisper on my Mac directly, outside of Docker, it runs fine:
 
>> time whisper --task transcribe --output_format json --model tiny hello_world.wav
/opt/homebrew/Cellar/openai-whisper/20231106/libexec/lib/python3.11/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:00.840]  Hello world.
whisper --task transcribe --output_format json --model tiny hello_world.wav  7.49s user 0.80s system 297% cpu 2.780 total
- When I run it in the Docker container, it segfaults:
 
# time whisper --task transcribe --output_format json --model tiny hello_world.wav
/usr/local/lib/python3.10/dist-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Segmentation fault
real    0m2.233s
user    0m2.507s
sys     0m0.746s
- If I cross-build the Docker image for linux/amd64 and run it with Rosetta, it works but runs ridiculously slowly (user CPU time goes from 7.5s on the Mac to 5m 41s):
 
Build command:
    docker buildx build \
        --platform=linux/amd64 \
        -t whisper \
        -f ./Dockerfile .
Running it in the resulting container:
# time whisper --task transcribe --output_format json --model tiny hello_world.wav
/usr/local/lib/python3.10/dist-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:00.840]  Hello world.
real    5m40.946s
user    5m40.920s
sys     0m1.897s
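
One thing I could try to narrow this down is rerunning the failing command under Python's built-in `faulthandler`, which dumps the Python stack when the process receives SIGSEGV, and recording which CPU architecture the interpreter actually sees in each container. The following is only a rough sketch using the standard library; `environment_summary`, `run_with_faulthandler`, and `describe_exit` are names I made up for illustration, not anything from Whisper.

```python
import platform
import signal
import subprocess
import sys


def environment_summary():
    """Return the CPU architecture and Python version seen in this environment."""
    return {
        "machine": platform.machine(),  # e.g. "aarch64" or "x86_64" inside a Linux container
        "python": platform.python_version(),
    }


def run_with_faulthandler(py_args):
    """Run `python -X faulthandler <py_args>` in a child process and capture its output."""
    cmd = [sys.executable, "-X", "faulthandler", *py_args]
    return subprocess.run(cmd, capture_output=True, text=True)


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"
```

Inside the container, I would expect to call it roughly as `result = run_with_faulthandler(["-m", "whisper", "--model", "tiny", "hello_world.wav"])` and then print `describe_exit(result.returncode)` together with `result.stderr`. This assumes the installed package can be started with `python3 -m whisper`; if the crash is a segfault, the captured stderr should contain the Python frames that were active at that moment.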