I am trying to run Whisper in a Docker container on my M1 MacBook Air, but it crashes with a segmentation fault. Any ideas on how to debug this?
The Dockerfile is pretty simple. Relevant excerpt:
FROM ubuntu:22.04
# Update base image
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get autoremove -y
# Set up Python and Whisper
RUN apt-get install -y \
    jq \
    git \
    curl \
    gnupg \
    ffmpeg \
    findutils \
    python3 \
    python3-pip
RUN pip3 install git+https://github.com/openai/whisper.git
Whisper is installed as recommended in the repo readme:
pip install git+https://github.com/openai/whisper.git
I am testing transcription in each environment with the same WAV file, which says "Hello world".
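Since the environments may differ in more than just Docker (the logs in this post show Python 3.11 with a Homebrew build of Whisper on the host and Python 3.10 inside the container), it may help to record what each one actually has installed. The following is only a minimal sketch: the distribution names `openai-whisper`, `torch`, and `numpy` are my assumptions about what is relevant, and `environment_report` is a hypothetical helper name.

```python
import platform
from importlib import metadata


def environment_report(packages=("openai-whisper", "torch", "numpy")):
    """Collect the CPU architecture, Python version, and a few package versions."""
    report = {
        "machine": platform.machine(),        # e.g. arm64/aarch64 or x86_64
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed in this environment
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running the same script on the host and in each container gives a side-by-side view of architecture and versions, which should make it easier to tell whether the crash follows the architecture or a particular package build.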
- When I run Whisper on my Mac directly, outside of Docker, it runs fine:
>> time whisper --task transcribe --output_format json --model tiny hello_world.wav
/opt/homebrew/Cellar/openai-whisper/20231106/libexec/lib/python3.11/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:00.840] Hello world.
whisper --task transcribe --output_format json --model tiny hello_world.wav 7.49s user 0.80s system 297% cpu 2.780 total
- When I run it in the Docker container, it segfaults:
# time whisper --task transcribe --output_format json --model tiny hello_world.wav
/usr/local/lib/python3.10/dist-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Segmentation fault
real 0m2.233s
user 0m2.507s
sys 0m0.746s
- If I cross-build the Docker image for the linux/amd64 architecture and run it with Rosetta, it works but is far slower (user time goes from 7.5s on the host to 5m 41s):
Build command:
docker buildx build \
    --platform=linux/amd64 \
    -t whisper \
    -f ./Dockerfile .
# time whisper --task transcribe --output_format json --model tiny hello_world.wav
/usr/local/lib/python3.10/dist-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:00.840] Hello world.
real 5m40.946s
user 5m40.920s
sys 0m1.897s
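One debugging step I can think of for the segfaulting case is Python's built-in `faulthandler`, which prints the Python-level traceback when the interpreter receives a fatal signal. Below is a rough sketch of a wrapper for that, not a confirmed fix. `run_with_faulthandler` and `WHISPER_ARGS` are hypothetical names, and invoking Whisper through `-m whisper` is an assumption on my part.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(python_args):
    """Run a child Python with faulthandler on and report whether it segfaulted.

    Returns a tuple (segfaulted, stderr_text).
    """
    cmd = [sys.executable, "-X", "faulthandler", *python_args]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # On POSIX, a negative return code means the child was killed by that signal.
    segfaulted = proc.returncode == -signal.SIGSEGV
    return segfaulted, proc.stderr


# Assumed arguments mirroring the failing command from this post; running the
# package via `-m whisper` is an assumption, not something shown in the logs.
WHISPER_ARGS = [
    "-m", "whisper",
    "--task", "transcribe",
    "--output_format", "json",
    "--model", "tiny",
    "hello_world.wav",
]
```

Calling `run_with_faulthandler(WHISPER_ARGS)` inside the container and printing the returned stderr text should show which Python frame was active when the crash happened, which at least narrows down whether it occurs while loading the model or while decoding the audio.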