How to improve Whisper speech to text

30 Views Asked by At

Although Whisper’s transcription is highly accurate, there is always jargon (GPT) or non-standard spellings that make the transcript flawed (example: “Dave Prior” is a podcast host and transcription will spell his last name as “Pryor.”) What are some ways to improve transcription?

1

There are 1 best solutions below

0
Lance Kind On

There are three usual ways to improve Whisper transcription service:

  1. Prompt Whisper (up to 244 tokens) with a word list. [1]
  2. Post process the transcripts with a GPT that is promoted to revise the transcript and supplied with a word list (up to the GPT’s token limit)[2]
  3. Fine tune the model to better understand your accent and domain by training it on an audio file recorded with a word list. [3]

I suggest the above order is in increasing difficulty. If Whisper is having trouble with your accent or how you say acronyms, then fine tuning will be the best solution. The first two options are nice as one could build he prompts dynamically.