Speech to Text audio formats

Question

1.4k Views Asked by Shreshtha Garg At 13 April 2017 at 07:12

Can we use an MP3 audio file with the Watson Speech to Text API?

Which popular audio formats does the Watson Speech to Text API not support?

There are 3 best solutions below

Shane K On 26 April 2017 at 18:06

No, MP3 is not supported. See the Watson Speech to Text audio formats documentation.

Jessica Miller On 08 October 2017 at 06:11

Don't struggle over choosing a particular audio format for speech-to-text conversion; most manual speech-to-text or transcription services accept all common formats. For an automatic speech-to-text service, I always prefer WAV over MP3, since it holds high-bit-rate audio data without losing quality and is accepted by most speech engines. Here is a list of formats supported by one transcription company: https://www.transcriptionwave.com/format.html

**Sayuri Mizuguchi** · Accepted Answer · 2017-04-26T19:27:44.393000

I suggest you use the WAV format, since it is a popular one, but the best choice depends on your use case.

If you really need to use MP3, you can simply convert the MP3 file to WAV.
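
One way to do that conversion is with the FFmpeg command-line tool. The following is only a minimal sketch, assuming `ffmpeg` is installed and on your PATH; the helper names, the file names, and the 16 kHz mono defaults are illustrative choices of mine, not requirements stated by the service.

```python
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000, channels=1):
    """Build the ffmpeg argument list that converts src to dst."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file (for example, an MP3)
        "-ar", str(sample_rate),  # output sampling rate in Hz (assumed default)
        "-ac", str(channels),     # number of output channels (assumed default)
        dst,                      # ffmpeg infers WAV from the .wav extension
    ]


def convert_mp3_to_wav(src, dst, **options):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst, **options), check=True)
```

Calling `convert_mp3_to_wav("speech.mp3", "speech.wav")` would then produce a WAV file that you can send to the service.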

The formats that Speech to Text supports are:

audio/flac: Free Lossless Audio Codec (FLAC), a lossless compressed audio coding format. For more information, see en.wikipedia.org/wiki/FLAC.
audio/l16: Linear 16-bit Pulse-Code Modulation (PCM), an uncompressed audio data format. Use this media type to pass a raw PCM file. Note that linear PCM audio can also reside inside a Waveform Audio File Format (WAV) container file. For more information, see the Internet Engineering Task Force (IETF) Request for Comment (RFC) 2586 and en.wikipedia.org/wiki/Pulse-code_modulation.
audio/wav: Waveform Audio File Format (WAV), a standard created by Microsoft® and IBM. A WAV file is a container that is often used for uncompressed audio bitstreams but can contain compressed audio, as well. For more information, see en.wikipedia.org/wiki/WAV. The service supports WAV files that use any encoding. It accepts audio with a maximum of nine channels (due to an FFmpeg limitation).
audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis: Ogg is a free, open container format maintained by the Xiph.org Foundation; for more information, see www.xiph.org/ogg/. Both codecs are free, open, lossy audio-compression formats. Opus is the preferred codec. If you omit the codec, the service automatically detects it from the input audio.
audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis: Web Media (WebM) is an open media-file format; for more information, see webmproject.org. WebM supports audio streams compressed with the Opus and Vorbis audio codecs; Opus is the preferred codec. If you omit the codec, the service automatically detects it from the input audio. The documentation also points to JavaScript code that shows how to capture audio from a microphone in a Chrome browser and encode it into a WebM data stream.
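
The list above can be turned into a small lookup for picking the Content-Type value from a file name. This is only a sketch: the extension mapping and the function name are my own assumptions for illustration, and raw audio/l16 data is left out because it has no standard file extension.

```python
import os

# File extension -> media type, taken from the list of supported formats above.
# The extensions themselves are an assumption; the service looks at the
# Content-Type you send, not at the file name.
SUPPORTED_TYPES = {
    ".flac": "audio/flac",
    ".wav": "audio/wav",
    ".ogg": "audio/ogg",    # codec is detected automatically when omitted
    ".webm": "audio/webm",  # codec is detected automatically when omitted
}


def content_type_for(path):
    """Return the media type for path, or None if the format is not in the list."""
    extension = os.path.splitext(path)[1].lower()
    return SUPPORTED_TYPES.get(extension)
```

With this, `content_type_for("speech.mp3")` returns `None`, which tells the caller the file has to be converted before it is sent.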

You can find all the formats, with more details, in the official Speech to Text documentation. I suggest you edit your question to add more details and read the documentation; IBM's documentation is generally very clear and complete.
