What is the best sample rate for the Google Speech-to-Text API? Can any Google employee or expert comment?


So far I have tested only a very small audio file, at 16 kHz and at 48 kHz. I would love to run much bigger tests, but as you know, that costs money.

The 48 kHz sample rate gave better results. However, the documentation says 16 kHz is best.

So I am a bit confused.

Here are the 16 kHz and 48 kHz FLAC files I used to test the Google Speech-to-Text API:

16 kHz: https://drive.google.com/file/d/1MbiW3t86W68ZqENtDqD4XdNmEV7QZbZA/view?usp=sharing

48 kHz: https://drive.google.com/file/d/1aLN1ptMJBwuYc6FdAk6CxcK1Ex4jI3vh/view?usp=sharing

And here are the resulting transcripts:

16 kHz

Hello, dear students.

 Welcome to the lecture 1 of introduction to programming course.

 In this course, you will learn how to program you will learn the fundamentals of programming. You will learn how to be a software engineer. This course is the primary the most important cause of your Carriage. Why is that because in this course you will you will learn how to do

 Programming haftar called how to compose a software. So this is your most important lesson among all of the courses you are going to take because this lesson will teach you how to program.

 okay, so if you want to be a good programmer a good software engineer you have to

 Perfect.

 This course you have to give your most attention to this.

48 kHz

Hello, dear students.

 Welcome to the lecture 1 of introduction to programming course.

 In this course, you will learn how to program you will learn the fundamentals of programming. You will learn how to be a software engineer. This course is the primary the most important course of your Carriage. Why is that because in this course you will you will learn how to do

 Programming how to code how to compose a software. So this is your most important lesson.

 Among all of the courses you are going to take because these lesson will teach you how to program.

 okay, so if you want to be a good programmer a good software engineer you have to

 Perfect.

 This course you have to give your most attention to this.

The original sample rate of the video is 48 kHz.

Can any expert or Google employee comment on this?

These are the ffmpeg audio-filter options I used to produce the 16 kHz and 48 kHz FLAC files:

-af aformat=s16:16000:mono
-af aformat=s16:48000:mono
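
For anyone reproducing the test, here is a small Python sketch that assembles complete ffmpeg command lines around those filters. It spells out the named `aformat` options (`sample_fmts`, `sample_rates`, `channel_layouts`), which should be equivalent to the positional `s16:16000:mono` form. The file names `lecture.mp4`, `lecture_16k.flac` and `lecture_48k.flac` are hypothetical placeholders, not the files linked above.

```python
import shlex


def flac_command(src: str, dst: str, rate_hz: int) -> list[str]:
    """Build an ffmpeg argv that writes 16-bit mono audio at rate_hz to dst."""
    # Named form of aformat=s16:<rate>:mono (sample_fmts:sample_rates:channel_layouts).
    audio_filter = f"aformat=sample_fmts=s16:sample_rates={rate_hz}:channel_layouts=mono"
    # The .flac extension makes ffmpeg pick the FLAC muxer/encoder by default.
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]


if __name__ == "__main__":
    # Hypothetical file names; replace with your own.
    for rate in (16000, 48000):
        cmd = flac_command("lecture.mp4", f"lecture_{rate // 1000}k.flac", rate)
        print(shlex.join(cmd))
        # To actually convert, pass cmd to subprocess.run(cmd, check=True).
```

Returning an argument list rather than a single string avoids shell-quoting problems if the file names contain spaces.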

Answer 1

16 kHz is simply the recommended minimum sample rate for transcription with Speech-to-Text. From the documentation [1]:

We recommend a sample rate of at least 16 kHz in the audio files that you use for transcription with Speech-to-Text. Sample rates found in audio files are typically 16 kHz, 32 kHz, 44.1 kHz, and 48 kHz. Because intelligibility is greatly affected by the frequency range, especially in the higher frequencies, a sample rate of less than 16 kHz results in an audio file that has little or no information above 8 kHz. This can prevent Speech-to-Text from correctly transcribing spoken audio. Speech intelligibility requires information throughout the 2 kHz to 4 kHz range, although the harmonics (multiples) of those frequencies in the higher range are also important for preserving speech intelligibility. Therefore, keeping the sample rate to a minimum of 16 kHz is a good practice.
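
The "little or no information above 8 kHz" point is the Nyquist limit: audio sampled at a given rate can only represent frequencies below half that rate. A minimal Python sketch tabulating this for the sample rates the documentation lists (8 kHz telephone-style audio is added for contrast; it is not mentioned in the quote):

```python
def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a signal sampled at sample_rate_hz can represent."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    for rate in (8000, 16000, 32000, 44100, 48000):
        print(f"{rate / 1000:g} kHz sampling -> content up to {nyquist_hz(rate) / 1000:g} kHz")
```

At 16 kHz the ceiling is 8 kHz; at 48 kHz it is 24 kHz.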

Answer 2

16 kHz is the MINIMUM. Downsampling loses data, so if your original is 48 kHz, it's best to keep it.
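
To illustrate why the lost data cannot be recovered, here is a pure-Python sketch (an illustrative toy, not how ffmpeg resamples): a 10 kHz tone sampled at 48 kHz is naively decimated 3:1 to 16 kHz by keeping every third sample, and a plain DFT finds the strongest frequency before and after. Because 10 kHz is above the new 8 kHz limit, it folds down to a spurious 6 kHz tone. A proper resampler low-pass filters first, so the 10 kHz content would be removed instead of folded; either way it is gone from the 16 kHz file.

```python
import math


def dominant_freq(samples: list[float], rate_hz: int) -> float:
    """Return the frequency (Hz) of the largest DFT bin, excluding DC."""
    n = len(samples)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    # Each DFT bin is rate_hz / n hertz wide.
    return best_k * rate_hz / n


tone_hz = 10_000
x48 = [math.sin(2 * math.pi * tone_hz * i / 48_000) for i in range(480)]  # 10 ms at 48 kHz
x16 = x48[::3]  # naive 3:1 decimation to 16 kHz, no anti-aliasing filter

if __name__ == "__main__":
    print(dominant_freq(x48, 48_000))  # 10000.0
    print(dominant_freq(x16, 16_000))  # 6000.0
```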