I'm using the Google Cloud Speech API Python library to extract text from a video file. In a prior step, the video file is converted to a FLAC audio file.
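import subprocess
from google.cloud import speech

sample_rate = 48000
client = speech.Client()
# Extract a mono FLAC track; the command is passed as a list so it runs without a shell
cmd = ["ffmpeg", "-i", mpg_file, "-vn", "-ac", "1", "-ar", str(sample_rate), flac_file]
subprocess.run(cmd, check=True)
with open(flac_file, 'rb') as f:
    audio = client.sample(f.read(), sample_rate=sample_rate, encoding='FLAC')
# Blocking call; this is the step being timed
audio.sync_recognize()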
To reduce the time taken by sync_recognize(), I set sample_rate = 16000. My reasoning was that both the communication with the web API and the processing of the audio file should be faster, because the file is smaller, there is less data to process, and the information density is lower.
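To make the size argument concrete, here is a rough sketch of the uncompressed data volume at both sample rates. It assumes 16-bit mono PCM (the bit depth is an assumption, it is not stated above), and `raw_pcm_bytes` is a hypothetical helper, not part of any library. FLAC is compressed, so the real file sizes will not scale by exactly this factor.

```python
def raw_pcm_bytes(duration_s, sample_rate, bits_per_sample=16, channels=1):
    """Size in bytes of uncompressed PCM audio (assumed 16-bit mono by default)."""
    return int(duration_s * sample_rate * channels * bits_per_sample // 8)


# One minute of audio at the two sample rates compared in this question
size_48k = raw_pcm_bytes(60, 48000)  # 5760000 bytes
size_16k = raw_pcm_bytes(60, 16000)  # 1920000 bytes
ratio = size_48k / size_16k          # 3.0
```

So before compression, the 16 kHz version carries a third of the samples of the 48 kHz version, which is why I expected it to be faster.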
Repeated runtime measurements of this process over the same list of files, at sample rates of 16 kHz and 48 kHz, yield:
16 kHz: 26.16 s per call
48 kHz: 17.68 s per call
I expected the opposite result. Is my thinking wrong? Do you have an explanation for this?
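For reference, a per-call timing like the one above can be taken with a small harness along these lines. This is only a sketch, not necessarily how the numbers above were produced; `time_per_call` is a hypothetical helper, and `func` stands for whatever wraps the recognition call for one file.

```python
import time


def time_per_call(func, items, repeats=3):
    """Return the mean wall-clock seconds per call of func over all items."""
    if not items or repeats < 1:
        raise ValueError("need at least one item and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for item in items:
            func(item)  # e.g. a function that uploads and recognizes one file
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(items))
```

Running it several times over the same file list for each sample rate helps separate a real difference from network jitter, since each call includes the round trip to the web API.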