Why does duration of WAV file affect the values returned by Scipy find_peaks?

54 Views Asked by At

I'm writing some very simple code to do frequency analysis on a WAV file (as recorded by a microphone into Audacity): Do an FFT on the file, and find the resulting peak frequency values. As far as I can tell, I've followed the library documentation correctly.

The WAV file is a frequency sweep from 1 to 5000Hz, back to 1Hz.

When applying Scipy find_peaks to the FFT data, the values it returns are multiplied by the length of the file (in seconds). See the following plots:

Plots of: 1: Time-based data. 2. FFT single-sided. 3. FFT with peaks (note different frequency range). 4. FFT with corrected peaks (peaks/secs).

The 4th plot shows that if I divide the peaks data by the file duration, the values are correct. I don't like this solution, as I can find no documentation to back it up. Does anyone know what I'm doing wrong here?

My code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import scipy.io.wavfile as wavfile
import scipy
import scipy.fftpack
import scipy.fft
import numpy as np
from scipy.signal import find_peaks as fp
from matplotlib import pyplot as plt

#read the wav file and print the sample rate
fs_rate, signal = wavfile.read("1Hz-5k_sweep_phone_single.wav")
print ("Sampling rate:", fs_rate)

#calculate number of channels of signal
l_audio = len(signal.shape)
print ("Channels:", l_audio)

#trim one channel, if channels = 2
if l_audio == 2:
    signal = signal.sum(axis=1) / 2
    
#print number of samples
N = signal.shape[0]
print ("Number of samples N:", N)

#calculate duration of file
secs = N / float(fs_rate)
print ("File duration (secs):", secs)

#calculate sample interval
Ts = 1.0/fs_rate # sampling interval in time
print ("Timestep between samples Ts:", Ts)


t = np.arange(0, secs, Ts) # time vector as scipy arange field / numpy.ndarray

#Do FFT
FFT = abs(scipy.fft.fft(signal))
#print("FFT size: ",FFT.shape[0])

#split the FFT to one-sided range
FFT_side = FFT[range(N//2)] # one side FFT range
print("FFT upper side: ", FFT_side)

#calculate the frequency bins.  Put into array
freqs = scipy.fft.fftfreq(signal.size, t[1]-t[0])
print("t:", t[1]-t[0])
fft_freqs = np.array(freqs)
#print("Frequency bins: ",fft_freqs)

#Split the frequencies to one-sided range
freqs_side = freqs[range(N//2)] # one side frequency range
fft_freqs_side = np.array(freqs_side)
print("size of upper-side fft: ",fft_freqs_side.shape[0])


#do peak detection on data.
h = (2.0e+04,2.0e+08)  #detected height range
p = None  #prominence
d = 100 #minimum distance between peaks (ignore closer peaks)
peaks, _ = fp(x=FFT_side, height=h,prominence=p,distance=d)
print("Peak detected frequencies (Hz):",peaks)
print("number of peaks: ", peaks.shape[0])



#data plotting
fig, axes = plt.subplots(nrows = 2, ncols = 1, figsize= (12,9))

ax1 = plt.subplot(411)
ax1.margins(0.05)           # Default margin is 0.05, value 0 means fit
ax1.plot(t, signal, "g",linewidth=0.1)
ax1.set_xlabel('Time')
ax1.set_ylabel('Amplitude')

ax2 = plt.subplot(412)
ax2.plot(freqs_side, abs(FFT_side), "b")
ax2.set_xlabel('Frequency (Hz)')
ax2.set_ylabel('FFT single-sided')
ax2.set_xlim(0,5500)
ax2.set_ylim(0,200000)

ax3 = plt.subplot(413)
ax3.plot(freqs_side, abs(FFT_side), "b")
ax3.plot(peaks, FFT_side[peaks],"x",color='tab:orange')
ax3.set_xlabel('Frequency (Hz)')
ax3.set_ylabel('FFT single-sided with peaks')
ax3.set_xlim(0,55000)
ax3.set_ylim(0,200000)

ax4 = plt.subplot(414)
ax4.plot(freqs_side, abs(FFT_side), "b")
ax4.plot(peaks/secs, FFT_side[peaks],"x",color='tab:orange')
ax4.set_xlabel('Frequency (Hz)')
ax4.set_ylabel('FFT single-sided: corrected peaks')
ax4.set_xlim(0,5500)
ax4.set_ylim(0,200000)

fig.tight_layout() # Or equivalently,  "plt.tight_layout()"

plt.show()

Multiple entries on stackoverflow suggested that find_peaks was a preferable way to approach finding FFT peaks. I've looked at the documentation for find_peaks, and verified that my code follows the examples there and also in various other examples that claim to work.

I also searched stackoverflow and google for related issues, but didn't find anything.

As documented above, I've 'solved' it by dividing the peak values by the duration of the WAV file - but I'm convinced that I'm doing something wrong elsewhere in my code.

2

There are 2 best solutions below

2
On BEST ANSWER

That is because peaks are not the frequencies, peaks are the indices. You have to use:

ax3 = plt.subplot(413)
ax3.plot(freqs_side, abs(FFT_side), "b")
ax3.plot(freqs_side[peaks], FFT_side[peaks],"x",color='tab:orange')
ax3.set_xlabel('Frequency (Hz)')
ax3.set_ylabel('FFT single-sided with peaks')
ax3.set_xlim(0,55000)
ax3.set_ylim(0,200000)

When you are dividing them by seconds, you are actually converting them back to Hz :D

0
On

Here to answer my own question:

find_peaks when applied to FFT data finds the peak values at the corresponding FFT bins, rather than the frequencies. It's relevant to normalise this to the frequency range.

Based on my specified bin sizes, the division by the file duration is correct.

I could have checked this using:

print(fft_freqs_side[51511])

which would have returned a value of 5004Hz.