This link provides code for an autocorrelation-based pitch detection algorithm. I am using it to detect pitches in simple guitar melodies.
In general, it produces very good results. For example, for the melody C4, C#4, D4, D#4, E4 it outputs:
262.743653536
272.144441273
290.826273006
310.431336809
327.094621169
Which correlates to the correct notes.
However, in some cases like this audio file (E4, F4, F#4, G4, G#4, A4, A#4, B4) it produces errors:
325.861452246
13381.6439242
367.518651703
391.479384923
414.604661221
218.345286173
466.503751322
244.994090035
More specifically, there are three errors here: 13381Hz is wrongly detected instead of F4 (~350Hz) (weird error), and also 218Hz instead of A4 (440Hz) and 244Hz instead of B4 (~493Hz), which are octave errors.
I assume the two errors are caused by something different? Here is the code:
slices = segment_signal(y, sr)
for segment in slices:
pitch = freq_from_autocorr(segment, sr)
print pitch
def segment_signal(y, sr, onset_frames=None, offset=0.1):
if (onset_frames == None):
onset_frames = remove_dense_onsets(librosa.onset.onset_detect(y=y, sr=sr))
offset_samples = int(librosa.time_to_samples(offset, sr))
print onset_frames
slices = np.array([y[i : i + offset_samples] for i
in librosa.frames_to_samples(onset_frames)])
return slices
You can see the freq_from_autocorr
function in the first link above.
The only think that I have changed is this line:
corr = corr[len(corr)/2:]
Which I have replaced with:
corr = corr[int(len(corr)/2):]
UPDATE:
I noticed the smallest the offset
I use (the smallest the signal segment I use to detect each pitch), the more high-frequency (10000+ Hz) errors I get.
Specifically, I noticed that the part that goes differently in those cases (10000+ Hz) is the calculation of the i_peak
value. When in cases with no error it is in the range of 50-150, in the case of the error it is 3-5.
The autocorrelation function in the code snippet that you linked is not particularly robust. In order to get the correct result, it needs to locate the first peak on the left hand side of the autocorrelation curve. The method that the other developer used (calling the
numpy.argmax()
function) does not always find the correct value.I've implemented a slightly more robust version, using the peakutils package. I don't promise that it's perfectly robust either, but in any case it achieves a better result than the version of the
freq_from_autocorr()
function that you were previously using.My example solution is listed below:
The printed output looks like:
The first figure produced by my code sample depicts the amplitude curves for the next 0.1 seconds following each detected onset time:
The second figure produced by the code shows the autocorrelation curves, as computed inside of the
freq_from_autocorr()
function. The vertical red lines depict the location of the first peak on the left for each curve, as estimated by the peakutils package. The method used by the other developer was getting incorrect results for some of these red lines; that's why his version of that function was occasionally returning the wrong frequencies.My suggestion would be to test the revised version of the
freq_from_autocorr()
function on other recordings, see if you can find more challenging examples where even the improved version still gives incorrect results, and then get creative and try to develop an even more robust peak finding algorithm that never, ever mis-fires.