For my sound processing project (specifically pitch detection), I need to implement a cross-correlation function, and I'm having trouble with the results. I have 400 frames of 512 samples each, and consecutive frames overlap by 50 percent. I have the formula for the cross-correlation and have tried many ways to implement it correctly, but I couldn't. Here is my latest code:
import operator

import numpy as np

def pitch_detection(self, frame_matrix, frame_number, lag_vector, frequency):
    np.seterr(divide='ignore', invalid='ignore')
    pitch_freq_vector = []
    for frame in range(frame_number):
        ccf = []
        # Note: for frames 0 and 1 these negative indices wrap around to the last rows.
        frame_expand_1 = frame_matrix[frame-1, :]
        frame_expand_2 = frame_matrix[frame-2, :]
        temp_corr_1 = frame_matrix[frame, :]
        # Prepend 256 + 64 samples from the two preceding frames so that
        # lags up to 320 fit inside the buffer.
        temp_corr_2 = np.append(frame_expand_1[256:], temp_corr_1, axis=0)
        temp_corr_2 = np.append(frame_expand_2[192:256], temp_corr_2, axis=0)
        len_tc2 = len(temp_corr_2)
        for lag in lag_vector:  # pitch is the highest correlation in lag vector
            ccf.append(np.sum(temp_corr_1*temp_corr_2[len_tc2-lag-512:len_tc2-lag]))
        max_index, max_value = max(enumerate(ccf), key=operator.itemgetter(1))
        # Accept the peak only if it exceeds 30 percent of the frame energy.
        if max_value > 0.3*np.sum(np.power(temp_corr_1, 2)):
            pitch_freq_vector.append(max_index)
        else:
            pitch_freq_vector.append(-1)  # no pitch detected in this frame
    return pitch_freq_vector
The problem is that the maximum is always at the last element of ccf, but it should vary from frame to frame.
Note that the pitch frequency of the human voice varies between 50 and 400 Hz. The index into the vector is mapped to those frequencies later, and for frames that have no pitch, -1 is appended to the list.
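To make the index-to-frequency mapping concrete, here is a minimal sketch. It assumes a sample rate of 16000 Hz (not stated in the post) and uses the relation frequency = sample_rate / lag. The helper names `lag_bounds` and `index_to_frequency` are hypothetical and only for illustration.

```python
import numpy as np


def lag_bounds(sample_rate, f_min=50.0, f_max=400.0):
    """Return (min_lag, max_lag) in samples for the pitch range f_min..f_max."""
    min_lag = int(np.ceil(sample_rate / f_max))   # shortest period, highest pitch
    max_lag = int(np.floor(sample_rate / f_min))  # longest period, lowest pitch
    return min_lag, max_lag


def index_to_frequency(index, lag_vector, sample_rate):
    """Map an index into lag_vector to Hz; -1 (no pitch) is passed through."""
    if index < 0:
        return -1.0
    return sample_rate / lag_vector[index]


sample_rate = 16000  # assumption: the post does not give the sample rate
min_lag, max_lag = lag_bounds(sample_rate)
lag_vector = np.arange(min_lag, max_lag + 1)
```

With these assumed values, index 0 corresponds to a lag of 40 samples (400 Hz) and the last index to a lag of 320 samples (50 Hz).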
The implementation works properly. It should be used with a frame matrix as input. I hope you enjoy it.
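For anyone who wants to try this on a frame matrix, here is a small self-contained sketch on a synthetic 200 Hz tone. It is not the code above: it assumes a 16000 Hz sample rate and that frame f starts 256 samples after frame f-1, so it takes the preceding samples from the first halves of the two previous frames, which differs from the slicing in the code above. The names `make_frames` and `best_lag` are hypothetical.

```python
import numpy as np

FRAME_LEN, HOP = 512, 256  # 50 percent overlap, as in the question


def make_frames(signal):
    """Slice a 1-D signal into rows of FRAME_LEN samples, HOP samples apart."""
    n_frames = 1 + (len(signal) - FRAME_LEN) // HOP
    return np.stack([signal[f * HOP: f * HOP + FRAME_LEN] for f in range(n_frames)])


def best_lag(frames, f, lag_vector, threshold=0.3):
    """Return the lag with the highest cross-correlation for frame f (f >= 2), or -1."""
    current = frames[f]
    # Assumption: the samples just before frame f are the first halves
    # of frames f-2 and f-1 (frames are HOP samples apart).
    history = np.concatenate([frames[f - 2, :HOP], frames[f - 1, :HOP], current])
    end = len(history)
    # Correlate the frame with a copy of itself delayed by each candidate lag.
    ccf = np.array([np.sum(current * history[end - lag - FRAME_LEN: end - lag])
                    for lag in lag_vector])
    k = int(np.argmax(ccf))
    # Accept the peak only if it exceeds 30 percent of the frame energy.
    if ccf[k] > threshold * np.sum(current ** 2):
        return int(lag_vector[k])
    return -1


sample_rate = 16000  # assumed sample rate
t = np.arange(4096) / sample_rate
# Slowly rising amplitude, so the true period beats its integer multiples.
signal = (1.0 + 20.0 * t) * np.sin(2 * np.pi * 200.0 * t)
frames = make_frames(signal)
lags = np.arange(40, 321)  # 400 Hz down to 50 Hz at the assumed sample rate
lag = best_lag(frames, 5, lags)
```

Under these assumptions the detected lag should land near 80 samples, i.e. about 200 Hz, and an all-zero frame returns -1 because its peak cannot exceed the energy threshold.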