I'm using Python, librosa, and NumPy. The audio is stereo, 44.1 kHz, about 2 GB, roughly 3 hours long. First I convert the audio to mono, then in 60-second chunks I calculate the maximum power. Next I use the highest max power as the global max power reference to generate the spectrograms. I use the same f_high and f_low, the same db_high and db_low, the same n_fft, hop_length, and n_mels, the same sample rate, and the same max power reference for every chunk.
Basically this is what I do to get the global max power ref:
global_max_power = 0.0
for i in tqdm(range(0, len(y), samples_per_chunk)):
    y_chunk = y[i : i + samples_per_chunk]
    S = librosa.feature.melspectrogram(
        y=y_chunk,
        sr=sr,
        n_fft=n_fft,
        hop_length=hop_length,
        n_mels=n_mels,
        fmin=f_low,
        fmax=f_high,
    )
    max_power = np.max(S)
    if max_power > global_max_power:
        global_max_power = max_power
Then I use that global max power as the reference when converting each chunk to dB:

S_dB = librosa.power_to_db(
    S,
    ref=global_max_power,
    amin=10 ** (db_low / 10.0),
    top_db=db_high - db_low,
)
Then I render the image:

plt.figure(figsize=(image_width / 100, video.get("height", 100) / 100))
librosa.display.specshow(
    S_dB,
    sr=sr,
    cmap=audiovis.get("cmap", "magma"),
    hop_length=hop_length,
    fmin=f_low,
    fmax=f_high,
)
If I make the image very wide so that no chunking is required (using a smaller file), I do not see the color-intensity variation in the audio. When I process the audio in chunks, the variation appears, even though the parameters controlling the process are the same. I'm not sure how to diagnose this further.