I want to make an audio fingerprint, so I need to get an array of spectrogram peaks. I've tried to find a solution on the internet, but there's nothing.
Here is the spectrogram example:
import librosa, librosa.display
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd

def MEL_SPECTOGRAM(signal, sr, fileName):
    # play the signal in a notebook
    ipd.Audio(signal, rate=sr)
    # this is the number of samples in a window per fft
    n_fft = 2048
    # the amount of samples we are shifting after each fft
    hop_length = 512
    audio_stft = librosa.stft(signal, hop_length=hop_length, n_fft=n_fft)
    spectrogram = np.abs(audio_stft)
    log_spectro = librosa.amplitude_to_db(spectrogram)
    log_spectro = librosa.util.normalize(log_spectro)
    librosa.display.specshow(log_spectro, sr=sr, n_fft=n_fft, hop_length=hop_length,
                             cmap='magma', win_length=n_fft)
    plt.show()
[mel-spectrogram example](https://i.stack.imgur.com/u0zKd.png)
The best solution I found was this video, but unfortunately it is written in Wolfram Language, so I can't use it:
https://www.youtube.com/watch?v=oCHeGesfJe8&ab_channel=Wolfram
Peak finding in a 2D array is a common operation in computer vision, so a good way to do this in Python is to lean on a computer-vision library such as
scipy.ndimage. One of the best resources explaining the landmark/constellation approach to audio fingerprinting (as used by Shazam etc.) is the Fundamentals of Music Processing notebooks: Chapter 7, Audio Identification. It contains Python code for computing the constellation map, in the function
compute_constellation_map. Below is complete, runnable code based on that resource. I have only made a few fixes for compatibility with modern librosa.
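The core of that function is small enough to sketch here. This is my own reconstruction of the constellation-map idea using scipy.ndimage.maximum_filter; the parameter names (dist_freq, dist_time, thresh) follow the notebook's convention, and the default values are assumptions you should tune for your audio:

```python
import numpy as np
from scipy import ndimage

def compute_constellation_map(Y, dist_freq=7, dist_time=7, thresh=0.01):
    """Return a boolean mask of spectrogram peaks (the constellation map).

    Y is a magnitude spectrogram (frequency bins x time frames).
    A bin is a peak if it equals the maximum over its
    (2*dist_freq+1) x (2*dist_time+1) neighbourhood and exceeds thresh.
    """
    size = (2 * dist_freq + 1, 2 * dist_time + 1)
    # maximum_filter replaces each bin with the max of its neighbourhood
    local_max = ndimage.maximum_filter(Y, size=size, mode='constant')
    return np.logical_and(Y == local_max, Y > thresh)

# Example on a synthetic magnitude spectrogram: one strong bin
# surrounded by quiet noise should come out as the only peak.
Y = 0.001 * np.ones((64, 64))
Y[20, 30] = 1.0
peaks = compute_constellation_map(Y, dist_freq=5, dist_time=5, thresh=0.01)
print(np.nonzero(peaks))  # -> (array([20]), array([30]))
```

With the question's code, you would call this on `spectrogram` (or the normalized `log_spectro`) and then get peak coordinates with `np.nonzero(peaks)`.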
Here is also some plotting code to show the output, again based on the notebooks above.
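If you don't want the notebook's plotting helpers as a dependency, here is a minimal self-contained sketch using plain matplotlib. The sr and hop_length defaults, and the linear bin-to-Hz conversion, are assumptions matching the STFT parameters in the question; adjust them to your data:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_constellation_map(log_spectro, peaks, sr=22050, hop_length=512):
    """Overlay constellation peaks (a boolean mask) on a spectrogram image."""
    n_bins, n_frames = log_spectro.shape
    duration = n_frames * hop_length / sr
    fig, ax = plt.subplots(figsize=(10, 4))
    # Draw the spectrogram; extent maps array indices to seconds / Hz
    ax.imshow(log_spectro, origin='lower', aspect='auto', cmap='gray_r',
              extent=[0, duration, 0, sr / 2])
    freq_idx, time_idx = np.nonzero(peaks)
    ax.scatter(time_idx * hop_length / sr,          # frame index -> seconds
               freq_idx * (sr / 2) / (n_bins - 1),  # bin index -> Hz
               color='r', s=12, marker='o')
    ax.set_xlabel('Time (s)')
    ax.set_ylabel('Frequency (Hz)')
    return fig, ax
```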
Running it should give an image such as this: