I am working on an audio classification project. The original sampling rate is about 32000 Hz. After normalization and resampling to 16000 Hz, I split the audio into uniform 4-second chunks, so each chunk has 4 * 16000 = 64000 sample points. The frame size is 512 and the hop size is 128 (75% overlap, so each sample falls in 4 frames), which gives 500 frames per chunk. The shape of the audio data is (#samples, #frames, #samples_in_frame) = (x, 500, 512). Using the librosa library, I extract Mel spectrograms with n_mel = 100, so the shape becomes (#samples, #frames, #n_mel, #t_bins) = (x, 500, 100, 5).
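To make the arithmetic behind these shapes concrete, here is a minimal sketch in plain Python. It assumes that the Mel spectrogram is computed per 512-sample frame with a hop of 128 and with centered (padded) STFT framing, which is librosa's default; neither assumption is confirmed by the code in this post. The helper names `count_frames` and `stft_time_bins` are hypothetical.

```python
def count_frames(n_samples, frame_len, hop):
    """Number of full frames that fit in the signal without any padding."""
    return 1 + (n_samples - frame_len) // hop


def stft_time_bins(n_samples, hop, n_fft=512, center=True):
    """Number of STFT time bins for a signal of n_samples.

    With center=True (assumed here), the signal is padded by n_fft // 2
    on both sides before framing, which adds extra time bins.
    """
    padded = n_samples + 2 * (n_fft // 2) if center else n_samples
    return 1 + (padded - n_fft) // hop


# Framing a 4-second chunk at 16000 Hz (64000 samples)
frames_per_chunk = count_frames(64000, frame_len=512, hop=128)

# Time bins produced from one 512-sample frame
bins_centered = stft_time_bins(512, hop=128, center=True)
bins_uncentered = stft_time_bins(512, hop=128, center=False)

print(frames_per_chunk, bins_centered, bins_uncentered)
```

Under these assumptions, unpadded framing yields 497 frames rather than 500, so the 500 presumably comes from padding the chunk, and 5 time bins per frame matches frame_size / hop_size + 1 only when centered padding is in effect.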
Here is my code:

    import os

    import librosa
    import matplotlib.pyplot as plt
    import numpy as np

    # scale_minmax, Mel_reshaped, sliced_sample_names4 and msp_path
    # are defined elsewhere in my script
    Scaled_Mel = scale_minmax(Mel_reshaped)

    # Iterate over each Mel spectrogram and save it as an image
    for i, file_name in enumerate(sliced_sample_names4):
        # Construct the filename using the current file name and index
        filename = f"spectrogram_{file_name}_{i+1}.png"
        filepath = os.path.join(msp_path, filename)

        # Check if the file already exists before creating a figure,
        # so that skipped items do not leave open figures behind
        if os.path.exists(filepath):
            print(f"Image {filename} already exists in the folder path. Skipping...")
            continue

        mel_spectrogram = Scaled_Mel[i, :, :]  # Extract the scaled Mel spectrogram
        spectrogram = librosa.power_to_db(mel_spectrogram, ref=np.max)

        # Transpose the spectrogram to match the expected shape
        spectrogram = np.transpose(spectrogram)

        # Create a new figure of 5 x 5 inches (the intent is a 500x500 pixel image)
        plt.figure(figsize=(5, 5))
        plt.axis('off')  # Turn off the axes
        # Display the spectrogram as an image
        plt.imshow(spectrogram, aspect='auto', cmap='viridis')
        # Save the plot as an image
        plt.savefig(filepath, bbox_inches='tight', pad_inches=0, dpi=500)
        plt.close()  # Close the plot to free memory
        print(f"Spectrogram {i+1} saved as {filepath}")

Question 1: Why is t_bins = 5? Is it because of (frame_size / hop_size + 1)?

Question 2: My plan is to reshape the array with Mel_reshaped = Mel.reshape(Mel.shape[0], Mel.shape[1], -1), which gives (x, 500, 100 * 5) = (x, 500, 500). Does that strategy make sense?

Question 3: I want to use a vision transformer model, and all my calculations are aimed at getting a 500 by 500 shape, but when I save the spectrograms the images have dimensions like (1937, 1935). I know dpi has a large impact, but if I set dpi too low, it degrades the resolution.
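For reference, here is a minimal sketch of the reshape from Question 2 and of two possible ways to write an image that is exactly 500 x 500 pixels. It is not a confirmed fix. The random array stands in for the real Mel data, and the function names `save_exact_pixels` and `save_via_figure` are hypothetical. The underlying observation is that figsize=(5, 5) with dpi=500 produces a 2500 x 2500 pixel canvas, which bbox_inches='tight' then crops to a different size.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np


def save_exact_pixels(spec, filepath, cmap="viridis"):
    """Write a 2-D array to PNG with one image pixel per array element."""
    # imsave ignores figure size and bbox cropping: output size == array shape
    plt.imsave(filepath, spec, cmap=cmap)


def save_via_figure(spec, filepath, size_px=500, dpi=100):
    """Render through a figure whose canvas is exactly size_px x size_px."""
    fig = plt.figure(figsize=(size_px / dpi, size_px / dpi), dpi=dpi)
    ax = fig.add_axes([0, 0, 1, 1])  # axes fill the whole canvas, no margins
    ax.axis("off")
    ax.imshow(spec, aspect="auto", cmap="viridis")
    # No bbox_inches="tight" here, since that option re-crops the canvas
    fig.savefig(filepath, dpi=dpi)
    plt.close(fig)


# Hypothetical stand-in for the Mel data: (samples, frames, n_mel, t_bins)
Mel = np.random.default_rng(0).random((2, 500, 100, 5))

# Merge the last two axes: column index becomes mel_index * 5 + t_index
Mel_reshaped = Mel.reshape(Mel.shape[0], Mel.shape[1], -1)

out_dir = tempfile.mkdtemp()
path_a = os.path.join(out_dir, "imsave.png")
path_b = os.path.join(out_dir, "figure.png")
save_exact_pixels(Mel_reshaped[0], path_a)
save_via_figure(Mel_reshaped[0], path_b)

# Read the files back to check their pixel dimensions
shape_a = plt.imread(path_a).shape[:2]
shape_b = plt.imread(path_b).shape[:2]
print(Mel_reshaped.shape, shape_a, shape_b)
```

The first function maps each array element to one pixel, so the resolution is set by the array itself rather than by dpi. The second keeps the figure-based route but fixes the canvas at size_px pixels. Note that the reshape places the 5 time bins of each Mel band side by side within a row, so each row of the image is one frame; whether that layout suits a vision transformer is a separate question.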