Possible to reconstruct audio only with spectrogram image?

I'm creating spectrograms with librosa and saving them as images. I then intend to modify the images directly (i.e., add random noise, etc.) and reconstruct the audio from the modified image.

Some research led me to examples of similar processes (see here or here), but nothing quite like what I'm trying to do, which is to take a PNG/JPG image of a spectrogram and convert it back to a usable audio file.

Here's the full code I'm using to generate the spec images:

import librosa
from librosa import display
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas

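# Load librosa's bundled example clip (default sample rate: 22050 Hz).
filename = librosa.util.example_audio_file()
y, sr = librosa.load(filename)
window_size = 1024
window = np.hanning(window_size)
# Complex STFT: 1 + window_size // 2 = 513 frequency bins per frame.
stft = librosa.core.spectrum.stft(y, n_fft=window_size, hop_length=512, window=window)
# Keep only the magnitude, normalised by the window sum; the phase is discarded here.
out = 2 * np.abs(stft) / np.sum(window)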

fig = plt.Figure()
canvas = FigureCanvas(fig)
ax = fig.add_subplot(111)
fig.subplots_adjust(left=0,right=1,bottom=0,top=1)
ax.axis('tight')
ax.axis('off')

p = librosa.display.specshow(librosa.amplitude_to_db(out, ref=np.max), ax=ax, y_axis='log', x_axis='time')
fig.savefig('spectrogram.png')

This produces the following image: spectrogram.png
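One complication with an image rendered this way: as far as I can tell, specshow applies a colormap, draws the frequency axis on a log scale, and resamples everything to the figure's pixel grid, so pixels no longer map one-to-one onto STFT bins. A minimal sketch of an alternative encoding, assuming a grayscale image with one pixel per bin would be acceptable, is shown here. The helper names db_to_pixels and pixels_to_db are hypothetical, the 80 dB range is assumed to match the default top_db of librosa.amplitude_to_db, and the random array merely stands in for the real dB spectrogram.

```python
import numpy as np

TOP_DB = 80.0  # assumption: same range as the default top_db of librosa.amplitude_to_db


def db_to_pixels(db, top_db=TOP_DB):
    """Map dB values in [-top_db, 0] to uint8 pixels in [0, 255]."""
    db = np.clip(db, -top_db, 0.0)
    scaled = (db + top_db) / top_db * 255.0
    # Flip vertically so the lowest frequency bin ends up in the bottom row.
    return np.flipud(np.round(scaled).astype(np.uint8))


def pixels_to_db(pixels, top_db=TOP_DB):
    """Invert db_to_pixels: uint8 pixels back to dB values in [-top_db, 0]."""
    unflipped = np.flipud(pixels).astype(np.float64)
    return unflipped / 255.0 * top_db - top_db


# Hypothetical stand-in for librosa.amplitude_to_db(out, ref=np.max):
# 513 frequency bins (n_fft = 1024) by 40 frames.
rng = np.random.default_rng(0)
db = -TOP_DB * rng.random((513, 40))

pixels = db_to_pixels(db)
recovered = pixels_to_db(pixels)

# The only loss in this round trip is 8-bit quantisation.
max_error = np.max(np.abs(recovered - db))
```

An array like pixels could then be written with a lossless format such as PNG (for example with PIL's Image.fromarray(pixels).save(...)). JPEG compression would add further error on top of the quantisation.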

However, functions like librosa.istft and librosa.griffinlim expect the output of librosa.core.spectrum.stft (the complex matrix or its magnitude, respectively), and I haven't been able to reverse the whole process starting from just the image file. Assuming I only had this picture, is there any way to rebuild the audio (even if the result is lossy)? What other information would be necessary, and how could I do it?
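To make the question concrete, here is a rough sketch of the inverse direction under some strong assumptions: the image is grayscale with one pixel per STFT bin (top row = highest frequency), pixel values span a known 80 dB range linearly, and n_fft = 1024 with hop_length = 512 as in the code above. The helper names pixels_to_magnitude and magnitude_to_audio are hypothetical. The phase is not stored in the image at all, so it is estimated with librosa.griffinlim, and because ref=np.max normalised the peak to 0 dB, only relative levels can be recovered.

```python
import numpy as np


def pixels_to_magnitude(pixels, top_db=80.0):
    """Grayscale uint8 image (top row = highest bin) -> relative STFT magnitude."""
    # Undo the vertical flip and the linear [0, 255] -> [-top_db, 0] dB mapping.
    db = np.flipud(pixels).astype(np.float64) / 255.0 * top_db - top_db
    # dB -> linear amplitude; the peak is 1.0 because the absolute level is lost.
    return 10.0 ** (db / 20.0)


def magnitude_to_audio(magnitude, hop_length=512, n_iter=32):
    """Estimate the missing phase with Griffin-Lim and return a waveform."""
    import librosa  # imported lazily so the pixel maths above needs only NumPy
    return librosa.griffinlim(
        magnitude, n_iter=n_iter, hop_length=hop_length, window="hann"
    )


# Hypothetical 513 x 40 image: 1 + n_fft // 2 rows for n_fft = 1024.
pixels = np.zeros((513, 40), dtype=np.uint8)
pixels[-50, :] = 255  # a steady tone in frequency bin 49 (counted from the bottom)

magnitude = pixels_to_magnitude(pixels)
```

Calling magnitude_to_audio(magnitude) should then give a waveform that could be saved with something like soundfile.write('out.wav', y_hat, 22050), assuming the default librosa.load sample rate. For the specshow image itself, the colormap and the log-frequency axis would presumably have to be undone first, which this sketch does not attempt.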

Thanks in advance.
