How do I use a binary mask and STFT to produce an audio file?


So here's the idea: you can generate a spectrogram from an audio file using the short-time Fourier transform (STFT). Some people then apply something called a "binary mask" to the result and use the inverse STFT to generate different audio (e.g. with background noise removed).

Here's what I understand:

  1. The STFT is a transform applied to the audio signal; it produces a complex matrix (frequency bins by time frames) whose magnitude can be displayed as a spectrogram.
  2. By multiplying the STFT matrix element-wise by a matrix of the same size (the binary mask) and then taking the inverse STFT, you can recover an audio signal containing only the masked sound.

Once I do the matrix multiplication, how is the new audio file created?

It's not much but here's what I've got in terms of code:

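import librosa

# Load the audio (librosa resamples to 22050 Hz mono by default)
y, sample_rate = librosa.load('1.wav')
# Complex STFT matrix: rows are frequency bins, columns are time frames
spectrum = librosa.stft(y)
# Inverse STFT: back to a time-domain signal
back_y = librosa.istft(spectrum)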

Thank you, and here are some slides that got me this far. I'd appreciate it if you could give me an example/demo in Python.
