How do I use a binary mask and STFT to produce an audio file?

247 Views Asked by Blake H At 01 August 2018 at 22:15

So here's the idea: you can generate a spectrogram from an audio file using the short-time Fourier transform (STFT). Some people have then used something called a "binary mask" to generate different audio (i.e. with background noise removed, etc.) from the inverse STFT.

Here's what I understand:

The STFT is a simple equation applied to the audio file; it generates the information that can easily be displayed as a spectrogram.
By taking the inverse of the STFT matrix and multiplying it by a matrix of the same size (the binary mask), you can create a new matrix with the information needed to generate an audio file with the masked sound.
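
For what it's worth, the mask is normally applied to the STFT matrix itself, one time-frequency bin at a time, and the inverse STFT is taken afterwards rather than before. The multiplication is elementwise, not a matrix product. A toy sketch with plain NumPy, where the "spectrum" values and the mask are made up purely for illustration:

```python
import numpy as np

# Toy stand-in for an STFT matrix: rows are frequency bins, columns are time frames.
# Real STFT output is complex-valued, so complex numbers are used here too.
spectrum = np.array([[1 + 1j, 2 + 0j, 0.1 + 0j],
                     [0.2 + 0j, 3 - 1j, 4 + 2j]])

# Hypothetical binary mask of the same shape: 1 keeps a bin, 0 silences it.
mask = np.array([[1, 1, 0],
                 [0, 1, 1]])

# Elementwise product: each bin is either kept unchanged or set to zero.
masked_spectrum = spectrum * mask
```

The result has the same shape as the input, so it can be handed to an inverse STFT exactly as the unmasked matrix would be.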

Once I do the matrix multiplication, how is the new audio file created?

It's not much but here's what I've got in terms of code:

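from librosa import load, stft, istft

# load() resamples to 22050 Hz by default; pass sr=None to keep the file's own rate
y, sample_rate = load('1.wav')
spectrum = stft(y)         # complex matrix, shape (frequency bins, time frames)
back_y = istft(spectrum)   # back to a time-domain signal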

Thank you, and here are some slides that got me this far. I'd appreciate it if you could give me an example/demo in Python.
