I use torch.stft() to generate a spectrogram.
I want to perform a pitch shift on the audio.
The end result should be an STFT of the pitch-shifted audio.
I can't do phase_vocoder -> istft -> resample -> stft because that's too slow.
Instead, I wrote some code that makes a new spectrogram, where the nth frequency bin is just the (n * scaling_factor)th frequency bin of the original spectrogram, interpolated for fractional indices:
def interpolate(frequencies: torch.Tensor, sgram: torch.Tensor):
    # Split each fractional bin index into its integer and fractional parts.
    start = frequencies.long()
    frac = (frequencies - start)[:, None]
    # Linear interpolation between the two neighbouring frequency bins.
    return sgram[start, :] * (1 - frac) + sgram[start + 1, :] * frac

def pitch_shift_spectrogram(sgram: torch.Tensor, semitones: torch.Tensor):
    # Output bin n reads from input bin n * scaling_factor.
    scaling_factor = 2 ** (-semitones / 12)
    frequencies = torch.arange(0, sgram.shape[0], 1, device=sgram.device)
    shifted_frequencies = frequencies * scaling_factor
    shifted_mags = interpolate(shifted_frequencies, sgram.abs())
    phases = sgram.angle()  # ??? what do i do? help
    return torch.polar(shifted_mags, phases)
I tested it on this sound file.
The spectrogram (magnitude only) that this generates is good enough; stealing the phases from a complete, working pitch shift implementation makes it sound fine:
import torch
import torchaudio
import IPython.display as display
waveform, sample_rate = torchaudio.load("CantinaBand3.wav")
waveform = waveform[0]
shifted_waveform = torchaudio.functional.pitch_shift(waveform, sample_rate, 2)
working_sgram = torch.stft(shifted_waveform, 1024, return_complex=True)
unshifted_sgram = torch.stft(waveform, 1024, return_complex=True)
broken_sgram = pitch_shift_spectrogram(unshifted_sgram, torch.tensor(2))
broken_sgram = torch.polar(broken_sgram.abs(), working_sgram.angle())
display.display(display.Audio(torch.istft(broken_sgram, 1024), rate=sample_rate))
Is it possible to calculate the phase information of the pitch-shifted signal through the STFT alone? Does PyTorch offer any built-in functions that do this for me?
Also, one of the model parameters affects the semitones parameter in the pitch shift function. With the interpolation, will autograd backprop through that fine?
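
For the autograd part, a quick empirical check seems possible. The sketch below is self-contained and only covers magnitudes. The helper names interpolate_bins and shift_magnitudes are hypothetical stand-ins that mirror the two functions above, and the input is a toy Gaussian bump rather than real audio. It also adds an index clamp that the original code does not have, so that lower + 1 stays in range. The assumption being tested is that gradients can only reach semitones through the fractional interpolation weights, since the integer bin indices are not differentiable.

```python
import torch


def interpolate_bins(positions: torch.Tensor, mags: torch.Tensor) -> torch.Tensor:
    """Linearly interpolate rows of `mags` at fractional bin positions."""
    # Integer part: not differentiable. Clamped (an addition, not in the
    # original code) so that `lower + 1` never indexes past the last bin.
    lower = positions.detach().floor().long().clamp(0, mags.shape[0] - 2)
    # Fractional part: the only path by which gradients reach `positions`.
    frac = (positions - lower).clamp(0.0, 1.0)[:, None]
    return mags[lower] * (1 - frac) + mags[lower + 1] * frac


def shift_magnitudes(mags: torch.Tensor, semitones: torch.Tensor) -> torch.Tensor:
    """Output bin n reads from input bin n * scaling_factor."""
    scaling_factor = 2 ** (-semitones / 12)
    bins = torch.arange(mags.shape[0], device=mags.device, dtype=mags.dtype)
    return interpolate_bins(bins * scaling_factor, mags)


# Toy magnitude "spectrogram": 513 bins, 4 frames, smooth bump centred on bin 100.
bins = torch.arange(513, dtype=torch.float32)
mags = torch.exp(-0.5 * ((bins - 100.0) / 3.0) ** 2)[:, None].repeat(1, 4)

# Treat the shift amount as a learnable scalar (one octave up here).
semitones = torch.tensor(12.0, requires_grad=True)

shifted = shift_magnitudes(mags, semitones)
loss = shifted.sum()
loss.backward()

# With a one-octave shift the bump at bin 100 should land at about bin 200.
print("peak bin:", shifted[:, 0].argmax().item())
print("d(loss)/d(semitones):", semitones.grad.item())
```

If the printed gradient is finite and non-zero, backprop does reach semitones. The gradient is only piecewise smooth, though: it is the slope between the two neighbouring bins, so it jumps whenever a shifted position crosses an integer bin index.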