How to calculate the phases after a pitch shift on the STFT?


I use torch.stft() to generate a spectrogram.
I want to perform a pitch shift on the audio.
The end result should be an STFT of the pitch-shifted audio.

I can't go the phase_vocoder -> istft -> resample -> stft route because that's too slow.

Instead, I wrote some code that builds a new spectrogram in which bin n is taken from bin n * scaling_factor of the original spectrogram, linearly interpolated for fractional indices:

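def interpolate(frequencies: torch.Tensor, sgram: torch.Tensor):
    # The right-hand side of this line was lost; flooring to an integer bin
    # index is an assumption that matches how `start` is used below.
    start = frequencies.floor().long()
    # Fractional distance to the next bin, broadcast over the time axis.
    frac = (frequencies - start)[:, None]
    return sgram[start, :] * (1 - frac) + sgram[start + 1, :] * frac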

def pitch_shift_spectrogram(sgram: torch.Tensor, semitones: torch.Tensor):
    scaling_factor = 2 ** (-semitones / 12)
    frequencies = torch.arange(0, sgram.shape[0], 1, device=sgram.device)
    shifted_frequencies = frequencies * scaling_factor
    shifted_mags = interpolate(shifted_frequencies, sgram.abs())
    phases = sgram.angle() # ??? what do i do? help
    return torch.polar(shifted_mags, phases)

I tested it on this sound file.

The spectrogram (magnitude only) that this generates is good enough; stealing the phases from a complete, working pitch shift implementation makes it sound fine:

import torch
import torchaudio
import IPython.display as display

waveform, sample_rate = torchaudio.load("CantinaBand3.wav")
waveform = waveform[0]

shifted_waveform = torchaudio.functional.pitch_shift(waveform, sample_rate, 2)
working_sgram = torch.stft(shifted_waveform, 1024, return_complex=True)

unshifted_sgram = torch.stft(waveform, 1024, return_complex=True)
broken_sgram = pitch_shift_spectrogram(unshifted_sgram, torch.tensor(2))

broken_sgram = torch.polar(broken_sgram.abs(), working_sgram.angle())
display.display(display.Audio(torch.istft(broken_sgram, 1024), rate=sample_rate))

Is it possible to calculate the phase information of the pitch-shifted signal through the STFT alone? Does PyTorch offer any built-in functions that do this for me?
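For reference, one common way to guess such phases from the STFT alone is phase-vocoder-style accumulation. The steps are: estimate each bin's per-hop phase advance from consecutive frames, scale that advance by the pitch ratio, and sum it over time. The sketch below is only an illustration under stated assumptions. The name estimate_shifted_phases is hypothetical, it reads the nearest source bin instead of interpolating, and it makes no attempt to keep neighbouring bins phase-coherent, so transients may smear. I'm not aware of a single built-in PyTorch function that does this.

```python
import math
import torch

def estimate_shifted_phases(sgram: torch.Tensor, semitones, hop_length: int) -> torch.Tensor:
    """Hypothetical helper: phase guess for a bin-remapped STFT of shape (bins, frames)."""
    n_bins = sgram.shape[0]
    n_fft = 2 * (n_bins - 1)  # assumes a onesided STFT
    scaling_factor = 2 ** (-semitones / 12)

    phase = sgram.angle()
    bins = torch.arange(n_bins, dtype=phase.dtype, device=sgram.device)
    # Phase advance over one hop for a sinusoid sitting exactly on each bin centre.
    expected = 2 * math.pi * hop_length * bins / n_fft
    # Measured deviation from that advance, wrapped to [-pi, pi).
    delta = phase[:, 1:] - phase[:, :-1] - expected[:, None]
    delta = torch.remainder(delta + math.pi, 2 * math.pi) - math.pi
    # Estimated true per-hop phase advance at every source bin.
    advance = expected[:, None] + delta

    # Target bin n reads the source bin nearest to n * scaling_factor.
    source = (bins * scaling_factor).round().long().clamp(0, n_bins - 1)
    # Frequencies are multiplied by 1 / scaling_factor, so the advance is too.
    shifted_advance = advance[source, :] / scaling_factor

    # Accumulate over time, starting from the source phases of the first frame.
    first = phase[source, :1]
    return torch.cat([first, first + torch.cumsum(shifted_advance, dim=1)], dim=1)
```

In pitch_shift_spectrogram above, this would stand in for the sgram.angle() line, for example phases = estimate_shifted_phases(sgram, semitones, hop_length=256), since torch.stft defaults hop_length to n_fft // 4. With semitones equal to 0 it reproduces the original phases up to multiples of 2π, which is a handy sanity check.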
Also, one of the model parameters affects the semitones parameter in the pitch shift function. With the interpolation, will autograd backprop through that fine?
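A quick way to probe the autograd part is to run the interpolation on a stand-in spectrogram and inspect the gradient on semitones. The sketch below assumes the floor-based definition of start (the original line is truncated) and uses random magnitudes as hypothetical data. The integer index carries no gradient, so whatever reaches semitones flows only through the fractional weight frac.

```python
import torch

def interpolate(frequencies: torch.Tensor, sgram: torch.Tensor) -> torch.Tensor:
    # Assumed definition of the truncated line: floor, then cast to an index.
    start = frequencies.floor().long()
    # frac stays attached to the graph; start does not.
    frac = (frequencies - start)[:, None]
    return sgram[start, :] * (1 - frac) + sgram[start + 1, :] * frac

# Stand-in magnitude spectrogram: 513 bins (n_fft = 1024), 10 frames.
mags = torch.rand(513, 10)
semitones = torch.tensor(2.0, requires_grad=True)

scaling_factor = 2 ** (-semitones / 12)
frequencies = torch.arange(mags.shape[0]) * scaling_factor
shifted = interpolate(frequencies, mags)

# Any scalar loss will do for the check.
shifted.sum().backward()
print(semitones.grad)
```

If the printed gradient is finite, backprop reaches semitones. It only reflects the slope between the two neighbouring bins at each index, so it is a local, piecewise signal and not a smooth one.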

