How to calculate the phases after a pitch shift on the STFT?

211 Views Asked by potatoportato At 17 August 2025 at 17:53

I use torch.stft() to generate a spectrogram.
I want to perform a pitch shift on the audio.
The end result should be an STFT of the pitch-shifted audio.

I can't use the phase_vocoder -> istft -> resample -> stft pipeline because that's too slow.

Instead, I wrote some code that makes a new spectrogram, where the n-th frequency bin is just the (n * scaling_factor)-th frequency bin of the original spectrogram, linearly interpolated for fractional indices:

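import torch

def interpolate(frequencies: torch.Tensor, sgram: torch.Tensor):
    # Linear interpolation between the two nearest bins. The lower index is
    # clamped so that start + 1 never runs past the last bin (the unclamped
    # version raises an IndexError for semitones <= 0).
    start = frequencies.floor().long().clamp(0, sgram.shape[0] - 2)
    frac = (frequencies - start).clamp(0, 1)[:, None]
    return sgram[start, :] * (1 - frac) + sgram[start + 1, :] * frac

def pitch_shift_spectrogram(sgram: torch.Tensor, semitones: torch.Tensor):
    # Output bin n reads from source bin n * scaling_factor.
    scaling_factor = 2 ** (-semitones / 12)
    frequencies = torch.arange(0, sgram.shape[0], 1, device=sgram.device)
    shifted_frequencies = frequencies * scaling_factor
    shifted_mags = interpolate(shifted_frequencies, sgram.abs())
    phases = sgram.angle() # ??? what do i do? help
    return torch.polar(shifted_mags, phases)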

I tested it on this sound file.

The spectrogram (magnitude only) that this generates is good enough; stealing the phases from a complete, working pitch shift implementation makes it sound fine:

import torch
import torchaudio
import IPython.display as display

waveform, sample_rate = torchaudio.load("CantinaBand3.wav")
waveform = waveform[0]

shifted_waveform = torchaudio.functional.pitch_shift(waveform, sample_rate, 2)
working_sgram = torch.stft(shifted_waveform, 1024, return_complex=True)

unshifted_sgram = torch.stft(waveform, 1024, return_complex=True)
broken_sgram = pitch_shift_spectrogram(unshifted_sgram, torch.tensor(2))

broken_sgram = torch.polar(broken_sgram.abs(), working_sgram.angle())
display.display(display.Audio(torch.istft(broken_sgram, 1024), rate=sample_rate))

Is it possible to calculate the phase information of the pitch-shifted signal through the STFT alone? Does PyTorch offer any built-in functions that do this for me?
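
For reference, here is a minimal sketch of one phase-vocoder-style way to estimate such phases from the STFT alone. It is not a built-in PyTorch function and has not been checked for audio quality. The idea: estimate each bin's phase advance per hop from consecutive frames, resample that advance onto the shifted bins with the same linear interpolation as above, scale it by the pitch ratio, and accumulate it over time. The names estimate_shifted_phases and interpolate_bins are hypothetical, hop_length is assumed to match the one passed to torch.stft, and the starting phase (copied from the nearest source bin) is an arbitrary choice.

```python
import math
import torch

def interpolate_bins(positions: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
    # Linear interpolation of `values` (bins x frames) at fractional bin positions.
    lo = positions.floor().long().clamp(0, values.shape[0] - 2)
    frac = (positions - lo).clamp(0, 1)[:, None]
    return values[lo] * (1 - frac) + values[lo + 1] * frac

def estimate_shifted_phases(sgram, semitones, n_fft, hop_length):
    # Hypothetical helper: sgram is a complex STFT of shape (bins, frames).
    scaling_factor = 2 ** (-semitones / 12)  # same convention as the question
    n_bins = sgram.shape[0]
    bins = torch.arange(n_bins, device=sgram.device, dtype=sgram.real.dtype)
    phase = sgram.angle()

    # Phase advance per hop that a sinusoid at each bin centre would show.
    expected = (2 * math.pi * hop_length / n_fft) * bins

    # Measured advance minus expected advance, wrapped to [-pi, pi).
    deviation = phase[:, 1:] - phase[:, :-1] - expected[:, None]
    deviation = torch.remainder(deviation + math.pi, 2 * math.pi) - math.pi

    # Estimated true phase advance per hop (radians) for every bin and frame.
    advance = expected[:, None] + deviation

    # Output bin n reads from source bin n * scaling_factor, and its
    # frequency (hence its advance per hop) is scaled by 1 / scaling_factor.
    positions = bins * scaling_factor
    shifted_advance = interpolate_bins(positions, advance) / scaling_factor

    # Assumption: start from the phase of the nearest source bin in frame 0.
    src = positions.round().long().clamp(0, n_bins - 1)
    start = phase[src, :1]
    return torch.cat([start, start + shifted_advance.cumsum(dim=1)], dim=1)
```

With semitones = 0 this reproduces the original phases up to multiples of 2π, which is a useful sanity check. The result could then replace the placeholder in pitch_shift_spectrogram, i.e. torch.polar(shifted_mags, estimate_shifted_phases(sgram, semitones, 1024, 256)), assuming the default hop of n_fft // 4.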
Also, one of the model parameters affects the semitones argument of the pitch shift function. With the interpolation, will autograd backpropagate through that correctly?
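
A quick way to check the autograd part is to make semitones a tensor with requires_grad=True and call backward() on the interpolated magnitudes. The sketch below uses random magnitudes as a stand-in for a real spectrogram (an assumption for illustration only). The integer index start carries no gradient, so the only path back to semitones is through frac, which gives the local slope between the two neighbouring bins.

```python
import torch

def interpolate(frequencies: torch.Tensor, sgram: torch.Tensor) -> torch.Tensor:
    # Same linear interpolation as in the question, with clamped indices.
    start = frequencies.floor().long().clamp(0, sgram.shape[0] - 2)
    frac = (frequencies - start).clamp(0, 1)[:, None]
    return sgram[start, :] * (1 - frac) + sgram[start + 1, :] * frac

torch.manual_seed(0)
mags = torch.rand(513, 20)  # stand-in for sgram.abs(); not real audio
semitones = torch.tensor(2.0, requires_grad=True)

scaling_factor = 2 ** (-semitones / 12)
positions = torch.arange(mags.shape[0]) * scaling_factor
shifted_mags = interpolate(positions, mags)

# Any scalar loss works here; the sum keeps the example short.
shifted_mags.sum().backward()
print(semitones.grad)
```

The gradient is defined and finite, but it is only piecewise smooth: it jumps whenever a position crosses an integer bin boundary, which may or may not matter for training.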
