How to calculate the phases after a pitch shift on the STFT?


I use torch.stft() to generate a spectrogram.
I want to perform a pitch shift on the audio.
The end result should be an STFT of the pitch-shifted audio.

I can't use the usual phase_vocoder -> istft -> resample -> stft pipeline because it is too slow.

Instead, I wrote some code that builds a new spectrogram in which frequency bin n is taken from bin n * scaling_factor of the original spectrogram, using linear interpolation for fractional indices:

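def interpolate(frequencies: torch.Tensor, sgram: torch.Tensor):
    # Keep lookups inside the spectrogram; without this, a downward shift
    # (scaling_factor > 1) indexes past the last frequency bin.
    frequencies = frequencies.clamp(max=sgram.shape[0] - 1)
    start = frequencies.floor().long().clamp(max=sgram.shape[0] - 2)
    frac = (frequencies - start)[:, None]
    # Linear interpolation between the two neighbouring bins.
    return sgram[start, :] * (1 - frac) + sgram[start + 1, :] * frac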

def pitch_shift_spectrogram(sgram: torch.Tensor, semitones: torch.Tensor):
    scaling_factor = 2 ** (-semitones / 12)
    frequencies = torch.arange(0, sgram.shape[0], 1, device=sgram.device)
    shifted_frequencies = frequencies * scaling_factor
    shifted_mags = interpolate(shifted_frequencies, sgram.abs())
    phases = sgram.angle()  # placeholder: these are the unshifted phases; this is the part I need help with
    return torch.polar(shifted_mags, phases)

I tested it on this sound file.

The spectrogram (magnitude only) that this generates is good enough; stealing the phases from a complete, working pitch shift implementation makes it sound fine:

import torch
import torchaudio
import IPython.display as display

waveform, sample_rate = torchaudio.load("CantinaBand3.wav")
waveform = waveform[0]

shifted_waveform = torchaudio.functional.pitch_shift(waveform, sample_rate, 2)
working_sgram = torch.stft(shifted_waveform, 1024, return_complex=True)

unshifted_sgram = torch.stft(waveform, 1024, return_complex=True)
broken_sgram = pitch_shift_spectrogram(unshifted_sgram, torch.tensor(2))

broken_sgram = torch.polar(broken_sgram.abs(), working_sgram.angle())
display.display(display.Audio(torch.istft(broken_sgram, 1024), rate=sample_rate))

Is it possible to calculate the phase information of the pitch-shifted signal through the STFT alone? Does PyTorch offer any built-in functions that do this for me?
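
To make the first question concrete, here is a rough sketch of the classic phase-vocoder idea applied directly to the STFT: estimate each bin's instantaneous frequency from the frame-to-frame phase difference, scale it, and accumulate the result over time. `estimate_shifted_phases` is a hypothetical helper name, not a PyTorch function; it assumes a single-channel `(bins, frames)` STFT, nearest-neighbour bin lookup, and that `n_fft` and `hop_length` match the `torch.stft` call (with `hop_length` left unset, `torch.stft` uses `n_fft // 4`). I have not checked how good it sounds on real audio.

```python
import math
import torch

def estimate_shifted_phases(sgram, scaling_factor, n_fft, hop_length):
    """Hypothetical helper: phase-vocoder-style phases for a bin-remapped STFT."""
    n_bins = sgram.shape[0]
    phase = sgram.angle()
    bins = torch.arange(n_bins, device=sgram.device, dtype=phase.dtype)

    # Phase advance per hop that a sinusoid exactly at each bin centre would show.
    expected = 2 * math.pi * hop_length * bins / n_fft

    # Measured advance minus expected advance, wrapped into [-pi, pi].
    deviation = phase[:, 1:] - phase[:, :-1] - expected[:, None]
    deviation = deviation - 2 * math.pi * torch.round(deviation / (2 * math.pi))

    # Instantaneous frequency of each bin, in fractional bin units.
    true_freq = bins[:, None] + deviation * n_fft / (2 * math.pi * hop_length)

    # Output bin n reads from source bin n * scaling_factor (nearest neighbour),
    # and its frequency is the source frequency divided by scaling_factor.
    src = torch.round(bins * scaling_factor).long().clamp(max=n_bins - 1)
    new_freq = true_freq[src] / scaling_factor

    # Accumulate the per-hop advance over time, starting from the source phase.
    advance = 2 * math.pi * hop_length * new_freq / n_fft
    first = phase[src, :1]
    return torch.cat([first, first + torch.cumsum(advance, dim=1)], dim=1)
```

In `pitch_shift_spectrogram` this would replace the `sgram.angle()` line, for example `phases = estimate_shifted_phases(sgram, scaling_factor, 1024, 256)`, before the call to `torch.polar`.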
Also, one of the model parameters affects the semitones parameter in the pitch shift function. With the interpolation, will autograd backprop through that fine?
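
For the autograd part, a small check along these lines should show whether a gradient reaches `semitones`. It is only a sketch under a few assumptions: the magnitudes are random stand-in data, and `semitones` is a float tensor, because an integer tensor such as `torch.tensor(2)` cannot require gradients. The integer bin index carries no gradient, so everything flows through the fractional weight `frac`.

```python
import torch

def interpolate(positions, mags):
    # Clamp so lookups stay inside the spectrogram.
    positions = positions.clamp(max=mags.shape[0] - 1)
    # The integer part is an index and carries no gradient.
    start = positions.detach().floor().long().clamp(max=mags.shape[0] - 2)
    # The fractional part is differentiable with respect to positions.
    frac = (positions - start)[:, None]
    return mags[start] * (1 - frac) + mags[start + 1] * frac

torch.manual_seed(0)
mags = torch.rand(513, 8)                          # stand-in magnitude spectrogram
semitones = torch.tensor(2.0, requires_grad=True)  # float, so it can hold a gradient

scaling_factor = 2 ** (-semitones / 12)
positions = torch.arange(mags.shape[0]) * scaling_factor
shifted = interpolate(positions, mags)

shifted.sum().backward()
print(semitones.grad)  # populated if the graph reaches semitones
```

The gradient here is the slope of the linear interpolation between neighbouring bins, so it is piecewise constant in `semitones` and says nothing about what happens when a lookup crosses into the next bin.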
