Audio resampling layer for TensorFlow


I need to resample audio signals inside a custom model. The resampling is not a pre- or post-processing step that can live outside the model; it is part of the model's internal design, so the layer also needs a gradient. For the resampling itself I plan to use TensorFlow I/O:

tfio.audio.resample

The operation works perfectly and is easy to use as a pre/post-processing unit; however, implementing it as a custom layer embedded within the model is challenging, as I don't know how to implement the backward pass.

  • How should the backward pass be implemented for such a 1D signal resampling layer?
  • Is there any other open-source 1D signal resampling layer that can be employed?

P.S. I tried conventional upsampling/pooling-like layers, but they are not accurate enough compared with tfio, which implements other resampling methods such as FFT-based ones.

For more context, please have a look at: another question


There is 1 best solution below


You need to state the objective of the resampling. It can be done in many ways, for example by taking the sine of the signal, so that it can be represented with smaller sine values.

Changing the sampling rate can save data space. The sample below first plots the scaled sine of the first five seconds, 0.05 * tf.math.sin(audio[:5 * 22050]).numpy(), and then builds two 2750-sample sections:

sec_1 = np.zeros((2750, 1)) * tf.math.sin(audio[0:2750]).numpy() and

sec_2 = np.ones((2750, 1)) * tf.math.sin(audio[2750:5500]).numpy()
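One detail worth checking in the section expressions is the array shape. tf.audio.decode_wav returns audio as [samples, channels], so a mono clip has shape (N, 1). The sketch below uses plain NumPy and a made-up stand-in array (not the real file) to show how a 1-D mask and a column mask behave when multiplied with such a slice.

```python
import numpy as np

# Stand-in for a decoded mono clip: shape [samples, channels] = (5500, 1).
# The values are arbitrary; only the shapes matter here.
audio = np.linspace(-1.0, 1.0, 5500, dtype=np.float32).reshape(-1, 1)

flat_mask = np.ones(2750, dtype=np.float32)         # shape (2750,)
column_mask = np.ones((2750, 1), dtype=np.float32)  # shape (2750, 1)

# (2750,) * (2750, 1) broadcasts to a 2-D grid of shape (2750, 2750).
broadcast = flat_mask * np.sin(audio[:2750])

# (2750, 1) * (2750, 1) stays element-wise with shape (2750, 1).
elementwise = column_mask * np.sin(audio[:2750])

print(broadcast.shape)
print(elementwise.shape)
```

A mask with a trailing channel axis keeps each section the same shape as the audio slice, which is what a single-line plot of the concatenated sections expects.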

[ Sample ]:

import numpy as np
import tensorflow as tf

import matplotlib.pyplot as plt

contents = tf.io.read_file("F:\\temp\\Python\\Speech\\temple_of_love-sisters_of_mercy.wav")
audio, sample_rate = tf.audio.decode_wav(
    contents, desired_channels=-1, desired_samples=-1, name=None
)

print(audio)
print(sample_rate)

plt.plot(audio[:5 * 22050])
plt.show()
plt.close()

plt.plot(0.05 * tf.math.sin(audio[:5 * 22050]).numpy())
plt.show()
plt.close()

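# decode_wav returns [samples, channels]; the masks need a trailing axis so
# the multiplication stays element-wise instead of broadcasting to 2-D.
sec_1 = np.zeros((2750, 1)) * tf.math.sin(audio[0:2750]).numpy()
sec_2 = np.ones((2750, 1)) * tf.math.sin(audio[2750:5500]).numpy()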


plt.plot(0.05 * tf.concat([sec_1, sec_2], 0).numpy())
plt.show()
plt.close()

[ Output ]:

array([[0.],
       [0.],
       [0.],
       ...,
       [0.],
       [0.],
       [0.]], dtype=float32)>, sample_rate=<tf.Tensor: shape=(), dtype=int32, numpy=22050>)

tf.Tensor(22050, shape=(), dtype=int32)
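Regarding the backward pass asked about in the question: one possible route, sketched here under stated assumptions, is to build the resampler from ops that already have gradients registered, so that TensorFlow's automatic differentiation supplies the backward pass. The helper fft_resample below is hypothetical (it is not part of TensorFlow or tensorflow-io, and it is not claimed to match what tfio.audio.resample does internally). It assumes float32 input with a statically known length on the last axis, and it does not treat the Nyquist bin specially.

```python
import numpy as np
import tensorflow as tf


def fft_resample(x, num_out):
    """Resample the last axis of `x` to `num_out` samples via the real FFT.

    Hypothetical helper for illustration only. Assumes the length of the
    last axis is statically known.
    """
    x = tf.cast(x, tf.float32)
    num_in = x.shape[-1]
    bins_in = num_in // 2 + 1
    bins_out = num_out // 2 + 1

    spectrum = tf.signal.rfft(x)  # shape [..., bins_in], complex64
    if bins_out <= bins_in:
        # Downsampling: discard the frequency bins above the new Nyquist.
        spectrum = spectrum[..., :bins_out]
    else:
        # Upsampling: append zero-valued high-frequency bins.
        paddings = [[0, 0]] * (x.shape.rank - 1) + [[0, bins_out - bins_in]]
        spectrum = tf.pad(spectrum, paddings)

    y = tf.signal.irfft(spectrum, fft_length=[num_out])
    # irfft divides by num_out; rescale so the amplitude is preserved.
    return y * (num_out / num_in)


# Usage: upsample a 3-cycle sine from 64 to 128 samples and take a gradient.
n = np.arange(64, dtype=np.float32)
x = tf.constant(np.sin(2.0 * np.pi * 3.0 * n / 64.0)[None, :])  # [batch, time]

with tf.GradientTape() as tape:
    tape.watch(x)
    y = fft_resample(x, 128)
    loss = tf.reduce_sum(tf.square(y))

# The gradient comes from autodiff; no hand-written backward pass is involved.
grad = tape.gradient(loss, x)
print(y.shape, grad.shape)
```

Because the function is composed only of differentiable TensorFlow ops, it could in principle be called from a custom Keras layer's call method; whether its accuracy is sufficient compared with tfio is something to verify on real audio.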
