I have a small acoustic dataset of human sounds which I would like to augment and later pass to a binary classifier.
I am familiar with data augmentation for images, but how is it done for acoustic datasets?
I've found two related answers covering autoencoders and SpecAugment with PyTorch & TorchAudio, but I would like to hear your thoughts on the audio-specific "best method".
It really depends on what you are trying to achieve, what your classifier is designed for, and how it works.
Depending on the above, you can, for example, cut the audio into segments in different ways (if you are feeding the classifier with cut audio segments and that makes sense in your particular case). You can also augment it with background noise (artificial, like white noise, or recorded) mixed in at different signal-to-noise ratios; this should additionally make the classifier more robust against noise.
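As a rough illustration of the noise-mixing idea, here is a minimal NumPy sketch that scales a noise signal so the mix hits a target signal-to-noise ratio in dB, plus a random-crop helper for the segment-cutting idea. The function names (`add_noise_at_snr`, `random_crop`) and the sine-tone example are my own illustrative choices, not from any particular library:

```python
import numpy as np

def add_noise_at_snr(signal, noise, snr_db):
    """Mix `noise` into `signal` at a target signal-to-noise ratio (in dB)."""
    # Tile or trim the noise to match the signal length.
    if len(noise) < len(signal):
        noise = np.tile(noise, int(np.ceil(len(signal) / len(noise))))
    noise = noise[:len(signal)]
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that signal_power / scaled_noise_power == 10**(snr_db / 10).
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise

def random_crop(signal, crop_len, rng):
    """Return a random fixed-length segment of `signal`."""
    start = rng.integers(0, len(signal) - crop_len + 1)
    return signal[start:start + crop_len]

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone at 16 kHz
white = rng.standard_normal(16000)                          # artificial white noise
noisy = add_noise_at_snr(clean, white, snr_db=10)
segment = random_crop(noisy, crop_len=4000, rng=rng)
```

In practice you would sample the SNR (and the crop position) randomly per training example, so the classifier sees a different perturbation of each clip on every epoch; torchaudio's transforms can do the same thing on tensors if you are already in the PyTorch pipeline.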