I customized my wav2vec2 model for several classification tasks and now want to add a speaker embedding, i.e. the fingerprint of a voice in the form of a NumPy array. I found some examples of speaker recognition, where known voice IDs are recognized, but I would prefer to get the fingerprint itself.
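To make clear what I mean by a fingerprint, here is a minimal sketch of the kind of output I am after. It assumes the model gives me frame-level features of shape `(frames, dim)`; the random array and the size 768 are only placeholders, and `voice_fingerprint` and `cosine_similarity` are hypothetical helper names of mine. Mean pooling is just my guess at a starting point, not necessarily the right way to build a speaker embedding.

```python
import numpy as np


def voice_fingerprint(hidden_states: np.ndarray) -> np.ndarray:
    """Mean-pool frame-level features (frames, dim) into one unit-length vector."""
    if hidden_states.ndim != 2:
        raise ValueError("expected an array of shape (frames, dim)")
    pooled = hidden_states.mean(axis=0)   # average over the time axis
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two non-zero fingerprints."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Placeholder standing in for the model's frame-level output (assumption:
# 200 frames with 768 features each; real values would come from wav2vec2).
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 768))

fingerprint = voice_fingerprint(frames)
print(fingerprint.shape)
```

Ideally, two recordings of the same speaker would then give fingerprints with a high `cosine_similarity`, and different speakers a low one.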
Can anybody give me a hint on how to start here?