Speaker segmentation using Kaldi's x-vector approach


I'm using Kaldi for ASR, and now I want to do speaker segmentation using Kaldi's x-vector approach. The project provides example scripts in the sre16/v2 recipe (https://github.com/kaldi-asr/kaldi/tree/master/egs/sre16/v2) and a basic pretrained model trained on LDC corpora (https://david-ryan-snyder.github.io/2017/10/04/model_sre16_v2.html).

When unarchived, this pretrained model has the following structure:

[Image: directory structure of the unarchived pretrained model]

I don't have access to the LDC corpora. How can I train a model on my own data, and how do I then use that model to do the actual segmentation?


I want to know how to train a model on my own data

There is a VoxCeleb demo that uses only public data; you can run it yourself.

You can also put your own data into the standard Kaldi data-directory format (create the data/utt2spk and data/wav.scp files) and run the recipe on it.

https://github.com/kaldi-asr/kaldi/tree/master/egs/voxceleb/v2
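As a rough sketch of that data-preparation step, the Python below builds the text files of a Kaldi data directory from a list of (speaker, utterance, wav path) tuples. The input format and the function names `build_kaldi_data_files` and `write_data_dir` are hypothetical; only the file layouts follow Kaldi's conventions (`wav.scp`: `<utt-id> <wav>`, `utt2spk`: `<utt-id> <spk-id>`, `spk2utt`: `<spk-id> <utt-id> ...`). Utterance IDs are prefixed with the speaker ID so that sorting by utterance keeps each speaker's utterances together. Kaldi's own `utils/utt2spk_to_spk2utt.pl` and `utils/fix_data_dir.sh` can generate and tidy `spk2utt` as well, so treat this as an illustration, not a replacement.

```python
from collections import defaultdict
from pathlib import Path


def build_kaldi_data_files(utterances):
    """Return {filename: contents} for wav.scp, utt2spk and spk2utt.

    `utterances` is an iterable of (speaker_id, utterance_name, wav_path)
    tuples (a hypothetical input format). Utterance IDs become
    "<speaker_id>-<utterance_name>" so that sorting by utterance ID also
    groups each speaker's utterances together.
    """
    wav_scp, utt2spk = {}, {}
    for spk, name, wav in utterances:
        # Kaldi table files are whitespace-separated, so IDs cannot contain spaces.
        if any(ch.isspace() for ch in spk + name):
            raise ValueError(f"IDs must not contain whitespace: {spk!r}, {name!r}")
        utt_id = f"{spk}-{name}"
        if utt_id in wav_scp:
            raise ValueError(f"duplicate utterance id: {utt_id}")
        wav_scp[utt_id] = str(wav)
        utt2spk[utt_id] = spk

    # Kaldi expects sorted files; for ASCII IDs Python's default string
    # ordering matches the byte order of `LC_ALL=C sort`.
    spk2utt = defaultdict(list)
    for utt_id in sorted(utt2spk):
        spk2utt[utt2spk[utt_id]].append(utt_id)

    return {
        "wav.scp": "".join(f"{u} {wav_scp[u]}\n" for u in sorted(wav_scp)),
        "utt2spk": "".join(f"{u} {utt2spk[u]}\n" for u in sorted(utt2spk)),
        "spk2utt": "".join(f"{s} {' '.join(us)}\n" for s, us in sorted(spk2utt.items())),
    }


def write_data_dir(files, out_dir):
    """Write the files returned by build_kaldi_data_files into out_dir (e.g. data/train)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, contents in files.items():
        (out / filename).write_text(contents)
    return out
```

After writing the directory, running `utils/validate_data_dir.sh --no-text --no-feats data/train` from the recipe directory is a reasonable sanity check before feature extraction.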

and then how to use that model to do actual segmentation?

Start with the scripts from the demo and remove the parts you don't need. That gives you a basic segmentation demo. You can then invoke this reduced demo from your application with a system(3) call or a similar mechanism.
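From a Python application, the counterpart of a system(3) call is `subprocess.run`. The sketch below assumes you have already reduced the demo to a single script; the helper name `run_recipe`, the script name and the directory arguments in the commented usage line are all hypothetical. It runs the command from the recipe directory, because Kaldi recipes rely on relative paths such as `steps/`, `utils/` and `path.sh`, and it raises an exception if the script exits with a non-zero status.

```python
import subprocess


def run_recipe(cmd, recipe_dir=None):
    """Run a shell-script recipe as a child process and return its stdout.

    cmd is a list of program + arguments; recipe_dir becomes the working
    directory, so relative paths inside the recipe resolve as they would
    when the script is launched by hand from that directory.
    """
    # A list argument (no shell=True) sidesteps shell quoting of paths;
    # check=True turns a non-zero exit status into CalledProcessError,
    # whose .stderr attribute then holds the captured log output.
    result = subprocess.run(
        cmd, cwd=recipe_dir, capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical usage with a reduced script and placeholder directories:
# run_recipe(["./local/segment.sh", "data/my_audio", "exp/xvector_nnet_1a", "exp/my_segments"],
#            recipe_dir="kaldi/egs/voxceleb/v2")
```

Keeping the recipe as shell scripts and only wrapping the call means the application does not need to know Kaldi's internals; the trade-off is process start-up cost and parsing results from files the scripts write.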

If needed, you can then translate the scripts into the corresponding C++ API calls and run the same procedure from C++ or from any scripting language.