Should we not expect wav2vec to outperform Microsoft STT?


I compared the accuracy of wav2vec and Microsoft STT on a few TED talks.

The figures below are word error rates (WER), so lower is better.

|Video |Wav2Vec|Microsoft STT|
|:---: |:-----:|:---------:|
|[1][1]|8.57   |3.7        |
|[2][2]|13.83  |5.8        |
|[3][3]|20.7   |11.1       |
|[4][4]|12.5   |6.6        |
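
For reference, here is a minimal sketch of how a word error rate like the ones above can be computed. The normalization used for the table is not stated, so this assumes simple lowercasing and whitespace tokenization, and it returns a percentage, which is also an assumption about how the numbers are scaled. The function name `word_error_rate` is purely illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length.

    Assumption: text is normalized only by lowercasing and splitting on
    whitespace. Punctuation and number formatting are left untouched.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Differences in normalization (casing, punctuation, how numbers are spelled out) between the two systems' outputs can shift WER noticeably, so it is worth applying the same normalization to both before comparing.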

Microsoft's word error rate is roughly half of wav2vec's on every file. Isn't wav2vec supposed to be state of the art? What am I missing here?

I used the 960-hour large model provided in fairseq to generate the transcripts.
