I am using a SpeechBrain model to predict phoneme sequences from audio data. The model's output looks like this:
['sil', 'dh', 'ih', 'r', 'iy', 'z', 'ah', 'n', 'z', 'f', 'er', 'dh', 'ih', 's', 'sil', 'd', 'ay', 'v', 's', 'iy', 'm', 'sil', 'd', 'f', 'uw', 'l', 'ih', 'sh', 'sil', 'n', 'aw', 'sil']
Starting from the phoneme model's output, I would like to convert these phonemes into ordinary words. The output should be:
['the', 'reasons', 'for', 'this', 'dive', 'seemed', 'foolish', 'now']
I tried other approaches such as Pincelate, but it does not produce good spellings when given long input sequences. For example, I get 'theresen-fandusfuri'.
How can I convert the phoneme predictions to words using Python?
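To make the goal concrete, here is a minimal sketch of one possible direction: drop the 'sil' tokens and segment the remaining phonemes by exact lookup in a pronunciation lexicon. The names `LEXICON` and `segment` are hypothetical, and the lexicon is hand-written to match the model output above rather than taken from a real pronunciation dictionary. Because the matching is exact, a single wrong phoneme would make it fail, so this only illustrates the idea and is not a robust solution.

```python
# Hypothetical toy lexicon: phoneme tuple -> word.
# Entries are written to match the model output above, NOT real dictionary
# pronunciations; a real lexicon would be far larger and contain homophones.
LEXICON = {
    ("dh", "ih"): "the",
    ("r", "iy", "z", "ah", "n", "z"): "reasons",
    ("f", "er"): "for",
    ("dh", "ih", "s"): "this",
    ("d", "ay", "v"): "dive",
    ("s", "iy", "m", "d"): "seemed",
    ("f", "uw", "l", "ih", "sh"): "foolish",
    ("n", "aw"): "now",
}

PHONEMES = ['sil', 'dh', 'ih', 'r', 'iy', 'z', 'ah', 'n', 'z', 'f', 'er',
            'dh', 'ih', 's', 'sil', 'd', 'ay', 'v', 's', 'iy', 'm', 'sil',
            'd', 'f', 'uw', 'l', 'ih', 'sh', 'sil', 'n', 'aw', 'sil']


def segment(phonemes, lexicon):
    """Split a phoneme list into words by exact lexicon lookup.

    Returns a list of words, or None if no full segmentation exists.
    """
    # Silence markers are dropped because they can fall inside a word
    # (here: 's iy m sil d' for "seemed").
    phones = tuple(p for p in phonemes if p != "sil")
    n = len(phones)
    max_len = max(len(key) for key in lexicon)

    # best[i] holds one word list covering phones[:i], or None if unreachable.
    best = [None] * (n + 1)
    best[0] = []
    for i in range(n):
        if best[i] is None:
            continue
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = lexicon.get(phones[i:j])
            if word is not None and best[j] is None:
                best[j] = best[i] + [word]
    return best[n]


print(segment(PHONEMES, LEXICON))
```

The dynamic-programming table lets the search recover from a wrong early choice, such as reading 'dh ih' as "the" where "this" is needed. It does not handle phoneme errors or choose between homophones, which is the part I am unsure how to solve.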