Timestamping audio from any language given the audio source and an accurate transcription

I'm looking to get roughly accurate timestamps for each word in an audio file. I also have the original text to go with the audio file which could be used as a cross-reference source of sorts. This is similar to "audio mining," which I believe is where you only have the input audio, whereas here I have both the audio and the text.

I'd ideally like to do this using open source software, and would like to accept most languages as input (e.g., English, French, German, Spanish and ideally Russian and Mandarin).

I would even accept a solution that only produces timestamps for the words it recognizes, even if its own transcription isn't completely accurate. Cross-referencing that output with the original text to realign things would then be easier.
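To make the cross-referencing idea concrete, here is a minimal sketch in Python using only the standard library's `difflib`. It assumes some recognizer has already produced `(word, start, end)` tuples; the function name `realign_timestamps` and that input format are my own illustration, not the API of any particular tool.

```python
from difflib import SequenceMatcher


def realign_timestamps(recognized, reference):
    """Copy timestamps from recognized words onto the original text.

    recognized: list of (word, start_seconds, end_seconds) tuples,
                assumed to come from some speech recognizer.
    reference:  list of words from the original (accurate) text.

    Returns a list of (reference_word, start, end); start and end are
    None for reference words that had no matching recognized word.
    """
    # Compare case-insensitively so "The" and "the" count as a match.
    rec_words = [word.lower() for word, _, _ in recognized]
    ref_words = [word.lower() for word in reference]

    # autojunk=False stops difflib from ignoring very frequent words.
    matcher = SequenceMatcher(a=rec_words, b=ref_words, autojunk=False)

    # Start with no timestamps, then fill in the matched stretches.
    aligned = [(word, None, None) for word in reference]
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            _, start, end = recognized[block.a + offset]
            ref_index = block.b + offset
            aligned[ref_index] = (reference[ref_index], start, end)
    return aligned
```

Words the recognizer got wrong simply come back without timestamps, so they could be interpolated from their neighbours or fixed by hand afterwards.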

There are 1 best solutions below

I do work like this in my linguistics research. I use a program called ELAN, and I just noticed that there is a more recent version (4.5) than the one I currently have installed on my Mac. The software was designed to help with work on sign languages, so it supports both video and audio and lets you align your transcriptions accurately. The version I use is 3.9, which could do some kind of automatic segmentation of words, and I think that is roughly what you want. I don't see that feature in the latest version, though with some digging it may still be there.

Segmenting audio and video from this page.

Of course, if you need an earlier version, you could always use ELAN 3.9. ELAN runs on Mac, Linux, and Windows since it is Java-based (as I recall). Here is the link to ELAN. There is other linguistic annotation software out there as well. Another really good one, though difficult to learn, is PRAAT.

I hope this helps you. If I didn't quite understand your needs correctly, let me know and I'll see if I can refine my answer for you. CHEERS!