I have an audio file of a long text with different sections, all beginning with the spoken word "Chapter" (narrated by the same speaker). Is there a way to split the audio file into smaller files at these words?
I am thinking of cutting out one occurrence of the word "chapter", putting it in a separate audio file, and then using some tool to fuzzy-match the original audio against the short snippet, so as to find the "chapter" occurrences and split the original file there.
Which tool can do this? SoX? Audacity?
That is doable. You need two steps: detect the times at which the word is spoken, then cut the audio at those times.
To detect the times you can use the keyword spotting tool from pocketsphinx trunk: just check out pocketsphinx from Subversion and build it. This installs the pocketsphinx_kws binary for keyword spotting. You can then search for word times in an audio file, which must be in 16 kHz, 16-bit MS WAV format:
The frame rate is 100 frames/second, so you can see that "chapter" is detected at 21.38 s and 921.49 s (the latter when the speaker said "end of chapter").
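The frame-to-time arithmetic can be sketched as follows. This is only a minimal illustration, assuming the detector reports positions as integer frame indices; the function name frames_to_seconds and the frame values 2138 and 92149 are made up here to match the two times quoted above, they are not copied from the tool's real output.

```python
FRAME_RATE = 100  # frames per second, as stated above


def frames_to_seconds(frame, frame_rate=FRAME_RATE):
    """Convert a frame index to a time offset in seconds."""
    return frame / frame_rate


# Hypothetical frame indices corresponding to the two detections
for frame in (2138, 92149):
    print(f"frame {frame} -> {frames_to_seconds(frame):.2f} s")
```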
It's better to use a longer phrase for detection: the longer the phrase, the better the detection. For the best results you can tune the detection threshold.
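Since the detector needs 16 kHz, 16-bit WAV input, the audio may have to be converted first. A minimal sketch of building such a conversion command for sox from Python is below. The file names are hypothetical, and the mono downmix (-c 1) is an assumption on my side rather than something stated above.

```python
import shlex


def to_16k_wav_command(src, dst):
    """Build a sox command that writes src as 16 kHz, 16-bit, mono audio to dst."""
    # Format options placed directly before the output file apply to the output:
    # -r sample rate, -b bits per sample, -c channel count (mono is an assumption)
    return ["sox", src, "-r", "16000", "-b", "16", "-c", "1", dst]


cmd = to_16k_wav_command("book.wav", "book16k.wav")  # hypothetical file names
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The command is built as an argument list so that file names containing spaces survive unchanged when it is passed to subprocess.run.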
To cut the audio you can use sox: use the trim command to delete the start, and trim + reverse to cut the end.
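Once the times are known, the cutting step can be scripted. The sketch below builds one sox command per segment. Note that it uses trim with a start and a length instead of the trim + reverse combination, which is a different way to get the same cut. The file names, the function name split_commands and the output naming scheme are all hypothetical, and the two times are reused from above purely as sample values.

```python
import shlex


def split_commands(src, cut_times, prefix="chapter"):
    """Build one sox trim command per segment starting at each cut time.

    cut_times: ascending offsets in seconds where each section starts.
    """
    commands = []
    for i, start in enumerate(cut_times):
        out = f"{prefix}{i + 1:02d}.wav"
        cmd = ["sox", src, out, "trim", str(start)]
        if i + 1 < len(cut_times):
            # trim START LENGTH keeps LENGTH seconds beginning at START
            length = round(cut_times[i + 1] - start, 3)
            cmd.append(str(length))
        # The last segment gets no length, so it runs to the end of the file
        commands.append(cmd)
    return commands


for cmd in split_commands("book.wav", [21.38, 921.49]):
    print(shlex.join(cmd))
# To actually run them: subprocess.run(cmd, check=True) for each cmd
```

Anything before the first cut time is dropped here; add 0 to the front of the list if that part should be kept as its own file.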