For one of the applications I am working on, I need to stream audio and video from the web application to the backend through WebRTC. This is done using the Kinesis WebRTC JS SDK, and the consumer is a viewer that uses the Kinesis WebRTC C SDK.
I am able to get the video and the audio data. WebRTC only supports Opus encoding for the audio now. My end goal is to use the audio stream for transcription with AWS Transcribe. AWS Transcribe supports only PCM encoding, so I need to convert the Opus data into PCM data.
The audio packets that I am receiving at the backend are roughly 160 bytes per packet. When I save the bytes into .opus files and try to decode them using "opusdec", I get the following error:
WARNING: Hole in data (4 bytes) found at approximate offset 160 bytes. Corrupted Ogg.
WARNING: Hole in data (156 bytes) found at approximate offset 160 bytes. Corrupted Ogg.
ERROR: No Ogg data found in file "sample-000.opus".
Input probably not Ogg.
The data being streamed is valid Opus. I say this because when I use the media player in the AWS Kinesis console to view the streams, the video and audio play properly.
Can you please tell me how to make use of the Opus stream data that arrives in packets at the backend? I need to be able to convert it into PCM encoding and use it with AWS Transcribe.
AWS Transcribe should now support Ogg Opus files (see press release).
If the audio is an encapsulated Ogg Opus file from start to finish, you should be able to save it as is. You can verify this by inspecting the first 256 bytes and looking for "OggS", the capture pattern that marks the start of every Ogg page:
$ xxd -l 256 audio.opus
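If you would rather do the same check from code, here is a minimal Python sketch. The function name describe_audio_bytes and its return labels are made up for illustration. It only looks at the leading bytes: "OggS" at offset 0 means an Ogg container, and an "OpusHead" marker shortly after means the first packet is the Opus identification header.

```python
def describe_audio_bytes(data: bytes) -> str:
    """Classify a buffer by its first 256 bytes (hypothetical helper)."""
    head = data[:256]
    if head.startswith(b"OggS"):
        # An Ogg Opus stream carries an "OpusHead" packet in its first page.
        return "ogg-opus" if b"OpusHead" in head else "ogg-other"
    # No Ogg capture pattern: most likely raw, un-encapsulated packets.
    return "not-ogg"
```

For example, read the first 256 bytes of sample-000.opus in binary mode and pass them in. A result of "not-ogg" would be consistent with the "No Ogg data found" error from opusdec.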
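If the audio bytes are not an Ogg Opus file but un-encapsulated, raw Opus packets, you need to "package" the Opus packets into a container format (Ogg, WebM, etc.) before saving the file. opusdec only reads Ogg, so raw packets written back to back would explain the "No Ogg data found" error.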
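To make the "packaging" step concrete, here is a rough Python sketch of an Ogg Opus writer. It is not taken from any SDK, and the names ogg_crc, ogg_page and opus_packets_to_ogg are hypothetical. It assumes each buffer you receive is exactly one complete Opus packet, that every packet holds 20 ms of audio (960 samples at 48 kHz), and that the stream is mono. Check these against your own stream before relying on it. For simplicity it puts every packet in its own Ogg page, which is valid but not space-efficient.

```python
import struct

def _build_crc_table():
    # Ogg's page checksum: CRC-32, polynomial 0x04C11DB7, no reflection, init 0.
    table = []
    for i in range(256):
        r = i << 24
        for _ in range(8):
            r = ((r << 1) ^ 0x04C11DB7) if r & 0x80000000 else (r << 1)
            r &= 0xFFFFFFFF
        table.append(r)
    return table

_CRC_TABLE = _build_crc_table()

def ogg_crc(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ _CRC_TABLE[((crc >> 24) ^ byte) & 0xFF]
    return crc

def ogg_page(packet: bytes, serial: int, sequence: int, granule: int, flags: int = 0) -> bytes:
    """Wrap a single packet in its own Ogg page."""
    if len(packet) >= 255 * 255:
        raise ValueError("packet too large for a single page")
    # Lacing values: a run of 255s, terminated by one value below 255.
    lacing = bytes([255] * (len(packet) // 255) + [len(packet) % 255])
    header = struct.pack("<4sBBqIIIB", b"OggS", 0, flags, granule,
                         serial, sequence, 0, len(lacing))
    page = bytearray(header + lacing + packet)
    # The checksum is computed with the CRC field zeroed, then stored at offset 22.
    struct.pack_into("<I", page, 22, ogg_crc(bytes(page)))
    return bytes(page)

def opus_packets_to_ogg(packets, channels=1, samples_per_packet=960,
                        pre_skip=0, serial=0x4F505553) -> bytes:
    """Assumption: one Opus packet per item, each samples_per_packet long at 48 kHz."""
    packets = list(packets)
    # OpusHead: version 1, channel count, pre-skip, input rate, gain 0, mapping family 0.
    head = struct.pack("<8sBBHIhB", b"OpusHead", 1, channels, pre_skip, 48000, 0, 0)
    vendor = b"example"
    tags = b"OpusTags" + struct.pack("<I", len(vendor)) + vendor + struct.pack("<I", 0)
    pages = [ogg_page(head, serial, 0, 0, flags=0x02),  # 0x02 = beginning of stream
             ogg_page(tags, serial, 1, 0)]
    granule = 0
    for i, packet in enumerate(packets):
        granule += samples_per_packet  # running sample count at 48 kHz
        is_last = i == len(packets) - 1
        pages.append(ogg_page(packet, serial, i + 2, granule,
                              flags=0x04 if is_last else 0))  # 0x04 = end of stream
    return b"".join(pages)
```

Writing the returned bytes to a file in binary mode should produce something opusdec can open, provided the assumptions above hold. If your packets use a different frame duration, the granule positions will be wrong and playback length will be misreported, so samples_per_packet is the first thing to check.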