I have implemented a subclass of the AudioMediaPort class and overridden its onFrameReceived() method, which is called whenever a frame arrives from the conference bridge the port is connected to. But I have the following questions that I hope someone can help with:
- Are the frames delivered to onFrameReceived() before or after jitter-buffer processing?
- If they are delivered before any jitter-buffer processing, is there a way to get the RTP header information for each frame so that I can add my own jitter buffer to re-sequence them?
Ultimately I may need to send the stream to a decoder (e.g. Speex) so that I can do some audio processing of the stream, but the audio needs to be in PCM.
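For context, the re-sequencing I have in mind is something like the following standalone sketch. The class and method names here are hypothetical (not part of pjmedia); it simply holds payloads keyed on the 16-bit RTP sequence number and releases them in order:

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical helper: re-orders payloads by RTP sequence number.
class ReorderBuffer {
public:
    void push(uint16_t seq, std::vector<uint8_t> payload) {
        if (!m_next) m_next = seq;        // first packet fixes the starting point
        m_pending[seq] = std::move(payload);
    }

    // Returns the next in-order payload, or nothing if it hasn't arrived yet.
    std::optional<std::vector<uint8_t>> pop() {
        if (!m_next) return std::nullopt;
        auto it = m_pending.find(*m_next);
        if (it == m_pending.end()) return std::nullopt;
        auto payload = std::move(it->second);
        m_pending.erase(it);
        *m_next = static_cast<uint16_t>(*m_next + 1);  // wraps naturally at 65535
        return payload;
    }

private:
    std::optional<uint16_t> m_next;                  // next sequence number to release
    std::map<uint16_t, std::vector<uint8_t>> m_pending;
};
```

A real implementation would also need to discard late packets and bound the map's size, but this shows the re-sequencing idea.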
Here is how I am processing the frames that are received:
void Analyzer::processFrame(pj::MediaFrame &frame) {
    // Decode one Speex frame from the payload; 640 samples leaves
    // headroom for a wideband (320-sample) frame.
    spx_int16_t output[640];

    speex_bits_read_from(&m_speexbits,
                         reinterpret_cast<char *>(frame.buf.data()),
                         frame.size);

    int rc = speex_decode_int(m_speexstate, &m_speexbits, output);
    if (rc != 0) {
        // Speex returns -1 for end of stream, -2 for a corrupt stream.
        LOG(LOG_ERR) << *this << ": speex decoder error: " << rc << std::endl;
    }
    // more processing....
}
After a few frames the speex decoder spits out the following to the console:
notification: Invalid mode encountered. The stream is corrupted.
- and/or -
notification: More than two wideband layers found. The stream is corrupted.
I suspect that it is because the frames are being presented as they arrive on the wire, ahead of any jitter-buffer processing.
After some additional testing, it appears that onFrameReceived() is called after decoding and jitter-buffer handling are done, so the frames are already PCM; that would explain the Speex errors above, since I was feeding already-decoded audio back into the decoder. The media flow is also documented at https://docs.pjsip.org/en/latest/specific-guides/media/audio_flow.html.
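Given that, the payload can be treated directly as 16-bit signed PCM with no Speex step. A minimal sketch of what I do instead (using a plain byte vector and a byte count to stand in for frame.buf and frame.size, and assuming host-endian samples):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Mean-square energy of a PCM16 payload.
// `buf` stands in for frame.buf; `size` for frame.size (in bytes).
double pcm16Energy(const std::vector<uint8_t> &buf, unsigned size) {
    const unsigned nsamples = size / sizeof(int16_t);
    double acc = 0.0;
    for (unsigned i = 0; i < nsamples; ++i) {
        int16_t s;
        // memcpy avoids aliasing/alignment issues of a pointer cast
        std::memcpy(&s, buf.data() + i * sizeof(int16_t), sizeof s);
        acc += static_cast<double>(s) * s;
    }
    return nsamples ? acc / nsamples : 0.0;
}
```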
My problem now seems to be more about how the Goertzel algorithm I'm applying to the media stream behaves differently when the original stream is PCMU versus when it was decoded from a Speex stream, even though the data looks normal when I write it to a WAV file. But that is a problem for a separate question.
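For reference, the shape of the Goertzel power computation I mean is below. This is a generic single-bin sketch, not my actual analyzer code; level or DC-offset differences between a PCMU-decoded and a Speex-decoded stream would show up directly in this power value:

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Goertzel power of `freq` Hz in a block of PCM16 samples at `sampleRate`.
double goertzelPower(const std::vector<int16_t> &samples,
                     double freq, double sampleRate) {
    const double pi = std::acos(-1.0);
    const double coeff = 2.0 * std::cos(2.0 * pi * freq / sampleRate);
    double s1 = 0.0, s2 = 0.0;                 // two-sample delay line
    for (int16_t x : samples) {
        const double s0 = x + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    // Squared magnitude of the target bin, skipping the final complex multiply
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}
```

Works best when the block holds a whole number of cycles of the target frequency; otherwise spectral leakage smears the result.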