I have written a program that gets SIP packets in real time from the network and I want to use the SDP information embedded in the packets to capture the audio conversation from two VOIP soft phones.
Once I retrieve the binary data from the RTP protocol how should I go about converting it into a sound file?
C++ preferred.
Hi Adrian and welcome,
You are right: we cannot simply concatenate the RTP payloads into a file and then read that file as an audio file such as a ".wav".
The missing part that you are looking for is a piece of code that reassembles, decodes, and plays out the RTP packet flow as voice samples. For the sake of simplicity, consider the well-known G.711 or PCM codec, because all SIP phones support this codec.
You need to implement a playout buffer (logically an infinite buffer, but a ring buffer with wrap-around is OK). Each packet contains a small chunk of audio data, usually 20 ms long. Each chunk is preceded by an RTP header, which indicates the type of encoding (this is related to the SDP information, and you already have a good understanding of that part).
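Since you asked for C++, here is a minimal sketch of those pieces. It assumes G.711 mu-law (PCMU, static payload type 0), 8 kHz mono, and a play-out buffer large enough for the audio you want to keep. The names `RtpInfo`, `parseRtp`, `ulawToPcm`, `PlayoutBuffer`, and `buildWav` are made up for illustration and do not come from any library. There is no jitter adaptation here; packets are simply placed by RTP timestamp, so re-ordered packets land in the right slot and lost packets leave silence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fields of an RTP packet (RFC 3550) that matter for play-out.
struct RtpInfo {
    uint8_t  payloadType = 0;        // 0 = PCMU (G.711 mu-law)
    uint16_t sequence = 0;
    uint32_t timestamp = 0;          // one tick per sample for G.711
    const uint8_t* payload = nullptr;
    size_t   payloadLen = 0;
};

// Parse one RTP packet (the UDP payload). Returns false if it is malformed.
bool parseRtp(const uint8_t* p, size_t n, RtpInfo& out) {
    if (n < 12 || (p[0] >> 6) != 2) return false;          // RTP version 2 only
    size_t off = 12 + 4u * (p[0] & 0x0F);                  // skip CSRC entries
    if (p[0] & 0x10) {                                     // header extension
        if (n < off + 4) return false;
        off += 4 + 4u * ((p[off + 2] << 8) | p[off + 3]);
    }
    const size_t pad = (p[0] & 0x20) ? p[n - 1] : 0u;      // trailing padding
    if (off + pad > n) return false;
    out.payloadType = static_cast<uint8_t>(p[1] & 0x7F);
    out.sequence  = static_cast<uint16_t>((p[2] << 8) | p[3]);
    out.timestamp = (uint32_t(p[4]) << 24) | (uint32_t(p[5]) << 16) |
                    (uint32_t(p[6]) << 8) | uint32_t(p[7]);
    out.payload = p + off;
    out.payloadLen = n - off - pad;
    return true;
}

// Expand one G.711 mu-law byte to a 16-bit linear PCM sample.
int16_t ulawToPcm(uint8_t u) {
    u = static_cast<uint8_t>(~u);                          // bits are stored inverted
    const int magnitude = ((((u & 0x0F) << 3) + 0x84) << ((u >> 4) & 0x07)) - 0x84;
    return static_cast<int16_t>((u & 0x80) ? -magnitude : magnitude);
}

// Ring buffer of decoded samples, indexed by RTP timestamp.
class PlayoutBuffer {
public:
    explicit PlayoutBuffer(size_t samples) : buf_(samples, 0) { assert(samples > 0); }

    // Decode a PCMU payload and store it at the slot given by its timestamp.
    void put(const RtpInfo& r) {
        if (r.payloadType != 0) return;                    // this sketch handles PCMU only
        if (!started_) { base_ = r.timestamp; started_ = true; }
        const uint32_t offset = r.timestamp - base_;       // unsigned math survives wrap-around
        if (offset >= 0x80000000u) return;                 // older than the first packet: drop
        for (size_t i = 0; i < r.payloadLen; ++i)
            buf_[(offset + i) % buf_.size()] = ulawToPcm(r.payload[i]);
    }
    const std::vector<int16_t>& samples() const { return buf_; }

private:
    std::vector<int16_t> buf_;
    uint32_t base_ = 0;
    bool started_ = false;
};

// Wrap 16-bit mono PCM in a minimal 44-byte WAV header (little-endian).
std::vector<uint8_t> buildWav(const std::vector<int16_t>& pcm, uint32_t rate = 8000) {
    std::vector<uint8_t> out;
    auto le = [&out](uint32_t v, int bytes) {
        for (int i = 0; i < bytes; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
    };
    auto tag = [&out](const char* s) { out.insert(out.end(), s, s + 4); };
    const uint32_t dataBytes = static_cast<uint32_t>(pcm.size() * 2);
    tag("RIFF"); le(36 + dataBytes, 4); tag("WAVE");
    tag("fmt "); le(16, 4); le(1, 2); le(1, 2);            // chunk size, PCM format, mono
    le(rate, 4); le(rate * 2, 4); le(2, 2); le(16, 2);     // rate, byte rate, block align, bits
    tag("data"); le(dataBytes, 4);
    for (int16_t s : pcm) le(static_cast<uint16_t>(s), 2);
    return out;
}
```

You would feed each captured UDP payload through `parseRtp`, pass the result to `PlayoutBuffer::put`, and at the end of the call write the bytes returned by `buildWav(buffer.samples())` to a file opened in binary mode. If the SDP negotiated A-law (PCMA, payload type 8) or another codec, the decode step has to change accordingly.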
For each packet:
Decode the 8-bit values into 16-bit samples at the right rate, usually 8,000 samples per second for G.711;
Compute the play-out point from the RTP header: it is the index into the play-out buffer array. Take jitter and re-ordering into account, based on the RTP timestamp;
Write the samples into a .wav file or play them on an audio device.
From a pragmatic point of view, you may do that in several ways:
use Wireshark to do the hard work;