The RFC tutorial on the RTP/RTCP protocol seems very confusing to me. I cannot find any state transition diagram for this protocol. It doesn't make clear the difference between the NTP timestamp and the RTP timestamp. It says the NTP timestamp is useful for calculating round trip time. Can't that be calculated with the RTP timestamp alone?
The source will send an SR (sender report) if and only if it recently sent an RTP packet; otherwise it sends an RR (receiver report). How long is the time interval used to decide whether the sender has sent a packet recently?
What does the mixer do exactly? Does it take all the RTP packets coming from multiple sources, read them at the application layer, and repack them into multiple RTP packets with only the SSRC changed? What if the packet types are different?
That protocol is media-oriented, like RTSP; state transition handling is the responsibility of the signaling protocol. Look at the SIP/RTP pairing.
The RTP timestamp is used for intra-flow synchronization and the NTP reference for inter-flow synchronization.
Yes, NTP is used when several flows need to be synchronized, but if there is only one flow then the RTP timestamp is enough. In summary, an RTP audio communication does not need NTP, but an RTP audio+video communication needs NTP in order to do lip-sync.
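To make the lip-sync point concrete, here is a minimal Python sketch of how a receiver could place audio and video on a common timeline. It assumes each flow's latest sender report supplied an (NTP time, RTP timestamp) pair. The function name, the parameters and all the numbers are made up for illustration; NTP time is simplified to plain seconds.

```python
def rtp_to_wallclock(rtp_ts, sr_ntp_seconds, sr_rtp_ts, clock_rate):
    """Map an RTP timestamp to sender wallclock seconds.

    sr_ntp_seconds and sr_rtp_ts are assumed to come from the same
    RTCP sender report; clock_rate is the RTP clock rate of the flow.
    """
    # Signed 32-bit difference, so RTP timestamp wraparound is handled.
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr_ntp_seconds + diff / clock_rate


# Hypothetical flows: audio at 8000 Hz, video at 90000 Hz.
audio_time = rtp_to_wallclock(168000, 1000.0, 160000, 8000)
video_time = rtp_to_wallclock(590000, 1000.5, 500000, 90000)

# Positive skew means the video frame was captured later than the audio frame.
skew = video_time - audio_time
```

Within a single flow only the RTP timestamp differences matter, which is why audio alone does not need the NTP reference. The two flows use unrelated RTP clocks and random initial offsets, so the shared wallclock is what lets them be compared.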
This is related to the 5% overhead, which determines how often RTCP reports are sent. Quoting the RFC: "The control traffic bandwidth is in addition to the session bandwidth for the data traffic. It is RECOMMENDED that the fraction of the session bandwidth added for RTCP be fixed at 5%."
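As a rough illustration of how the 5% figure turns into a time interval, here is a simplified Python sketch loosely following the interval computation described in RFC 3550. It leaves out the randomization and the reconsideration logic, and the function names and example numbers are my own assumptions. The SR/RR decision shown uses the rule that a participant counts as a sender if it sent RTP data within the last two report intervals.

```python
def rtcp_interval(session_bw_bps, members, senders, avg_rtcp_size_bytes,
                  we_sent, rtcp_fraction=0.05, min_interval=5.0):
    """Simplified, deterministic RTCP report interval in seconds."""
    # RTCP bandwidth in bytes per second (5% of the session bandwidth by default).
    rtcp_bw = session_bw_bps * rtcp_fraction / 8.0
    n = members
    # When senders are a small minority, they get a quarter of the RTCP bandwidth.
    if senders <= members * 0.25:
        if we_sent:
            rtcp_bw *= 0.25
            n = senders
        else:
            rtcp_bw *= 0.75
            n = members - senders
    return max(min_interval, avg_rtcp_size_bytes * n / rtcp_bw)


def report_type(now, last_rtp_sent, interval):
    """Return "SR" if RTP data was sent within the last two intervals, else "RR"."""
    if last_rtp_sent is not None and now - last_rtp_sent <= 2 * interval:
        return "SR"
    return "RR"


# Hypothetical two-party call at 64 kbit/s: the 5 second minimum applies.
small = rtcp_interval(64000, members=2, senders=1,
                      avg_rtcp_size_bytes=100, we_sent=True)

# Hypothetical large session: a receiver among 990 receivers waits much longer.
large = rtcp_interval(64000, members=1000, senders=10,
                      avg_rtcp_size_bytes=100, we_sent=False)
```

So "recently" is not a fixed number of seconds: it scales with the report interval, which in turn grows with the number of participants sharing the RTCP bandwidth.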
A mixer is quite complex, but in essence you've got it right: multiple flows are decoded and re-encoded into one flow, so the mixer must be able to handle the codecs inside the payload if the packet types are different.
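A toy Python sketch of the mixing step, assuming the incoming flows have already been decoded to 16-bit PCM at the same sample rate (that decoding is where the codec handling happens). The dictionary layout and the function name are hypothetical; a real mixer would also re-encode the result and build proper RTP headers.

```python
def mix_frames(frames, mixer_ssrc):
    """Mix decoded audio frames from several sources into one frame.

    frames: non-empty dict mapping each source SSRC to a list of
    16-bit PCM samples, all at the same sample rate.
    """
    length = min(len(samples) for samples in frames.values())
    mixed = []
    for i in range(length):
        total = sum(samples[i] for samples in frames.values())
        # Clip to the 16-bit range instead of letting the sum overflow.
        mixed.append(max(-32768, min(32767, total)))
    return {
        "ssrc": mixer_ssrc,            # the mixer is the new synchronization source
        "csrc": sorted(frames)[:15],   # contributing sources; the header holds at most 15
        "samples": mixed,
    }


# Hypothetical input: two sources, three samples each.
out = mix_frames({0x1111: [1000, -2000, 30000],
                  0x2222: [500, -500, 10000]}, mixer_ssrc=0xABCD)
```

The outgoing flow carries the mixer's own SSRC, and the original sources are only listed as contributing sources, so it is more than repacking with a changed SSRC.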