Audio Streaming: RTP-Stream receiving with Gstreamer - Latency


I am currently playing around with an audio-over-IP project and wondered if you could help me out. I have a LAN with an audio source (a Dante/AES67 RTP stream) which I would like to distribute to multiple receivers: SBCs (e.g. a Raspberry Pi) with an audio output (e.g. a headphone jack):

PC-->Audio-USB-Dongle-->AES67/RTP-Multicast-Stream-->LAN-Network-Switch-->RPI (Gstreamer --> AudioJack)

I currently use Gstreamer for the Pipeline:

gst-launch-1.0 -v udpsrc uri=udp://239.69.xxx.xx:5004 caps="application/x-rtp,channels=(int)2,format=(string)S16LE,media=(string)audio,payload=(int)96,clock-rate=(int)48000,encoding-name=(string)L24" ! rtpL24depay ! audioconvert ! alsasink device=hw:0,0

It all works fine, but if I watch a video on the PC and listen to the audio from the RPi, I have some latency (~200-300 ms), hence my questions:

  1. Am I missing something in my GStreamer pipeline that would let me reduce the latency?
  2. What is the minimum latency to be expected with RTP streams? Is <50 ms achievable?
  3. Does the latency come from the network or from the speed of the RPi?
  4. Since my audio input is not a GStreamer input, I assume rtpjitterbuffer or similar would not help decrease the latency?

1 Answer


Cheap workaround: Delay your video by 200ms, e.g. using VLC.

TL;DR

  1. Yes, you can decrease the latency via the alsasink buffer
  2. AoIP latency is easily < 5 ms
  3. Processing and the network could be bottlenecks, but probably not here
  4. Adding rtpjitterbuffer will add more latency

1. Gstreamer Pipeline

This pipeline achieves a 40 ms round-trip latency by reducing the alsasink buffers:

gst-launch-1.0 udpsrc address=239.69.x.x port=5004 multicast-iface=eth0 !\
    application/x-rtp, clock-rate=48000, channels=2 !\
    rtpL24depay !\
    audioconvert ! audio/x-raw,format=S24LE,channels=1 !\
    alsasink device=hw:CARD=Loopback,DEV=0 buffer-time=5000 latency-time=500 sync=false

I only adjusted the alsasink part. Running gst-inspect-1.0 alsasink shows that buffer-time is 200 ms by default. I was able to decrease it to 5 ms before hearing audio artifacts. When decreasing it, make sure to adapt latency-time ("minimum amount of data to write") to a value smaller than buffer-time. I also had to add sync=false to get rid of some warnings.
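
If you want to verify those defaults on your own system, gst-inspect-1.0 prints the property descriptions; the grep pattern below is just a convenience filter:

    # Show alsasink's buffer-time and latency-time properties (values are in microseconds,
    # so buffer-time=5000 in the pipeline above means 5 ms)
    gst-inspect-1.0 alsasink | grep -A 3 -E "buffer-time|latency-time"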

Round Trip Latency

I measured the round-trip latency for this pipeline. One has to set Audacity's latency compensation equal to the Audacity buffer length. You can see a small artifact, probably due to the non-realtime operating system.

Ubuntu-PC -> DanteAVIO USB-C -> Cat5e -> Gstreamer

(Screenshot: 40 ms round-trip latency measured with Audacity using the improved pipeline)

2. Low Latency AoIP:

Getting Dante/AES67/RTP latency below 50 ms is definitely possible if your device has specs comparable to an NXP i.MX8.

If your goal is "getting a low-latency AES67 stream to the RPi", I suggest looking at a Ravenna implementation for Linux. Ravenna is similar to Dante but more AES67-compliant; Ravenna and AES67 use RTP in exactly the same way. PipeWire also recently gained AES67 support (I didn't test it).

I know of these viable options:

  • Merging Technologies provides the "original" implementation, with commercial-only support. The control tool is proprietary and x86-only.
  • There is a fork that contains useful patches and open source control software.
  • The PipeWire AES67 implementation. It is not released yet, but you can already compile a working version from the master branch.
  • Dante Embedded Platform, but you probably won't get your hands on that.

3. Latency Sources

For AoIP there are different sources of latency:

  • Packet time: the time needed to accumulate the samples for one packet, ca. 1 ms.
  • Network: depends on your hardware and settings, ca. < 1ms to many seconds.
  • Receiver Buffer: configurable, ca. < 1ms to many ms.
  • Processing and DAC at the receiver.

In this case, the ALSA receive buffer was set to 200 ms by GStreamer.

Packet time should be 1 ms, because Dante uses the AES67 mandatory profile (48 kHz, 48 samples per packet), i.e. 48 / 48000 s = 1 ms per packet. I assume your sender handles this correctly, but I can't tell.

On the network side, you should use Gigabit switches and at least Cat 5e cables. Make sure to follow the recommendations for switches and their configuration (especially disabling EEE). If the clocks of sender and receiver are synchronized, run tcpdump or Wireshark on both to get a good estimate of your network latency. The easiest way to filter for the payload is by port, e.g. RTP on port 5004 or SAP on port 9875. Muting/unmuting the stream is easy to spot in the capture (latency = T_recv - T_send).
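
As a sketch of such a capture (the interface name eth0 and port 5004 are taken from your setup; adjust as needed), run this on both sender and receiver and compare the packet timestamps in Wireshark:

    # Capture the multicast RTP stream without name resolution and write it to a file
    sudo tcpdump -i eth0 -nn -w rtp_capture.pcap udp port 5004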

Low-latency audio on the RPi seems possible. Still, you could measure the speed of the ADC/DAC on the RPi. Optimizing your Linux for realtime operation would allow you to decrease the ALSA buffer even further.
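
As a rough sketch of that kind of tuning (not a complete realtime guide; paths and the audio group name assume a Debian-like Raspberry Pi OS):

    # Pin the CPU governor to "performance" to avoid frequency-scaling hiccups
    echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

    # Allow the audio group to use realtime scheduling and locked memory (re-login required)
    printf '@audio - rtprio 95\n@audio - memlock unlimited\n' | sudo tee /etc/security/limits.d/audio.conf

    # Then run the pipeline with a realtime scheduling priority
    chrt -f 80 gst-launch-1.0 ...   # use the pipeline from section 1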

4. Gstreamer Rtpjitterbuffer

You do not need rtpjitterbuffer. Its docs talk about "retransmission", and the AES67 standard doesn't include retransmission of the payload. Also, the rtpjitterbuffer element adds another buffer with a default latency of 200 ms (200 packets at 1 ms packet time).
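
That said, if you were on a lossy link and did want a jitter buffer, you could lower its latency property instead of accepting the 200 ms default. An untested sketch based on the pipeline above, with 10 ms picked arbitrarily and the asker's hw:0,0 output device:

    gst-launch-1.0 udpsrc address=239.69.x.x port=5004 multicast-iface=eth0 !\
        application/x-rtp, clock-rate=48000, channels=2 !\
        rtpjitterbuffer latency=10 !\
        rtpL24depay !\
        audioconvert !\
        alsasink device=hw:0,0 buffer-time=5000 latency-time=500 sync=false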

Professional Audio + Video

You mention video and audio in your setup. Done professionally, this would require an AV-over-IP solution that keeps audio and video in sync; examples are DanteAV or something SMPTE 2110-compatible. Your approach seems fine to me for hobby projects. Keep in mind that Dante in AES67 mode has some constraints.