I'm trying to implement a server that presents itself as an ONVIF Profile M (analytics) device. So far, most endpoints are implemented well enough that ONVIF Device Manager (ODM) can communicate with my server successfully, but problems arose once I started implementing streaming capabilities and metadata.
As far as I understand, it should work roughly like this:
- ODM (or any client of my device) at some point sends a GetStreamUri request
- The server responds with the URI of an RTSP server
- The client sends an RTSP OPTIONS request
- The server responds with the list of supported methods
- The client sends an RTSP DESCRIBE request
- The server responds with an SDP that describes the available media sections (video, audio, metadata) along with the payload type for each source.
- Based on what it is looking for, the client sends a SETUP request to the corresponding control URL; depending on the transport, it also specifies the ports on which it expects to receive the streams.
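To make it concrete, the DESCRIBE response I'd expect to produce would look roughly like this (addresses, ports, and payload types are made up; `vnd.onvif.metadata/90000` is the rtpmap the ONVIF Streaming Specification uses for the metadata stream):

```
v=0
o=- 0 0 IN IP4 192.168.0.10
s=ONVIF Media
t=0 0
a=control:rtsp://192.168.0.10:8554/stream
m=video 0 RTP/AVP 96
a=rtpmap:96 H264/90000
a=control:rtsp://192.168.0.10:8554/stream/video
m=application 0 RTP/AVP 107
a=rtpmap:107 vnd.onvif.metadata/90000
a=control:rtsp://192.168.0.10:8554/stream/metadata
```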
I have been trying to implement the RTSP server using GStreamer, but I'm not sure how to handle some of the points mentioned above:
I can easily create an RTSP server that streams video, and I can also set up a UDP stream that carries some text metadata, but I cannot send text metadata (for example bounding boxes) over RTSP. No matter how I try to align the pipelines, there is always something wrong with decoding the payload type or negotiating capabilities between pipeline elements.
I'm also having a hard time understanding how to handle the DESCRIBE request: ODM sends it to the base endpoint (rtsp://address:port), and the subsequent requests are sent based on the content of the SDP. GStreamer's RTSPServer has mount points that map to an RTSPMediaFactory, which as I understand implements a particular stream on its own endpoint (/video and /metadata in my example). But it seems the RTSPServer itself cannot answer a DESCRIBE request, since that functionality lives in each RTSPMediaFactory, and I also cannot edit the SDP that is generated from the pipeline. Is it possible to implement an RTSP server that serves synchronized /metadata and /video on separate endpoints, both advertised from one main endpoint?
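For reference, here is a minimal sketch of what I've been experimenting with: a single mount point whose launch string contains two payloaders (gst-rtsp-server exposes each `payN` element as its own media section in the generated SDP, so one DESCRIBE covers both streams within one session). The metadata branch with `appsrc` and `rtpgstpay` is just a stand-in, since `rtpgstpay` uses a GStreamer-specific payload format that a generic ONVIF client presumably cannot parse:

```python
# One mount point, two payloaders: gst-rtsp-server names the SDP media
# sections after the payN elements in the launch string.
LAUNCH = (
    "( videotestsrc is-live=true ! x264enc tune=zerolatency "
    "! rtph264pay name=pay0 pt=96 "
    "appsrc name=metasrc is-live=true ! rtpgstpay name=pay1 pt=98 )"
)

def main():
    # Imports deferred so the constant above is importable without GStreamer.
    import gi
    gi.require_version("Gst", "1.0")
    gi.require_version("GstRtspServer", "1.0")
    from gi.repository import Gst, GstRtspServer, GLib

    Gst.init(None)
    server = GstRtspServer.RTSPServer()
    server.set_service("8554")
    factory = GstRtspServer.RTSPMediaFactory()
    factory.set_launch(LAUNCH)
    factory.set_shared(True)  # all clients share one pipeline instance
    server.get_mount_points().add_factory("/stream", factory)
    server.attach(None)  # attach to the default GLib main context
    GLib.MainLoop().run()

if __name__ == "__main__":
    main()
```

As far as I can tell, with this setup ODM sends DESCRIBE to rtsp://host:8554/stream and gets one SDP with both media sections, each carrying its own a=control attribute (stream=0, stream=1) for the subsequent SETUP requests.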
I recently discovered that there is a set of GStreamer ONVIF plugins containing elements such as onvifmetadatacombiner - is it possible to implement what I need using this aggregator? What would such a pipeline look like? I couldn't find any usage examples of the GStreamer ONVIF plugins.
My other question related to this topic: I need to run this server inside a Docker container. How can I handle port mapping, given that when I start the container I don't yet know which ports the client will request? Publishing all ports (which I'm aware is generally very bad practice) won't solve the problem either, as they would be mapped randomly.
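One workaround I'm considering, in case it helps frame the question: forcing TCP-interleaved transport (RTP multiplexed over the RTSP connection itself), so no dynamically chosen UDP ports are involved and only the RTSP port has to be published. A sketch, assuming gst-rtsp-server's set_protocols API:

```python
RTSP_PORT = "8554"  # the only port that needs publishing: docker run -p 8554:8554 ...

def serve_tcp_only():
    # Imports deferred so the constant above is importable without GStreamer.
    import gi
    gi.require_version("Gst", "1.0")
    gi.require_version("GstRtsp", "1.0")
    gi.require_version("GstRtspServer", "1.0")
    from gi.repository import Gst, GstRtsp, GstRtspServer, GLib

    Gst.init(None)
    server = GstRtspServer.RTSPServer()
    server.set_service(RTSP_PORT)
    factory = GstRtspServer.RTSPMediaFactory()
    factory.set_launch(
        "( videotestsrc is-live=true ! x264enc tune=zerolatency "
        "! rtph264pay name=pay0 pt=96 )"
    )
    # Only offer RTP-over-RTSP (interleaved TCP): no extra UDP ports are
    # opened, so a single -p mapping on the container is enough.
    factory.set_protocols(GstRtsp.RTSPLowerTrans.TCP)
    server.get_mount_points().add_factory("/stream", factory)
    server.attach(None)
    GLib.MainLoop().run()
```

The only alternative I know of is running the container with --network=host, which sidesteps port mapping entirely but is Linux-only, so I'd prefer a transport-level solution if one exists.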