How to interpret ffmpeg recording options available for a webcam (directshow)?

I am trying to create a GUI for personal use that lets someone customise ffmpeg's recording and converting options without using the command line directly. At the moment, I am learning about the different parameters and flags in ffmpeg.

Apologies in advance if I end up asking some stupid questions. I am on a learning journey at the moment, and unfortunately not all of this info is available online in an easily understandable way.

I have a USB webcam which reported having the following options available to it:

[dshow @ 00000000003f9340]   pixel_format=yuyv422  min s=640x480 fps=5 max s=640x480 fps=30
[dshow @ 00000000003f9340]   pixel_format=yuyv422  min s=640x480 fps=5 max s=640x480 fps=30 (tv, bt470bg/bt709/unknown, topleft) chroma_location=topleft
[dshow @ 00000000003f9340]   pixel_format=yuyv422  min s=352x288 fps=5 max s=352x288 fps=30
[dshow @ 00000000003f9340]   pixel_format=yuyv422  min s=352x288 fps=5 max s=352x288 fps=30 (tv, bt470bg/bt709/unknown, topleft)
[dshow @ 00000000003f9340]   pixel_format=yuyv422  min s=320x240 fps=5 max s=320x240 fps=30
[dshow @ 00000000003f9340]   pixel_format=yuyv422  min s=320x240 fps=5 max s=320x240 fps=30 (tv, bt470bg/bt709/unknown, topleft)
[dshow @ 00000000003f9340]   pixel_format=yuyv422  min s=176x144 fps=5 max s=176x144 fps=30
[dshow @ 00000000003f9340]   pixel_format=yuyv422  min s=176x144 fps=5 max s=176x144 fps=30 (tv, bt470bg/bt709/unknown, topleft)
[dshow @ 00000000003f9340]   pixel_format=yuyv422  min s=160x120 fps=5 max s=160x120 fps=30
[dshow @ 00000000003f9340]   pixel_format=yuyv422  min s=160x120 fps=5 max s=160x120 fps=30 (tv, bt470bg/bt709/unknown, topleft)
[dshow @ 00000000003f9340]   pixel_format=yuyv422  min s=1280x1024 fps=5 max s=1280x1024 fps=9
[dshow @ 00000000003f9340]   pixel_format=yuyv422  min s=1280x1024 fps=5 max s=1280x1024 fps=9 (tv, bt470bg/bt709/unknown, topleft)
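For the GUI, a listing like this could be turned into structured data before it is shown to the user. Below is a minimal sketch in Python, assuming every line follows exactly the layout pasted above. The names `parse_dshow_line` and `parse_dshow_listing` are my own and not part of ffmpeg, and collapsing the doubled entries into one mode is an assumption about how the listing should be read.

```python
import re

# Matches one mode line of the dshow listing, e.g.
#   pixel_format=yuyv422  min s=640x480 fps=5 max s=640x480 fps=30 (tv, ..., topleft)
# The bracketed colour details are optional.
LINE_RE = re.compile(
    r"pixel_format=(?P<pix>\S+)\s+"
    r"min s=(?P<min_w>\d+)x(?P<min_h>\d+) fps=(?P<min_fps>[\d.]+)\s+"
    r"max s=(?P<max_w>\d+)x(?P<max_h>\d+) fps=(?P<max_fps>[\d.]+)"
    r"(?:\s+\((?P<extra>[^)]*)\))?"
)


def parse_dshow_line(line):
    """Return a dict describing one mode, or None if the line is not a mode line."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    mode = {
        "pixel_format": m["pix"],
        "min_size": (int(m["min_w"]), int(m["min_h"])),
        "max_size": (int(m["max_w"]), int(m["max_h"])),
        "min_fps": float(m["min_fps"]),
        "max_fps": float(m["max_fps"]),
        "color_info": None,
    }
    if m["extra"]:
        # e.g. ["tv", "bt470bg/bt709/unknown", "topleft"]
        mode["color_info"] = [part.strip() for part in m["extra"].split(",")]
    return mode


def parse_dshow_listing(text):
    """Parse a whole listing, collapsing duplicate entries into one mode each."""
    modes = {}
    for line in text.splitlines():
        mode = parse_dshow_line(line)
        if mode is None:
            continue
        key = (mode["pixel_format"], mode["max_size"],
               mode["min_fps"], mode["max_fps"])
        # Keep the first entry unless a later duplicate adds colour details.
        if key not in modes or (modes[key]["color_info"] is None
                                and mode["color_info"]):
            modes[key] = mode
    return list(modes.values())
```

Fed the twelve lines above, this would yield six modes, each carrying the colour details from its second listing.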

I just want to get to the bottom of how I should interpret this. Apologies that I will ask multiple questions:

  1. The fact that both resolution and fps have a min and max value (for every option) seems to imply that these two parameters are supposedly variable and outside my control, right? In practice, the fps has varied depending on brightness, but the resolution has not. Is it safe to assume that video imaging devices (especially something like a webcam) do not have variable resolution?

  2. Secondly, why is it that every option is listed twice, except half of them specify extra info, such as color_range, color_space, and chroma_location? Is this just a quirk? Surely those extra parameter options should not be discarded?

  3. It's hard to know how to make sense of this, but for example: the fact that only "tv" is ever shown, does that imply that the webcam can only ever do limited color range, and there is no point trying to get the full 0-255 out of it? I read somewhere that "pc" implies the full range of 0-255, whereas "tv" implies a range of 16-235.

  4. With regard to color space, is it acceptable to record the webcam as raw (un-encoded) and then convert to a different color space later down the line? Which approach to dealing with the color space yields the least amount of lost color? My only previous experience with color spaces is in the realm of images, where, for example, it makes no sense to convert sRGB to ROMM16 RGB: you're going to a color space with wider coverage, and extra colors won't be created out of thin air. You'd want to go once from raw to a color space, and avoid converting between color spaces afterwards. Also, what does "unknown" mean in the color space options?

Here's the culmination of some research/testing I've done. Is there anything correct, or seriously wrong, in the conclusions and assumptions I've made below?

My understanding of pixel_format is as follows: when you're recording (even to raw), you specify the pixel format the webcam produces using something like "-pixel_format yuyv422"; this is a "packed" format, not a "planar" one. When you convert from raw to something like mkv using libx264, you can't specify a packed pixel format such as "yuyv422", but must instead use an appropriate planar counterpart, such as "yuv422p", which would be specified using "-pix_fmt yuv422p".
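To make that two-step flow concrete for a GUI, here is a rough sketch of how the two command lines might be assembled as argument lists in Python. The function names, the device name "USB Camera", the file names, and the CRF default are all placeholders of mine; the flags themselves ("-f dshow", "-video_size", "-framerate", "-pixel_format", "-pix_fmt", "-c:v", "-crf") are standard ffmpeg options. Whether libx264 accepts yuv422p depends on the build, so treat that default as an assumption too.

```python
def capture_command(device_name, size, fps, out_path, pixel_format="yuyv422"):
    """Build an ffmpeg argument list that records raw video from a dshow device.

    Input options (-video_size, -framerate, -pixel_format) must come before -i.
    """
    width, height = size
    return [
        "ffmpeg",
        "-f", "dshow",
        "-video_size", f"{width}x{height}",
        "-framerate", str(fps),
        "-pixel_format", pixel_format,   # packed format delivered by the camera
        "-i", f"video={device_name}",
        "-c:v", "rawvideo",              # keep the frames un-encoded
        out_path,
    ]


def encode_command(in_path, out_path, pix_fmt="yuv422p", crf=18):
    """Build an ffmpeg argument list that encodes a raw recording with libx264.

    -pix_fmt is an output option, so it goes after -i.
    """
    return [
        "ffmpeg",
        "-i", in_path,
        "-c:v", "libx264",
        "-crf", str(crf),
        "-pix_fmt", pix_fmt,             # planar counterpart of yuyv422
        out_path,
    ]


if __name__ == "__main__":
    # Hypothetical device and file names, printed rather than executed.
    print(capture_command("USB Camera", (640, 480), 30, "raw.nut"))
    print(encode_command("raw.nut", "encoded.mkv"))
```

Passing the list to `subprocess.run` avoids shell quoting problems with device names that contain spaces.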

I did a raw recording of the webcam (in which I recorded a bright light, in the dark) without setting any of the options in the brackets above. I then converted this video using libx264 with the flags "-dst_range 1 -color_range 2", which I saw elsewhere on the internet.

Taking a screenshot of this video using VLC and putting it through ImageMagick's identify -verbose shows that the color range of the screenshot is 0,255. As for the video itself, "MediaInfo" reports "color range:Full", and VLC's codec info says "Decoded format: Planar 4:2:2 YUV full scale". Is this info worth anything, or is it just metadata that the video got tagged with?

At first I was happy about imagemagick's color range reporting, but I am thinking now, the 0, 255 range could be a result of "overshoot" values produced by the camera, which aren't actually supposed to be mapped linearly.
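For reference, this is the linear mapping I have in mind between the two ranges for 8-bit luma, written as a small Python sketch. It assumes the usual 16-235 limits for "tv" luma (chroma, which uses 16-240, is not covered), and the function names are my own. It also shows what would happen to overshoot values: anything below 16 or above 235 is clipped when expanded to full range.

```python
def limited_to_full(y):
    """Expand an 8-bit limited-range ("tv") luma value to full range ("pc").

    16 maps to 0 and 235 maps to 255; overshoot values outside 16-235
    are clipped to the 0-255 limits.
    """
    scaled = (y - 16) * 255 / 219
    return max(0, min(255, round(scaled)))


def full_to_limited(y):
    """Compress an 8-bit full-range luma value (0-255) into 16-235."""
    return round(16 + y * 219 / 255)


if __name__ == "__main__":
    for value in (0, 16, 235, 255):
        print(value, "->", limited_to_full(value))
```

If the source really is limited range but contains overshoot, a screenshot showing both 0 and 255 would not by itself prove the video is full range.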

I appreciate that this probably feels like some school-kiddy offloading their homework assignment to avoid doing work, but I hope it can be seen that I've looked into these things prior to putting this post together.

Thanks in advance, if anyone takes the time to answer anything.

There are 1 best solutions below

is it safe to assume that video imaging devices (especially such as a webcam) do not have variable resolution?

Typically, yes. Each mode has a fixed resolution; to conserve power, these devices modulate the frame rate instead, which limits the total number of pixels being processed.

why is it that every option is listed twice, except half of them specify extra info, such as color_range, color_space, and chroma_location?

The modes with the colorimetry specify PAL colors. I suspect the unqualified modes are for NTSC color.

the fact that only "tv" is ever shown, does that imply that the webcam can only ever do limited color range?

If the unqualified modes are NTSC, then this is correct.

Which approach to dealing with the color-space yields the least amount of lost color?

A webcam won't yield a broadcast-quality image. Everything from the sensor to the onboard processor is geared toward expedient, just-good-enough output. As long as you encode with a low-ish CRF, you can always convert to a different colorspace later using the scale, zscale, or colorspace filter as desired.
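As a rough illustration of the scale route, here is a Python sketch that assembles the filter string and the surrounding ffmpeg arguments. The helper names, file names, and CRF default are hypothetical; `in_range`, `out_range`, `in_color_matrix`, and `out_color_matrix` are options of the scale filter, and only a small subset of their accepted values is whitelisted here.

```python
# Small whitelists for illustration; the scale filter accepts more values.
RANGES = {"tv", "pc"}
MATRICES = {"bt601", "bt709"}


def scale_filter(in_range=None, out_range=None, in_matrix=None, out_matrix=None):
    """Build a scale filter string that converts range and/or colour matrix."""
    options = (
        ("in_range", in_range, RANGES),
        ("out_range", out_range, RANGES),
        ("in_color_matrix", in_matrix, MATRICES),
        ("out_color_matrix", out_matrix, MATRICES),
    )
    parts = []
    for key, value, allowed in options:
        if value is None:
            continue
        if value not in allowed:
            raise ValueError(f"unsupported value for {key}: {value!r}")
        parts.append(f"{key}={value}")
    if not parts:
        raise ValueError("no conversion requested")
    return "scale=" + ":".join(parts)


def convert_command(in_path, out_path, vf, crf=18):
    """Build an ffmpeg argument list that re-encodes with the given filter."""
    return ["ffmpeg", "-i", in_path, "-vf", vf,
            "-c:v", "libx264", "-crf", str(crf), out_path]


if __name__ == "__main__":
    vf = scale_filter(in_range="tv", out_range="pc")
    print(convert_command("encoded.mkv", "full_range.mkv", vf))
```

Each such conversion re-encodes the video, so it is worth doing it once from the best available source rather than chaining several passes.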