I am trying to understand some data issues I am having when using Sentinel-1 VV and VH data to create RVI statistics.
I am aware that true quad-polarisation data is needed to calculate the RVI proper. The best approximation of the RVI equation for S1 dual-pol data has been discussed at length on the ESA forum here. Hence I use the equation:
RVI=4*VH/(VV+VH)
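In code this looks roughly as follows (the sample values are invented for illustration; `vv` and `vh` are assumed to be linear-scale backscatter arrays):

```python
import numpy as np

def rvi(vv: np.ndarray, vh: np.ndarray) -> np.ndarray:
    # Dual-pol RVI approximation for S1; expects linear-scale
    # (not dB) backscatter arrays
    return 4.0 * vh / (vv + vh)

# invented sample values in linear scale
vv = np.array([0.10, 0.28])
vh = np.array([0.02, 0.04])
print(rvi(vv, vh))  # both results fall within [0, 1]
```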
I am also aware that it is mathematically invalid to feed dB-scaled data into the RVI equation (a ratio), and that the main reason for scaling images logarithmically is to compress the dynamic range of the pixel values for visualisation.
What I am struggling with is why the RVI product never stays within the expected range of 0 to 1.
Allow me to illustrate. The first image below shows a cutout of an S1 tile. It's a random farmland area in Germany.
EDIT:
As pointed out by a sharp-eyed user, I did not include the range information for the VV and VH bands. See the two images below for the dB scale:

VV — Min: -26.01792335510254, Mean: -10.698317527770996, Max: 15.05242919921875
VH — Min: -48.487674713134766, Mean: -17.52926254272461, Max: 11.8444242477417

and the two after conversion to linear scale:

VV — Min: 0.002501541282981634, Mean: 0.1089702844619751, Max: 32.00685119628906
VH — Min: 1.4165510947350413e-05, Mean: 0.029651135206222534, Max: 15.291229248046875
If I clip the values at 1, so that all values above are set to 1, I get this:
Which would create an RVI like this:
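A sketch of that clipping step, assuming the clip is applied to the linear-scale bands before forming the RVI (the band values here are invented):

```python
import numpy as np

# invented linear-scale band values
vv_lin = np.array([0.10, 2.50, 32.0])
vh_lin = np.array([0.03, 0.50, 1.20])

# set everything above 1 to 1
vv_clipped = np.clip(vv_lin, None, 1.0)
vh_clipped = np.clip(vh_lin, None, 1.0)

rvi_clipped = 4.0 * vh_clipped / (vv_clipped + vh_clipped)
```

Note that even with both bands clipped at 1, the RVI can still exceed 1 wherever VH > VV/3 (e.g. both bands clipped to 1 gives RVI = 2), so this alone does not bound the result.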
End edit
In the first example I have deliberately done it wrong by using the VV and VH bands that are still logarithmically scaled as input to the RVI equation. We see the obvious problem of exceeding the expected 0-1 range, as the values form a textbook Gaussian distribution centred around 2.5.
In the second example I have converted VV and VH from dB to linear scale through this Python code:

import numpy as np

def db_to_linear(data: np.ndarray) -> np.ndarray:
    # convert backscatter from dB to linear (power) scale
    return 10 ** (0.1 * data)
and then calculated the RVI. The image and the scale are much more in line with what one would expect, but as you can observe from the red lines, a substantial amount of data still falls above 1. I can understand that image issues such as processing artefacts, sensor noise and outliers can produce odd values, but this just seems to be a lot of data points.
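For reference, the algebra says RVI = 4·VH/(VV+VH) exceeds 1 exactly when VH > VV/3, i.e. when the VV−VH difference drops below 10·log10(3) ≈ 4.77 dB. A small self-check with invented pixel values:

```python
import numpy as np

def db_to_linear(data: np.ndarray) -> np.ndarray:
    return 10 ** (0.1 * data)

# RVI > 1  <=>  4*VH > VV + VH  <=>  VH > VV / 3,
# i.e. VV - VH (in dB) < 10*log10(3) ≈ 4.77 dB
threshold_db = 10 * np.log10(3)

vv_db, vh_db = -10.0, -13.0  # invented pixel: bands only 3 dB apart
vv = db_to_linear(np.array(vv_db))
vh = db_to_linear(np.array(vh_db))
rvi = 4 * vh / (vv + vh)
print(vv_db - vh_db < threshold_db, rvi > 1)  # True True
```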
I then tried setting all values outside the range [0, 1] to NaN and plotting again, which produces figure 3 below.
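The masking step looks roughly like this (the helper name is mine):

```python
import numpy as np

def mask_out_of_range(rvi: np.ndarray) -> np.ndarray:
    # replace values outside [0, 1] with NaN so they are ignored
    # by nan-aware statistics and most plotting routines
    out = rvi.astype(float).copy()
    out[(out < 0.0) | (out > 1.0)] = np.nan
    return out

rvi = np.array([0.3, 1.4, 0.9, -0.1])
print(mask_out_of_range(rvi))  # 1.4 and -0.1 become NaN
```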
Cutting out the data is definitely not a solution, as the image next to the histogram shows: a lot of data is lost.
Alternatively I can normalise the data to be within 0-1, so it looks like this:
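That is, a plain min-max rescale, something like this (using nan-aware min/max; the function name is mine):

```python
import numpy as np

def minmax_normalise(rvi: np.ndarray) -> np.ndarray:
    # rescale so the smallest value maps to 0 and the largest to 1;
    # note this shifts and compresses *all* values, including the
    # ones that were already valid
    lo, hi = np.nanmin(rvi), np.nanmax(rvi)
    return (rvi - lo) / (hi - lo)

rvi = np.array([0.1, 0.5, 1.3])
print(minmax_normalise(rvi))  # 0.5 is no longer 0.5 after rescaling
```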
But at this point I am unsure whether what I am doing is sound, as I push the RVI values much lower than they likely should be. Am I just forcing something to be "correct" rather than fixing some core issue that still eludes me?
Thank you for your input and your time.