Sentinel-2 SCL histogram

Hi,
I’m getting started with the FIS service, and I have a question on the histogram functionality. I’m trying to get the pixel counts for all possible scene classification values in a certain geometry. So far, I didn’t succeed. There are only three histogram types I can use, and “EQUIDISTANT” seems to be closest to what I need, but even so it doesn’t really work as expected. Let’s say there are 12 different possible values, I specified “bins=12”. However, it seems that the algorithm bins the resulting pixels inside a geometry according to the real occurring values, often much less than the total possible 12. It’s obviously also not constant. So in practice, the edges of the bins are always changing and in decimal numbers. This makes it very hard to retrieve exact bin counts for the individual scene classification values.

Is there any workaround or another approach I can follow?

Thanks,
Kristof

Interesting challenge. When designing FIS we did not really have discrete values in our mind, resulting in issues you observe.
Will think about it a bit more and discuss internally and come back to you.

1 Like

Can you post the example of your FIS request, so that we have something to test on?
Make sure to mask half of the instance ID.

That’s not rocket science, I just started from one of your examples:

I create a geometry, e.g.:

 geometry = Geometry(Polygon([(-5.13, 48),
                          (-5.23, 48.09),
                          (-5.13, 48.17),
                          (-5.03, 48.08),
                          (-5.13, 48)]),
                 CRS.WGS84)

Then I setup the FIS request for a custom layer I defined, which just returns the SCL layer:

fis_request = FisRequest(layer='S2_L2A_SCL',
                     geometry_list=[geometry],
                     time=('2019-01-01,'2019-07-01'),
                     histogram_type=HistogramType.EQUIDISTANT,
                     bins='12',
                     resolution='10m',
                     instance_id='69db9d34-5fc2-4583-****-************')

What I would like to get back is for each acquisition date a list with unique SCL values and the amount of pixels inside the geometry that fall in that particular SCL class. But as said earlier, I always get 12 bins now, with decimal edges, so I can’t trace back the original SCL values and the counts.

1 Like

I think you are actually not far from the solution for your problem.
As mentioned, FIS was not designed with this use-case in mind, but I think you can still use it exactly like you started. The only tweak is required in the interpretation of the results. FIS, as it is now, does the distribution of values between min and max value (this makes sense to me in general, don’t you think?). The problem appears, when in the sample you do not have full range of values, more specifically, you do not have 0 and 11 all the time.
So what you need to do is:
-take a look at the min and max value and calculate the equidistance ((max-min)/12)
-then you walk through the results and look for all bins, where count is not 0. For these you take the “lowEdge” and find the integer (there should be only one) in the interval of [lowEdge, lowEdge + equidistance]. These are your stats for a particular class.

E.g. for the output:

0
date “2019-07-18”
basicStats
min 0
max 4
mean 2.0370370370370376
stDev 1.481944372129078
histogram
bins
0
lowEdge 0
count 37
1
lowEdge 0.3333333333333333
count 0
2
lowEdge 0.6666666666666666
count 0
3
lowEdge 1
count 25
4
lowEdge 1.3333333333333333
count 0
5
lowEdge 1.6666666666666665
count 0
6
lowEdge 2
count 34
7
lowEdge 2.333333333333333
count 0
8
lowEdge 2.6666666666666665
count 0
9
lowEdge 3
count 27
10
lowEdge 3.333333333333333
count 0
11
lowEdge 3.6666666666666665
count 39

Equidistance = 0.3333
Non-empty bins are:
0 -> 0 (37 pixels)
1 -> 1 (25 pixels)
2 -> 2 (34 pixels)
3 -> 3 (27 pixels)
3.66 -> 4 (39 pixels)

I just came back from holiday, so I gave your suggestion a try. I’m not really sure what you mean by the quote above. Maybe I’m doing something wrong. But I have this example now:

geometry = Geometry(Polygon([(-5.13, 48),
                      (-5.23, 48.09),
                      (-5.13, 48.17),
                      (-5.03, 48.08),
                      (-5.13, 48)]),
             CRS.WGS84)
time_interval = ('2018-01-01', '2018-09-30')

fis_request_SCL_hist = FisRequest(layer='S2_L2A_SCL',
                     geometry_list=[geometry],
                     time=time_interval,
                     histogram_type=HistogramType.EQUIDISTANT,
                     bins='12',
                     resolution='10m',
                     instance_id='69db9d34-5fc2-****-****-************')

fis_request_SCL_hist_data = fis_request_SCL_hist.get_data()

for acquisition in fis_request_SCL_hist_data[0]['C0']:

    # Process histogram for acquisition
    minvalue = acquisition['basicStats']['min']
    maxvalue = acquisition['basicStats']['max']
    equidistance = (maxvalue - minvalue)/12.

    for bin in acquisition['histogram']['bins']:
            if bin['count'] != 0: print('{} - {}'.format(bin['lowEdge'], bin['lowEdge'] + equidistance))

Running this should print [lowEdge, lowEdge + equidistance] only when the count is not zero, as you propose. In the result, there is plenty of examples where there is no integer for either lowEdge or lowEdge + equidistance. Am I wrongly interpreting your suggestion? Maybe you mean the integer which is within the interval?

Ok I think I’m getting closer now. I guess you mean indeed that there should always be exactly one integer within the interval [lowEdge, lowEdge + equidistance].
Sometimes it’s exactly one of the boundaries:

[5.0 - 5.75] -> integer 5

Sometimes it’s somewhere within the interval:

[8.75 - 9.5] -> integer 9

I do get also these kind of intervals:

[9.333333333333332 - 9.999999999999998]

Surely looks like 10 will be the proper integer, but it’s not really in the interval, maybe due to rounding of the calculated equidistance value …

And related to this: any suggestion on an efficient way of finding this integer in Python?

I managed it by doing this :smile::

round(lowEdge + equidistance/2.)

This is what I meant, yes.
I am not a Python programmer, but something along the following should work, I guess:
var i = round(lowEdge,2);
var j = round(lowEdge + equidistance,2);
if (round(i,0) > i and round(i,0) <j) result = i; else result = j;

Allright, I got it fully working now. It’s a bit of a workaround and in our Belgian CGS we have implemented it differently so we get buckets and counts automatically, but it seems to work now for your service as well and it’s still quite fast. For the interested ones, here’s my code:

import numpy as np
import pandas as pd    
from shapely.geometry import Polygon
from sentinelhub import FisRequest, Geometry, CRS, HistogramType

geometry = Geometry(Polygon([(-5.13, 48),
                          (-5.23, 48.09),
                          (-5.13, 48.17),
                          (-5.03, 48.08),
                          (-5.13, 48)]),
                 CRS.WGS84)
time_interval = ('2018-01-01', '2018-09-30')

fis_request = FisRequest(layer='S2_L2A_SCL',
                             geometry_list=[geometry],
                             time=time_interval,
                             histogram_type=HistogramType.EQUIDISTANT,
                             bins='12',
                             resolution='20m',
                             instance_id='69db9d34-5fc2-4583-xxxx-xxxxxxxxxxx')

fis_request_data = fis_request.get_data()

histogram_data = {}

for acquisition in fis_request_SCL_hist_data[0]['C0']:

    # Process histogram for acquisition
    minvalue = acquisition['basicStats']['min']
    maxvalue = acquisition['basicStats']['max']
    equidistance = (maxvalue - minvalue)/12.

    values = {}

    for bin in acquisition['histogram']['bins']:
        if bin['count'] != 0:
            values[round(bin['lowEdge'] + equidistance/2. )] = bin['count']

    sorted_values = []

    for value in range(12):
        if value in values.keys(): sorted_values.append(values[value])
        else: sorted_values.append(0)

    histogram_data[pd.to_datetime(acquisition['date'])] = sorted_values

# Convert the resulting dict to the final Pandas dataframe
df = pd.DataFrame.from_dict(histogram_data, orient='index', columns=list(range(12))).sort_index()
1 Like