S2CLOUDLESS: get_cloud_masks is very slow on large bbox


#1

Hi !

I’m still working on my script for temporal analysis, but I’m stuck because of the slowness of the .get_cloud_masks command on a large area.

I use BBoxSplitter to split my very large area so that each bbox is under 5000 pixels in width or height:

import numpy as np

# Xmin_utm, Ymin_utm, Xmax_utm, Ymax_utm are the UTM coordinates of the area's corners
largeur = int(np.ceil((Xmax_utm - Xmin_utm)/10))  # image width in pixels (10 m resolution)
hauteur = int(np.ceil((Ymax_utm - Ymin_utm)/10))  # image height in pixels (10 m resolution)
print('\nArea pixel size: {0} x {1}'.format(largeur, hauteur))

>>>Area pixel size: 9968 x 7245

if largeur > 5000 or hauteur > 5000:  # if the width or the height exceeds 5000 pixels
    if largeur > 5000:
        L = int(np.ceil(largeur/5000))
        print('%s cells wide' % L)
    else:
        L = 1
    if hauteur > 5000:
        H = int(np.ceil(hauteur/5000))
        print('%s cells high' % H)
    else:
        H = 1

>>>2 cells wide
>>>2 cells high
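
I then pass this split shape to sentinelhub's BBoxSplitter, roughly like this (simplified sketch; the area geometry and the CRS below are placeholders):

from sentinelhub import BBoxSplitter, CRS

# area_geometry is a placeholder for the shapely geometry of the full area
bbox_splitter = BBoxSplitter(
    [area_geometry],   # list of shapely geometries covering the area
    CRS.UTM_31N,       # placeholder; use the UTM zone of your area
    (L, H)             # split into L columns x H rows, here (2, 2)
)
bbox_list = bbox_splitter.get_bbox_list()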

Here is an illustration:

I’m testing it on only 3 dates and it’s already slow, so I can’t imagine running it on 3 years of data…
Do you have an idea how to speed it up?


#2

Hi Timothee,

We usually run cloud detection at a lower resolution. We have found that running it at 160 m x 160 m resolution gives good results. Of course, the post-processing parameters need to be adjusted accordingly; we usually set them to average_over=2 and dilation_size=1. If you do this you should see a speed-up of roughly a factor of 256.
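
For example, something along these lines (a minimal sketch; get_bands_at_160m is a placeholder for however you fetch the band stack at the coarser resolution, e.g. a WCS request with resx='160m' and resy='160m'):

from s2cloudless import S2PixelCloudDetector

# post-processing parameters adjusted for 160 m resolution
cloud_detector = S2PixelCloudDetector(
    threshold=0.4,     # default probability threshold
    average_over=2,
    dilation_size=1,
    all_bands=False    # the detector expects the 10 bands the model was trained on
)

bands_160m = get_bands_at_160m()   # placeholder, shape: (n_dates, height, width, n_bands)
cloud_masks = cloud_detector.get_cloud_masks(bands_160m)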


#3

OK, maybe it’s a good option. But I’m a little confused, because I want the cloud percentage on agricultural parcels, so 160 m resolution might be too coarse… That’s why I have used 10 m resolution until now.
I’m still going to try this solution.


#4

Cloud detection is in any case a somewhat statistical exercise and is not accurate down to the pixel. It is perhaps worth exploring several options, e.g. 20 m, 40 m, 80 m and 160 m, to see which one produces the best “price/performance” result. The speed-up scales with the square of the resolution ratio, so 20 m will be about 4 times as fast, 40 m about 16 times as fast, and so on (see the quick check below).
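
As a quick back-of-the-envelope check (just the arithmetic from the paragraph above, not an API call):

# the number of pixels, and thus the work, shrinks with the square of the resolution ratio
for res in (20, 40, 80, 160):
    print('{0} m -> roughly {1}x faster than 10 m'.format(res, (res // 10) ** 2))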


#5

Thanks for all this information. I’ll try to find the best option.


#6

s2cloudless uses a pretrained gradient boosting model for cloud classification. In the background this is all handled by the lightgbm package, which is highly optimized for performance (speed and memory). By default it uses all of the processing cores available on your machine, and it can even run on a GPU.

Therefore, one way to improve speed would be to run your code on a machine with more processor cores.


#7

Yes, it will run on JupyterHub on a Google server, so we can adjust the CPU power and the number of cores. I’ll have to check that with my dev team. But does s2cloudless handle multi-threading?


#8

Yes, s2cloudless always uses multiple processor cores and multiple threads, because that is how lightgbm works by default.
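
If you want to check how many cores are available, or cap the thread count, here is a small sketch (it relies on LightGBM’s usual OpenMP behaviour; the value 8 below is just an example):

import multiprocessing
import os

print('Cores available:', multiprocessing.cpu_count())

# lightgbm parallelizes with OpenMP, so setting this environment variable
# before lightgbm is imported limits (or raises) the number of threads it uses
os.environ['OMP_NUM_THREADS'] = '8'  # example value only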


#9

When running cloud detection at a lower resolution, don’t forget to adjust the post-processing parameters (average_over and dilation_size). At 10 m resolution the values that work best are 22 and 11, respectively.

The recommended values are roughly:

Resolution [m]    average_over    dilation_size
10                22              11
20                11              6
40                6               3
80                3               2
160               2               1
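
For reference, one way to wire these values up (a minimal sketch; the keyword names match S2PixelCloudDetector, and the threshold is just the default):

from s2cloudless import S2PixelCloudDetector

# recommended post-processing parameters per resolution (from the table above)
POSTPROC_PARAMS = {
    10:  {'average_over': 22, 'dilation_size': 11},
    20:  {'average_over': 11, 'dilation_size': 6},
    40:  {'average_over': 6,  'dilation_size': 3},
    80:  {'average_over': 3,  'dilation_size': 2},
    160: {'average_over': 2,  'dilation_size': 1},
}

resolution = 40  # metres; pick the best speed/accuracy trade-off for your use case
detector = S2PixelCloudDetector(threshold=0.4, **POSTPROC_PARAMS[resolution])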