Cloud filtering and PUs optimisation

jlavarenne · March 2, 2022, 7:49am

Hello,

Following the conversation we were having in this other topic about optimizing processing unit (PU) usage, I was wondering whether, when retrieving data with the sentinelhub/eolearn Python packages, it is possible to send requests only for the scenes that aren’t cloudy? In the example shown there, more than half of the scenes from the submitted requests were being thrown away by the filter_cloud_task.

The maxcc argument of SentinelHubInputTask would seem, on paper, to be suited to requesting only the cloud-free scenes, but its documentation is rather sparse; is there a place where its exact behaviour is documented?

Also, more broadly regarding SentinelHubInputTask: it is described as a “Process API input task that loads 16bit integer data and converts it to a 32bit float feature”. I understand that this conversion is done on the server side, which explains the x2 multiplier. So how would one switch to 16bit data retrieval with this task?

Thanks in advance,
J.

batic · March 2, 2022, 8:33am

Hi @jlavarenne

Re: maxcc. maxcc stands for maximum cloud coverage, and it is a number (metadata) that is given for each (Sentinel-2) tile. As you can expect, filtering by maxcc<.5 might mean that you are throwing away scenes despite not really having clouds over your AOI (only on the “other half” of the S-2 tile). Unfortunately, filtering by cloud coverage over the requested AOI is not possible in the same manner, but it can be done with some code tweaking.

One sure way of reducing the amount of data downloaded would be to (slightly) reduce maxcc. In the other thread I see you use maxcc=1. Reducing this to e.g. maxcc=0.8 should already remove quite a chunk of the retrieved data.

The second way is a bit more complex, and it would be a two-step process:

1. Request just the cloud masks over your AOI (or even cloud + data mask), at a lower resolution, to build a list of suitable dates for which you’d request the data.
2. Request the data for the list of dates that are suitable.
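
The date-selection part of step 1 could look roughly like the sketch below. It is library-free on purpose: the function names, the input layout (one cloud mask and one data mask per date, as nested lists) and the 0.3 threshold are all assumptions made for illustration, not part of eo-learn or sentinelhub. It only shows how the cloud + data mask pair can be turned into a list of dates worth requesting at full resolution.

```python
from datetime import date


def cloud_fraction(cloud_mask, data_mask):
    """Share of valid pixels (data_mask == 1) that are flagged as cloud (cloud_mask == 1)."""
    valid = 0
    cloudy = 0
    for clm_row, dm_row in zip(cloud_mask, data_mask):
        for clm, dm in zip(clm_row, dm_row):
            if dm == 1:
                valid += 1
                if clm == 1:
                    cloudy += 1
    # Assumption: a date without any valid pixel is treated as fully cloudy, so it is skipped.
    return cloudy / valid if valid else 1.0


def select_clear_dates(masks_by_date, max_cloud_fraction):
    """Return the sorted dates whose cloud fraction over the AOI is within the threshold."""
    return sorted(
        day
        for day, (cloud_mask, data_mask) in masks_by_date.items()
        if cloud_fraction(cloud_mask, data_mask) <= max_cloud_fraction
    )


# Toy low-resolution masks over the AOI (hypothetical values).
masks = {
    date(2022, 1, 5): ([[0, 0], [0, 1]], [[1, 1], [1, 1]]),   # 1 of 4 valid pixels cloudy
    date(2022, 1, 10): ([[1, 1], [1, 0]], [[1, 1], [1, 1]]),  # 3 of 4 valid pixels cloudy
    date(2022, 1, 15): ([[0, 0], [0, 0]], [[1, 1], [0, 0]]),  # half the AOI has no data, rest is clear
    date(2022, 1, 20): ([[0, 0], [0, 0]], [[0, 0], [0, 0]]),  # no valid data at all
}
clear_dates = select_clear_dates(masks, max_cloud_fraction=0.3)
```

Here only 5 and 15 January would be kept; those are the dates you would then request the full data for in step 2.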

Unfortunately, eo-learn at the moment doesn’t have a task that would support such an approach, but it sounds rather useful (hint: contributions welcome). A rather “simple” way would be to extend SentinelHubInputTask to accept a list of timestamps for the download as a parameter to the execute function.

Re 16bit->32bit conversion in SentinelHubInputTask: the default units for Sentinel-2 data are reflectances, but you can specify band_dtype=np.uint16, which will request digital numbers (DN) in 16bit format to reduce PUs, and then convert the DNs to 32bit float numbers afterwards (on your side, not on the server side).
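
The client-side conversion could be as small as the sketch below. It assumes the usual Sentinel-2 scaling where DN = reflectance × 10000; that factor is not stated in this post, so please check it against the Sentinel Hub documentation for your collection. The helper name dn_to_reflectance is made up for the example.

```python
import numpy as np

# Assumption: Sentinel-2 digital numbers are reflectance * 10000.
DN_SCALE = np.float32(10000.0)


def dn_to_reflectance(dn):
    """Convert 16-bit digital numbers to 32-bit float reflectance on the client side."""
    return dn.astype(np.float32) / DN_SCALE


# Toy 2x2 band of digital numbers, as they would arrive when requesting uint16 data.
dn = np.array([[0, 2500], [10000, 12000]], dtype=np.uint16)
reflectance = dn_to_reflectance(dn)
```

The uint16 array takes half the memory of the float32 result, which mirrors why the 16bit request is the cheaper one to make.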

jlavarenne · March 2, 2022, 9:33am

Hello @batic, thanks for these useful insights and suggestions!

batic · March 7, 2022, 6:48am

Hi @jlavarenne

A side note: if you create an EOPatch based just on cloud masks, filter out observations that have larger cloud coverage than your threshold (note that you also have to filter the timestamps in that EOPatch’s timestamp feature), and pass this EOPatch to SentinelHubInputTask, the task will download observations only for the timestamps already present in the EOPatch.

So something like this:

1. Run a SentinelHubInputTask to download CLM (possibly at low resolution).
2. Retain the timestamps of observations with acceptable cloud coverage; retain the bbox as well.
3. Execute a new SentinelHubInputTask to download all S-2 bands, passing in the existing EOPatch (containing the bbox and the retained timestamps).
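
To make step 2 concrete without tying it to a particular eo-learn version, here is a library-free sketch. PatchStub is a hypothetical stand-in for an EOPatch, not the eo-learn class, and the bbox coordinates and the 0.3 threshold are arbitrary. The point it illustrates is that the cloud masks and the timestamps are filtered with the same indices, and that only the bbox and the retained timestamps are carried over to the second download.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PatchStub:
    """Hypothetical stand-in for an EOPatch; NOT the eo-learn class."""
    bbox: tuple
    timestamps: list
    clm: list = field(default_factory=list)  # one 2D cloud mask (0/1) per timestamp


def frame_cloud_fraction(frame):
    """Fraction of pixels flagged as cloud (value 1) in one 2D mask."""
    pixels = [value for row in frame for value in row]
    return sum(1 for value in pixels if value == 1) / len(pixels)


def clear_template(patch, max_cloud_fraction):
    """Build a data-free patch holding only the bbox and the sufficiently clear timestamps."""
    keep = [
        index
        for index, frame in enumerate(patch.clm)
        if frame_cloud_fraction(frame) <= max_cloud_fraction
    ]
    # Masks and timestamps share the same index, so both are selected in lockstep.
    return PatchStub(bbox=patch.bbox, timestamps=[patch.timestamps[i] for i in keep])


# Result of the low-resolution CLM download (toy values).
low_res = PatchStub(
    bbox=(4.0, 43.5, 4.1, 43.6),
    timestamps=[datetime(2022, 1, 5), datetime(2022, 1, 10), datetime(2022, 1, 15)],
    clm=[[[0, 0], [0, 1]], [[1, 1], [1, 1]], [[0, 0], [0, 0]]],
)
template = clear_template(low_res, max_cloud_fraction=0.3)
```

In eo-learn terms, template plays the role of the EOPatch that would be passed to the second SentinelHubInputTask in step 3.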

leatresch · March 14, 2022, 2:09pm

Hi @batic,
Bringing this topic back as we’re trying to implement several of the suggestions listed here, particularly forcing the output types to 16 bits.

I’ve tried to specify band_dtype in the SentinelHubInputTask, but it’s not one of the arguments of the class (as can be seen on GitHub). Could you explain a bit more what you meant by that?

I thus tried another method, i.e. making a whole new DataCollection with the DataCollection.define_from method, starting from already defined collections and changing the output_types and units of the bands and metabands to np.uint16 and DN where necessary. On that point, you said “then convert the DNs to 32bit float numbers afterwards”. According to this FAQ, this means dividing DNs by 10000 in order to get reflectances, right? And casting with astype(np.float32) if we wish for such a format?

Also, final question to confirm my thoughts: np.float32 is the only available output_type for azimuth and zenith angles for S2, right? As stated in this table?

Thank you for your time!
Léa