How to calculate Processing Units

Hello. We are trying to calculate Processing Units per hectare. Here is the example we use for our calculations. Could you give me some feedback on whether we calculate it correctly for NDVI?

Your example on the website:

Output size (multiplication factor): 0.01 (20x20 px)

Input bands: 2/3

Output format: 1

Number of data samples: 1

Orthorectification: 1

PU = 0.01 x 2/3 x 1 x 1 x 1 = 0.0067 PU

The example on your website is designed for 20x20 px = 4 ha; that is why the output size / multiplication factor is 0.01.

You also state that the minimum value of the multiplication factor is 0.01, which corresponds to an area of 0.25 km² = 25 ha.

Does that mean that the processing unit cost for 25 ha in the same example is also 0.0067 PU?

If I am correct, then the processing unit cost for 250 ha would be: 0.0067 x 10 = 0.067 PU. Is that correct?

Thank you for answering,

Jan

Hi @jan.blazek,

Assuming that you will be requesting a spatial resolution of 10 m/pixel, all your calculations are correct.

If we are to be very precise, the multiplication factor in your 250 ha example would be:
250 ha / (5.12 km x 5.12 km) = 250 ha / 2621.44 ha = 0.0954

Thank you for answering. I am just wondering what your precise calculation example is showing. Can you explain what 5.12 stands for? The output resolution?

Because the multiplication factor calculated in the example on the website is 0.01, but the equation would give:
25 / (5.12 x 5.12) = 0.95.

What am I missing?

A processing unit is defined for a size of 512x512 px, which at 10 m/px is 5120 m x 5120 m = 2621.44 ha; 5.12 is that tile's side length in km. Your 25 ha equation mixes units: 25 ha / 2621.44 ha = 0.0095, which is rounded up to the 0.01 minimum.
250 ha with 2 input bands corresponds to 250/2621.44 x 2/3 = 0.064 PU.
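
For reference, a quick Python sketch of the formula as used in this thread (the function and constant names are illustrative, not an official API):

PU_TILE_HA = 2621.44  # 512 x 512 px at 10 m/px = 5120 m x 5120 m
MIN_MULT = 0.01       # documented minimum multiplication factor

def processing_units(area_ha, n_bands, output_format=1.0,
                     n_samples=1, ortho_factor=1.0):
    """Estimate the PU cost of one request covering `area_ha` hectares."""
    mult = max(area_ha / PU_TILE_HA, MIN_MULT)
    return mult * (n_bands / 3) * output_format * n_samples * ortho_factor

# NDVI (2 bands) examples from this thread:
print(round(processing_units(4, 2), 4))    # 20x20 px = 4 ha    -> 0.0067 PU
print(round(processing_units(25, 2), 4))   # 25 ha (min factor) -> 0.0067 PU
print(round(processing_units(250, 2), 4))  # 250 ha             -> 0.0636 PU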

Hello all,

We are struggling to optimize PU consumption, and we also have questions about the way PUs are calculated.

We ran some downloads to understand things better. Given the documentation and examples provided for S2, here are our observations, where the number of samples corresponds to the first dimension of the obtained EOPatches.

We use sentinelhub 3.4.4 and eolearn 1.0.0 to request Sentinel-2 data through the following SentinelHubInputTask:

import datetime

from eolearn.core import FeatureType
from eolearn.io import SentinelHubInputTask
from sentinelhub import DataCollection, SHConfig

sh_config = SHConfig()  # assumes credentials are already configured

# Extra features requested alongside the spectral bands
additional_data = [
    (FeatureType.MASK, 'CLD'),
    (FeatureType.DATA, 'CLP'),
    (FeatureType.DATA, 'sunAzimuthAngles'),
    (FeatureType.DATA, 'sunZenithAngles'),
    (FeatureType.DATA, 'viewAzimuthMean'),
    (FeatureType.DATA, 'viewZenithMean'),
    (FeatureType.MASK, 'dataMask'),
    (FeatureType.MASK, 'CLM'),
]

download_task = SentinelHubInputTask(
    data_collection=DataCollection.SENTINEL2_L2A,
    bands_feature=(FeatureType.DATA, 'BANDS'),
    resolution=10,  # metres per pixel
    maxcc=1,        # no cloud-cover filtering at request time
    bands=['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07',
           'B08', 'B8A', 'B09', 'B11', 'B12'],
    time_difference=datetime.timedelta(hours=2),
    additional_data=additional_data,
    config=sh_config,
)

Where do you think the observed differences in multiplication factors could come from?
Thanks in advance,
Best,
J.


Can you create three new client IDs and run each of these examples using one client ID each? Then share with us the example/client ID pairs (do mask the second half of each client ID) so that we can take a closer look.

Hello @gmilcinski, thanks for your prompt answer,
Here's another example I just ran; there is a x2 coefficient we are missing, but which one?

[image: screenshot of the PU usage log]


In the log there are 103 requests, each consuming 1.5 PU.
So there seems to be a discrepancy between your “nb samples” (48) and the actual number of requests made (103).
Can you perhaps turn on logging to see where the discrepancy comes from?
https://sentinelhub-py.readthedocs.io/en/latest/logging.html
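
A minimal setup along the lines of those docs, enough to surface each download request:

import logging

# DEBUG level makes sentinelhub-py log the individual requests it sends.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger('sentinelhub').setLevel(logging.DEBUG)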


OK, it seems we didn't take into consideration that our workflow includes a SimpleFilterTask that removes first-dimension slices from the EOPatches.

filter_clouds_task = SimpleFilterTask((FeatureType.MASK, 'CLM'), filter_clouds)
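
The filter_clouds predicate itself is not shown here; a hypothetical version that drops frames with too many cloudy CLM pixels could look like this (the 5% threshold is illustrative):

from eolearn.core import FeatureType
from eolearn.features import SimpleFilterTask

MAX_CLOUDY_FRACTION = 0.05  # illustrative threshold, not from this thread

def filter_clouds(clm_frame):
    # SimpleFilterTask calls this once per time frame of the CLM mask,
    # i.e. with an array of shape (height, width, 1) holding 0/1 values.
    return clm_frame.mean() <= MAX_CLOUDY_FRACTION

filter_clouds_task = SimpleFilterTask((FeatureType.MASK, 'CLM'), filter_clouds)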

So there are indeed 103 completed requests in this example.

DEBUG:eolearn.core.eoworkflow:Computing SimpleFilterTask(*[EOPatch(
  data={
    BANDS: numpy.ndarray(shape=(103, 156, 199, 12), dtype=float32)
    CLP: numpy.ndarray(shape=(103, 156, 199, 1), dtype=uint8)
    sunAzimuthAngles: numpy.ndarray(shape=(103, 156, 199, 1), dtype=float32)
    sunZenithAngles: numpy.ndarray(shape=(103, 156, 199, 1), dtype=float32)
    viewAzimuthMean: numpy.ndarray(shape=(103, 156, 199, 1), dtype=float32)
    viewZenithMean: numpy.ndarray(shape=(103, 156, 199, 1), dtype=float32)
  }
  mask={
    CLD: numpy.ndarray(shape=(103, 156, 199, 1), dtype=uint8)
    CLM: numpy.ndarray(shape=(103, 156, 199, 1), dtype=uint8)
    dataMask: numpy.ndarray(shape=(103, 156, 199, 1), dtype=bool)
  }
  meta_info={
    maxcc: 1
    size_x: 199
    size_y: 156
    time_difference: 7200.0
    time_interval: ('2020-10-14T00:00:00', '2021-07-01T00:00:00')
  }
  bbox=BBox(((238810.0, 3774970.0), (240800.0, 3776530.0)), crs=CRS('32630'))
  timestamp=[datetime.datetime(2020, 10, 15, 11, 11, 59), ..., datetime.datetime(2021, 6, 30, 11, 21, 53)], length=103
)], **{})
DEBUG:eolearn.core.eoworkflow:Removing intermediate result of download_task (node uid: SentinelHubInputTask-1bbabc5e999f11ecb27c-7910df318da1)
DEBUG:eolearn.core.eoworkflow:Computing AddValidDataMaskTask(*[EOPatch(
  data={
    BANDS: numpy.ndarray(shape=(48, 156, 199, 12), dtype=float32)
    CLP: numpy.ndarray(shape=(48, 156, 199, 1), dtype=uint8)
    sunAzimuthAngles: numpy.ndarray(shape=(48, 156, 199, 1), dtype=float32)
    sunZenithAngles: numpy.ndarray(shape=(48, 156, 199, 1), dtype=float32)
    viewAzimuthMean: numpy.ndarray(shape=(48, 156, 199, 1), dtype=float32)
    viewZenithMean: numpy.ndarray(shape=(48, 156, 199, 1), dtype=float32)
  }
  mask={
    CLD: numpy.ndarray(shape=(48, 156, 199, 1), dtype=uint8)
    CLM: numpy.ndarray(shape=(48, 156, 199, 1), dtype=uint8)
    dataMask: numpy.ndarray(shape=(48, 156, 199, 1), dtype=bool)
  }
  meta_info={
    maxcc: 1
    size_x: 199
    size_y: 156
    time_difference: 7200.0
    time_interval: ('2020-10-14T00:00:00', '2021-07-01T00:00:00')
  }
  bbox=BBox(((238810.0, 3774970.0), (240800.0, 3776530.0)), crs=CRS('32630'))
  timestamp=[datetime.datetime(2020, 10, 18, 11, 21, 55), ..., datetime.datetime(2021, 6, 30, 11, 21, 53)], length=48
)], **{})

Which leads me to the question: is it possible to issue requests only for the scenes that aren't cloudy? Here we see that more than half of the requests are thrown away by the filter_clouds_task. The maxcc argument of SentinelHubInputTask seems suited for the job, but its documentation is rather scarce; is there a place where its exact behaviour is documented?

Also, regarding SentinelHubInputTask: it is described as a “Process API input task that loads 16-bit integer data and converts it to a 32-bit float feature”. I understand that this conversion is requested on the server side, explaining the x2 multiplication factor. So what would be the way to switch to 16-bit data retrieval using this task?
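
If I plug the log numbers into the formula, the x2 checks out: (156 x 199) / (512 x 512) x 19/3 x 2 ≈ 1.5 PU per request, counting 19 billed bands (assuming dataMask is free). Perhaps the bands_dtype argument of SentinelHubInputTask is the relevant knob for 16-bit retrieval? A sketch, assuming eolearn 1.0 (not verified):

import numpy as np

# Same request as above, but asking for uint16 digital numbers instead of
# float32 reflectance, which should avoid the x2 output-format multiplier.
download_task_16bit = SentinelHubInputTask(
    data_collection=DataCollection.SENTINEL2_L2A,
    bands_feature=(FeatureType.DATA, 'BANDS'),
    resolution=10,
    maxcc=1,
    bands=['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07',
           'B08', 'B8A', 'B09', 'B11', 'B12'],
    time_difference=datetime.timedelta(hours=2),
    additional_data=additional_data,
    bands_dtype=np.uint16,  # raw DNs; divide by 1e4 client-side for reflectance
    config=sh_config,
)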

Thanks in advance,
J.

Great to see that the calculation of PUs is correct.
Would you be so kind as to create a new thread for the cloud-related optimization, as it is not related to this topic?
Thanks.

Sure thing @gmilcinski, sorry I got carried away. Here is a new topic.