I am using the following script to download NDVI and GCVI data for field polygons. Each field has an average size of 50 - 100 ha, and I am requesting data for a 1-year period per field. I have consumed about 70,000 processing units for nearly 2,000 fields, which seems a bit excessive based on my previous experience with the Sentinel Hub Statistical API. Can someone take a look at the script below and point out any obvious issues in terms of speed and cost? (I realize I can follow the "reduce processing cost" section here: Sentinel Hub Statistical API — Sentinel Hub 3.10.0 documentation, and I am planning to look into that next.)
evalscript = """
//VERSION=3
function setup() {
return {
input: [{
bands:[
"B03",
"B04",
"B08",
"CLD",
"dataMask"]}],
output:[
{
id: "CLD",
bands: 1
},
{
id: "GCVI",
bands: 1
},
{
id: "NDVI",
bands: 1
},
{
id: "dataMask",
bands: 1
}]
}
}
function evaluatePixel(samples) {
var validDataMask = 1
if (samples.B08 + samples.B04 == 0) {
validDataMask = 0
}
return {
CLD: [samples.CLD],
GCVI: [(samples.B08/samples.B03)-1],
NDVI: [index(samples.B08, samples.B04)],
// Exclude nodata pixels, pixels where NDVI is not defined and
// water pixels from statistics calculation
dataMask: [samples.dataMask * validDataMask]
};
}
"""
from datetime import date

import geopandas as gpd
from tqdm import tqdm
from sentinelhub import (CRS, DataCollection, Geometry, SentinelHubStatistical,
                         SentinelHubStatisticalDownloadClient, SHConfig)

config = SHConfig()  # credentials are set in my local sentinelhub profile

input_data = SentinelHubStatistical.input_data(DataCollection.SENTINEL2_L2A)
client = SentinelHubStatisticalDownloadClient(config=config)

gdf = gpd.read_file('compiled.shp')
# Convert to WGS84
gdf = gdf.to_crs(epsg=4326)

frames = []
for idx, row in tqdm(gdf.iterrows(), total=len(gdf)):
    # One Statistical API request per field, covering that field's full year
    yearly_time_interval = date(row['year'], 1, 1), date(row['year'], 12, 31)
    aggregation = SentinelHubStatistical.aggregation(
        evalscript=evalscript, time_interval=yearly_time_interval, aggregation_interval="P1D"
    )
    request = SentinelHubStatistical(
        aggregation=aggregation,
        input_data=[input_data],
        geometry=Geometry(row.geometry, crs=CRS(gdf.crs)),
        config=config,
    )
    download_request = request.download_list[0]
    vi_stats = client.download(download_request)
    frames.append(vi_stats)
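On the speed side, one change I am considering is building all the per-field requests first and then handing them to the download client in a single call, roughly along the lines of the sentinelhub-py Statistical API examples. A minimal sketch of what I have in mind (assuming the same gdf, evalscript, input_data and config as above; this only parallelizes the downloads and should not change the PU consumption):

```python
# Build every per-field request up front, then let the client download them
# with multiple threads instead of one blocking call per field.
requests = []
for idx, row in gdf.iterrows():
    yearly_time_interval = date(row['year'], 1, 1), date(row['year'], 12, 31)
    aggregation = SentinelHubStatistical.aggregation(
        evalscript=evalscript, time_interval=yearly_time_interval, aggregation_interval="P1D"
    )
    requests.append(
        SentinelHubStatistical(
            aggregation=aggregation,
            input_data=[input_data],
            geometry=Geometry(row.geometry, crs=CRS(gdf.crs)),
            config=config,
        )
    )

download_requests = [req.download_list[0] for req in requests]
client = SentinelHubStatisticalDownloadClient(config=config)
all_stats = client.download(download_requests)  # list of per-field statistics responses
```

That should help with wall-clock time, but I would still like to understand why the PU count itself is so high.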