Process API - how to handle 'exceeded limit'

Hi there!
I’m experimenting with the Process API through the python library; so far it has been very practical and intuitive to use.
However, I do have a couple of questions, since I find the documentation a bit lacking in some aspects, in particular:

  1. Is the output file always called “response.tiff”, or can it be customized?
  2. With large areas, I noticed that the requests consume all the processing units per minute available on my account. This is not an issue per se; however, the get_data call does not raise any exception and still returns a completely blank image while printing a “rate limit exceeded” warning, which is not ideal. How should I handle this? Is there a way to check whether the response was successful or generated a “rate limit exceeded” error? Or, even better, could the client be configured so that the requests per minute are limited/scheduled over a longer period to avoid this problem?

Thank you very much, and sorry in advance if this info is available somewhere, I couldn’t find anything relevant.

Edoardo

Hi @links ,

  1. To customise the filename you can do the following:

```python
request = SentinelHubRequest( ... )

request.download_list[0].filename = 'my-new-unique-name.tiff'

request.save_data()
```

Note that in this case you have to ensure the uniqueness of the new name so that multiple requests won’t write into the same file.
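If you iterate over many bounding boxes, one simple way to guarantee uniqueness is to derive the name from the bbox coordinates. A minimal sketch (the helper name and prefix are illustrative, not part of sentinelhub):

```python
def unique_tiff_name(bbox_coords, prefix="tile"):
    """Build a filename that is unique per bounding box.

    bbox_coords is an iterable of (min_x, min_y, max_x, max_y) floats;
    the helper and the prefix are made up for this example.
    """
    coords = "_".join(f"{c:.4f}" for c in bbox_coords)
    return f"{prefix}_{coords}.tiff"

# Each distinct bbox yields a distinct filename
name = unique_tiff_name((13.35, 45.44, 13.41, 45.50))
```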

  2. If you request a large area with Process API, it should return an error indicating that the output dimension should be less than 2500. Could you provide a snippet that allows me to reproduce the issue?

Best Regards

Hi @chung.horng, thanks for the tips! However, I must say I’m also a bit confused by the response to (1) :smiley:, here’s the request I’m currently testing:

```python
request = sh.SentinelHubRequest(
    evalscript=evalscript,
    data_folder=self.config.cache_dir,
    input_data=[
        sh.SentinelHubRequest.input_data(
            data_collection=data_collection,
            time_interval=(start_date, end_date),
            maxcc=maxcc,
            mosaicking_order=mosaicking_order,
        )
    ],
    responses=[sh.SentinelHubRequest.output_response("default", sh.MimeType.TIFF)],
    bbox=bbox,
    size=sh.bbox_to_dimensions(bbox, resolution=self.config.resolution),
    config=self.sh_config,
)
request.get_data(save_data=True, max_threads=4, raise_download_errors=True)
```

Specifically, I’m a bit confused on two points: first, which kind of output responses should I request, and second, the get_data(save_data=True) call. From your answer, I gather that I should be able to call get_data() first, without saving yet, edit the name, and then call save_data(). Is that correct?

Regarding (2), I’m actually using the BBoxSplitter to produce a list of bboxes and then I simply iterate over them. This works; however, with large areas it produces a lot of requests that end up in “rate limit exceeded”, because of the PU/min upper bound of my account, I suppose. I was hoping for the raise_download_errors flag to throw an exception, but instead the client downloads an empty TIFF and just prints a warning.

Thanks again!

Hi @links ,

Regarding (1), the format of the responses depends on your needs. If you’re looking for the actual reflectance of S2 data, for example, TIFF is the format you need. And yes, you can call get_data without saving, then change the filename and call save_data to save the data to your local machine.

Regarding (2), we have the rate-limiting-guard that could be useful. For a simple solution, you can also check the response headers and use the values to schedule the next request.

Note that getting an empty image is not connected to rate-limit warnings. The warning is there only to tell users “you are hitting the rate limit, therefore the download is slower”. An empty image from SH services most likely means there is no data for the requested area and time interval.
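Since get_data returns numpy arrays, a quick way to tell an all-zero (“blank”) tile apart from a slow, throttled download is to inspect the array itself. A minimal sketch:

```python
import numpy as np

def is_blank(image, tolerance=0):
    """Return True when every pixel is (near) zero, i.e. no data came back."""
    return bool(np.all(np.asarray(image) <= tolerance))

# data = request.get_data()   # list with one array per request
# if is_blank(data[0]):
#     ...                     # skip, or re-query the catalog for this bbox
```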

Best Regards

Hey @chung.horng ,
Apologies for the late reply. I managed to do some testing and it’s exactly as you described for (2): there is indeed no data in the specified time interval.
However, what’s the best way to deal with missing data? Is there a way to trigger an exception instead of receiving a black image? I tried with “raise_download_errors=True” in the get_data call, but it doesn’t trigger for missing data, at least.
The only way I found so far is using the Catalog API, but even here there are different options: I went for the first, and it generally satisfies 99% of the cases, but I still sporadically get some blanks, which I suspect are due to the maxcc value.
Is there a better way to completely avoid empty images? Thank you very much, as always.
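For context, the catalog pre-check I’m doing is roughly shaped like this (simplified: search_fn stands in for a SentinelHubCatalog.search call wrapped with my collection and dates, so the wiring here is illustrative):

```python
def bboxes_with_data(bboxes, search_fn):
    """Keep only the bboxes for which the catalog reports at least one tile.

    search_fn(bbox) should return an iterable of catalog results, e.g. a
    thin wrapper around SentinelHubCatalog.search.
    """
    keep = []
    for bbox in bboxes:
        if any(True for _ in search_fn(bbox)):
            keep.append(bbox)
    return keep
```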

Hi @links

What maxcc value are you using? Have you tested whether this changes the number of “blank” images that are returned?

Hi @william.ray, I’m using maxcc=0.1 in the Process API call (specifically, the SentinelHubRequest builder), but the Catalog API (using SentinelHubCatalog) does not seem to provide a maxcc parameter. I guess that’s the culprit: the catalog says there is an image, but the Process API is probably discarding it because of the maxcc constraint and thus returning a blank image.
Is this assumption correct? Thanks!

Hi @links ,

Catalog API uses eo:cloud_cover as the query extension for the cloud cover percentage of tiles. For example, if you’re searching for tiles with a cloud cover percentage of less than 10%, the following query does the job:

```json
"query": {
    "eo:cloud_cover": {
        "lt": 10
    }
}
```

For more information, please refer to the Catalog API documentation. To figure out the extensions of the data collections you’re interested in, please find the Query extension and Distinct extension sections under the documentation page of each data collection, e.g., Query extension of S2L2A.
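In the python client, the same filter is a nested dict passed to SentinelHubCatalog.search (the keyword name may vary across sentinelhub-py versions, so double-check the signature of the version you use). Note that maxcc=0.1 in the Process API corresponds to "lt": 10 here, since maxcc is a fraction while eo:cloud_cover is a percentage:

```python
# Cloud-cover filter mirroring the JSON query above: tiles with < 10 % clouds
cloud_filter = {"eo:cloud_cover": {"lt": 10}}

# Hypothetical usage; requires configured credentials:
# catalog = sh.SentinelHubCatalog(config=self.sh_config)
# results = catalog.search(
#     sh.DataCollection.SENTINEL2_L2A,
#     bbox=bbox,
#     time=(start_date, end_date),
#     query=cloud_filter,
# )
```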

Best Regards