Catalog API: missing token error due to rate limiting?

Hi there!

I’m running a ProcessPoolExecutor (concurrent.futures.process) to process data about various AOIs in parallel. Each process (i.e. worker) creates a SentinelHubCatalog and calls its .search() method to query the URLs of all tiles containing the given AOI. The worker looks something like this:

import sentinelhub


def worker(bbox, date_interval):
    catalog = sentinelhub.SentinelHubCatalog()

    # Generate bounding box.
    search_bbox = sentinelhub.BBox(bbox=bbox, crs=sentinelhub.CRS.WGS84)

    # Request list of tile S3 paths which contain the AOI and whose cloud coverage < 50%.
    search_iterator = catalog.search(
        sentinelhub.DataCollection.SENTINEL2_L2A,
        bbox=search_bbox,
        time=date_interval,
        query={
            'eo:cloud_cover': {
                'lt': 50
            }
        },
        fields={
            'include': [
                'properties.datetime',
                'assets.data.href'
            ]
        }
    )

    # Parse tile URLs generated by the Catalog API.
    urls = [tile['assets']['data']['href'][:-1] for tile in search_iterator]
    
    # DO SOME PROCESSING...

Running the ProcessPoolExecutor with 10 or even 20 parallel processes works great. However, once I increase that number to more than 30, I get the following error: oauthlib.oauth2.rfc6749.errors.MissingTokenError: (missing_token) Missing access token parameter.

Upgrading the sentinelhub package to the latest version (sentinelhub-3.6.1) as suggested here didn’t help.

I also tried using locks (i.e. semaphores). This reduced the number of errors but didn’t prevent them entirely, which makes me think this is due to query rate limiting. Am I correct? If so, this error message is very confusing.

Is calling the .search() method on SentinelHubCatalog subject to the Requests/PU quota I can see in the dashboard? Could you suggest a workaround so I can process data using a higher number of workers?

Thanks!

Hi @ksnn,

Thanks for reporting this. We are investigating the Sentinel Hub authentication service for a potential cause of the problem.

In the meantime, I suggest that you check this tutorial about how to create a single authentication session and share it with any number of workers during parallelization. This is the recommended solution for a large-scale download from Sentinel Hub. Creating too many authentication sessions in parallel should be avoided.
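To illustrate the general idea only, here is a minimal standard-library sketch of the pattern: the parent process authenticates once and hands the resulting token to every worker through the pool initializer. It does not use the sentinelhub API. fetch_token, init_worker, worker and run are hypothetical names made up for this illustration, and the token contents are dummy values. For the actual library calls, please follow the tutorial.

```python
from concurrent.futures import ProcessPoolExecutor

# Set once per worker process by init_worker.
_shared_token = None


def fetch_token():
    """Hypothetical stand-in for the single authentication request.

    A real setup would take the token from the library's session object.
    """
    return {"access_token": "example-token", "expires_in": 3600}


def init_worker(token):
    """Store the parent's token so the worker never authenticates on its own."""
    global _shared_token
    _shared_token = token


def worker(bbox):
    """Placeholder for the per-AOI work; it only reads the shared token."""
    if _shared_token is None:
        raise RuntimeError("worker started without a shared token")
    # A real worker would build its client from the shared token here.
    return bbox, _shared_token["access_token"]


def run(bboxes, max_workers=4):
    """Authenticate once in the parent, then fan the work out to the pool."""
    token = fetch_token()
    with ProcessPoolExecutor(
        max_workers=max_workers, initializer=init_worker, initargs=(token,)
    ) as executor:
        return list(executor.map(worker, bboxes))
```

Calling run(list_of_bboxes, max_workers=40) from under an if __name__ == "__main__": guard would then start 40 workers with a single authentication request instead of 40. A long-running job would also need to refresh the token before it expires, which this sketch leaves out.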

Another alternative is to use the Sentinel Hub Batch API to process and download large amounts of data. This way you can decrease your processing costs, and scaling is handled by Sentinel Hub. Examples of how to use Batch with sentinelhub-py are here.

Thanks @maleksandrov! Sharing the session solved the issue. It’s great that you’ve included those steps in the tutorial. A more descriptive error message would ease debugging next time :slight_smile:

Great to hear the session sharing solved the problem. :slight_smile:

Today we also identified the reason for the misleading error messages and prepared a fix. I suspect that in your case the real cause of the error was temporary and probably related to too many sessions being created, but it was incorrectly reported as MissingTokenError. The fix will be released in the next package version.
