Run statistical API in batch process inside for loop

reutkeller · June 12, 2022, 5:19am

Hello,

I want to run the Statistical API as a batch process for three different gpkg files. I have written a for loop that opens each gpkg, runs the request for a different date range, and saves the result to a different location.
However, only the third of the three iterations produces results.

This is the for loop I have used:


urls_gpck=['s3://gis/path/file1.gpkg',
           's3://gis/path/file2.gpk',
           's3://gis/path/file3.gpk']

urls_results=['s3://gis/results/test1',
              's3://gis/results/test2',
              's3://gis/results/test3']

times=[["2019-09-01T00:00:00Z","2020-03-30T00:00:00Z"],
       ["2020-09-01T00:00:00Z","2021-03-30T00:00:00Z"],
       ["2021-09-01T00:00:00Z","2022-03-30T00:00:00Z"]]

for gpck,res_loc,date_list in zip(urls_gpck,urls_results,times):
    request_payload = {
        "input": {
            "features": {
                "s3": {
                    "url": gpck,
                    "accessKey": "SECRET1234",  # fake :)
                    "secretAccessKey": "SecretAccessKEY1234"  # fake :)
                }
            },
            "data": [
                {
                    "type": "sentinel-2-l2a",
                    "dataFilter": {
                        "mosaickingOrder": "leastCC"
                    }
                }
            ]
        },
        "aggregation": {
            "timeRange": {
                "from": date_list[0],
                "to": date_list[1]
            },
            "aggregationInterval": {
                "of": "P15D"
            },
            "evalscript": evalscript,
            "resx": 10,
            "resy": 10
        },
        "output": {
            "s3": {
                "url": res_loc,
                "accessKey": "SECRET",
                "secretAccessKey": "SecretAccessKey1234"
            }
        }
    }

    headers = {
        'Content-Type': 'application/json',
        'Accept': 'application/json'
    }

    url = "https://services.sentinel-hub.com/api/v1/statistics/batch"

    response = oauth.request("POST", url=url, headers=headers, json=request_payload)

    request_id = response.json()['id']

As a result, I get JSON results only for the third item (test3), not for the first two. It seems the loop runs without actually sending the requests?

My goal is to run Statistical API batch requests inside a for loop, if possible.

chung.horng · June 13, 2022, 9:19am

Hi @reutkeller ,

I noticed that in the urls_gpck list in your script, the extensions of file2.gpkg and file3.gpkg are missing the final g, which may cause a bad request error.

When using a loop to create and send requests, I would suggest adding an error handler for each request. That way it is clear whether each request was sent successfully.
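
As a rough illustration of per-request error handling, here is a minimal sketch. It assumes the session object (oauth in your script) behaves like a requests session, so responses expose status_code, text and json(). The helper name create_batch_request is hypothetical and not part of any Sentinel Hub library.

```python
def create_batch_request(session, url, payload):
    """Send one create request.

    Returns (request_id, error); exactly one of the two is None.
    """
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    try:
        response = session.request("POST", url=url, headers=headers, json=payload)
    except Exception as exc:  # e.g. network problem or expired token
        return None, f"request not sent: {exc}"

    # Assumption: the response carries an HTTP status code like requests.Response.
    if response.status_code >= 400:
        return None, f"HTTP {response.status_code}: {response.text}"

    try:
        return response.json()["id"], None
    except (KeyError, ValueError) as exc:  # no 'id' field, or body is not JSON
        return None, f"unexpected response body: {exc!r}"
```

Inside the loop you could then call request_id, error = create_batch_request(oauth, url, request_payload) and print the error together with the gpkg URL whenever request_id is None.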

Best Regards

reutkeller · June 13, 2022, 9:26am

@chung.horng this error occurred when I copied the script here; unfortunately, in the real script it is written correctly.

Can you please elaborate on the error handler you use?

reutkeller · June 13, 2022, 9:41am

Just to add to my previous post:
If I add the following at the end of the for loop:

response = oauth.request("GET", f"https://services.sentinel-hub.com/api/v1/statistics/batch/{request_id}/status")

it prints the requests with status created, one per gpkg. However, it seems like something is happening along the way, or maybe on AWS?

chung.horng · June 13, 2022, 10:14am

@reutkeller ,

You might want to use try and except as an error handler in your for loop. Below is an example code snippet:

import time

request_id = None  # stays None if the create step itself fails
for retry in range(3):
    try:
        # Step 1: create the request
        response = oauth.request("POST", url=url, headers=headers, json=payload)
        request_id = response.json()['id']  # raises KeyError if the payload was rejected
        # Step 2: analyse the request and wait until the analysis finishes
        response = oauth.request("POST", f"{url}/{request_id}/analyse", headers=headers)
        status = oauth.request("GET", f"{url}/{request_id}/status").json()['status']
        while status not in ['ANALYSIS_DONE', 'FAILED']:
            time.sleep(10)
            status = oauth.request("GET", f"{url}/{request_id}/status").json()['status']
        if status == 'FAILED':
            raise RuntimeError(f"Request {request_id} failed analysis.")
        # Step 3: start processing
        response = oauth.request("POST", f"{url}/{request_id}/start", headers=headers)
        break  # success: do not create the same request again
    except Exception as exception:  # Exception, so Ctrl+C still interrupts the loop
        if retry == 2:
            print(f"Failed to request {request_id}. Reason: {exception}")
        else:
            time.sleep(10)

There are two steps that can fail:

Create a request with response = oauth.request("POST", url=url, headers=headers, json=payload). If something is wrong in the payload, this step can fail.
Analyse a request with response = oauth.request("POST", f"{url}/{request_id}/analyse", headers=headers). If something is wrong with your gpkg or evalscript, this step can fail.
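
To make it visible which of these steps failed for which gpkg, the same sequence could be wrapped in a function and called once per payload. This is only a sketch under assumptions: the status names are the ones used in the snippet above, the session is assumed to behave like oauth, and BatchStepError, submit and submit_all are hypothetical names introduced for illustration.

```python
import time

BATCH_URL = "https://services.sentinel-hub.com/api/v1/statistics/batch"
HEADERS = {"Content-Type": "application/json", "Accept": "application/json"}


class BatchStepError(RuntimeError):
    """Carries the name of the step ('create' or 'analyse') that failed."""

    def __init__(self, step, detail):
        super().__init__(f"{step} failed: {detail}")
        self.step = step


def submit(session, payload, url=BATCH_URL, poll_seconds=10, sleep=time.sleep):
    """Create, analyse and start one batch request; return its id."""
    # Step 1: create. A problem in the payload is expected to surface here.
    body = session.request("POST", url=url, headers=HEADERS, json=payload).json()
    if "id" not in body:
        raise BatchStepError("create", body)
    request_id = body["id"]

    # Step 2: analyse. A problem with the gpkg or evalscript is expected here.
    session.request("POST", f"{url}/{request_id}/analyse", headers=HEADERS)
    status_url = f"{url}/{request_id}/status"
    status = session.request("GET", status_url).json()["status"]
    while status not in ("ANALYSIS_DONE", "FAILED"):
        sleep(poll_seconds)
        status = session.request("GET", status_url).json()["status"]
    if status == "FAILED":
        raise BatchStepError("analyse", f"request {request_id} failed analysis")

    # Step 3: start processing the analysed request.
    session.request("POST", f"{url}/{request_id}/start", headers=HEADERS)
    return request_id


def submit_all(session, payloads, **kwargs):
    """Submit every payload; a failure for one item does not stop the others."""
    started, failed = {}, {}
    for name, payload in payloads.items():
        try:
            started[name] = submit(session, payload, **kwargs)
        except Exception as exc:
            failed[name] = str(exc)
    return started, failed
```

With payloads keyed by gpkg URL, started then maps each successfully started gpkg to its request id, and failed maps the remaining ones to a message that begins with the failing step, so no request id is silently overwritten by the next loop iteration.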