What's the best example of how to download S1 & S2 tiles from AWS with the Batch API?

mtulbure · May 4, 2022, 2:04pm

I’m wondering if you could point me in the right direction regarding the best way to download S1 and S2 data. I have a list of S1 and S2 tiles I want to download, and I would like to pass that list on to query what’s in the archive and move the results to my S3 bucket.

I’ve seen the Jupyter notebook on batch processing here:

github.com

sentinel-hub/sentinelhub-py/blob/master/examples/batch_processing.ipynb

but it uses a shapefile as the AOI and a tiling grid so not quite what I’m after.

I’ve also seen this: Search for available data — Sentinel Hub 3.5.2 documentation

Thanks very much for any pointers.

chung.horng · May 4, 2022, 3:43pm

Hi @mtulbure ,

Here are some approaches to achieve what you want:

1. Use the AWS CLI or boto3 (if using Python) to sync the requested tiles. Please check the S1 and S2 documentation for AWS access.
2. Use the sh-py AwsTileRequest to download the data and then upload it to the AWS bucket.
3. Try to integrate Sentinel Hub into your workflow and process data with the Batch API, as Sentinel Hub is not about “copying products” but rather “processing and getting data”.

We suggest option 3 and can help with it, but please let us know what you want to do with the data.
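For option 1, a minimal boto3 sketch might look like the following. It assumes the requester-pays bucket `sentinel-s2-l2a` and a `tiles/<zone>/<band>/<square>/<year>/<month>/<day>/` key layout, which should be checked against the Sentinel-2 on AWS documentation. The names `tile_prefix`, `copy_tile_day`, and the destination bucket are hypothetical and only serve as illustration.

```python
from datetime import date


def tile_prefix(tile_id: str, day: date) -> str:
    """Build the S3 key prefix for one tiling grid ID on one day.

    Assumption: keys follow 'tiles/<zone>/<band>/<square>/<y>/<m>/<d>/'
    with zone, month and day written without zero padding.
    """
    tile_id = tile_id.upper().lstrip("T")  # accept '38TMT' as well as 'T38TMT'
    zone, band, square = int(tile_id[:-3]), tile_id[-3], tile_id[-2:]
    return f"tiles/{zone}/{band}/{square}/{day.year}/{day.month}/{day.day}/"


def copy_tile_day(tile_id, day, dest_bucket, src_bucket="sentinel-s2-l2a"):
    """Copy every object under one tile/day prefix into dest_bucket."""
    import boto3  # imported here so tile_prefix works without boto3 installed

    s3 = boto3.client("s3")
    pages = s3.get_paginator("list_objects_v2").paginate(
        Bucket=src_bucket,
        Prefix=tile_prefix(tile_id, day),
        RequestPayer="requester",  # the source bucket is assumed requester-pays
    )
    for page in pages:
        for obj in page.get("Contents", []):
            s3.copy_object(
                Bucket=dest_bucket,
                Key=obj["Key"],
                CopySource={"Bucket": src_bucket, "Key": obj["Key"]},
                RequestPayer="requester",
            )


print(tile_prefix("18QYF", date(2022, 5, 4)))  # tiles/18/Q/YF/2022/5/4/
```

Calling `copy_tile_day("18QYF", date(2022, 5, 4), "my-bucket")` would then copy that day's objects, assuming AWS credentials are configured and the account accepts requester-pays charges.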

Best Regards

mtulbure · May 4, 2022, 5:43pm

Hi, thanks very much.
That sounds great. I’m happy to integrate Batch API into my workflows and also very happy with using boto3 in Python to query the SH buckets.
Here’s what I’d like to do:

1. I have two lists of unique S1 and S2 tile IDs.
2. I want to compute several indices for each tile in the lists, which will be used as features for ML models.
3. I want to copy those features to my own S3 bucket.

Do you have a code example, or could you help me by sketching pseudocode for the steps involved? The Batch API notebook I found uses shapefiles of an AOI rather than lists of tiles.
Thanks so much, M

mtulbure · May 4, 2022, 5:53pm

I forgot to mention that, for each tile in my list, I only want to query and process data within a specific time interval.

chung.horng · May 5, 2022, 7:40am

May I ask a few questions:

1. Which tile IDs are you referring to? Are they product identifiers (e.g., S2B_MSIL2A_20220504T075609_N0400_R035_T38TMT_20220504T103442) or tiling grid IDs (e.g., 38TMT)?
2. What is your desired format for the features? Do you want them to stay in the same format (i.e., the CRS, the dimensions, etc.) as the original tile?

mtulbure · May 5, 2022, 3:08pm

Hi, yes,
Answer to Q1: Initially I will get a list of tiling grid IDs (e.g., ['18QYF', '18QYG', '18QZF', '18QZG']), which I then use to query your AWS buckets for product identifiers within certain date ranges. Once I have that list of product identifiers, I want to process the data into features and copy those features to my own S3 bucket.
Answer to Q2: The features can stay in the same format as the original tile.
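The step from tiling grid IDs and a date range to a list of product identifiers could be sketched as below. It assumes the underscore-separated naming shown in the example identifier earlier in this thread (mission, level, sensing time, baseline, relative orbit, tile, discriminator); `parse_s2_product_id` and `select_products` are hypothetical helper names, not part of any library.

```python
from datetime import datetime
from typing import NamedTuple


class S2Product(NamedTuple):
    mission: str
    level: str
    sensed: datetime
    tile: str


def parse_s2_product_id(product_id: str) -> S2Product:
    """Split an identifier such as S2B_MSIL2A_<sensing time>_N0400_R035_T38TMT_<...>."""
    mission, level, sensed, _baseline, _orbit, tile, _disc = product_id.split("_")
    return S2Product(
        mission=mission,
        level=level[3:],  # 'MSIL2A' -> 'L2A'
        sensed=datetime.strptime(sensed, "%Y%m%dT%H%M%S"),
        tile=tile[1:],  # 'T38TMT' -> '38TMT'
    )


def select_products(product_ids, tiles, start, end):
    """Keep products on the wanted tiles that were sensed within [start, end)."""
    wanted = set(tiles)
    selected = []
    for pid in product_ids:
        product = parse_s2_product_id(pid)
        if product.tile in wanted and start <= product.sensed < end:
            selected.append(pid)
    return selected


pid = "S2B_MSIL2A_20220504T075609_N0400_R035_T38TMT_20220504T103442"
print(parse_s2_product_id(pid).tile)  # 38TMT
print(select_products([pid], ["18QYF"], datetime(2022, 5, 1), datetime(2022, 6, 1)))  # []
```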

Do you have a code example, or could you help me by sketching pseudocode for the steps involved? The Batch API notebook I found uses shapefiles of an AOI rather than lists of tiles.
Thank you

chung.horng · May 5, 2022, 3:53pm

Thank you for the explanation.

Unfortunately, we do not have existing example code that fits your workflow. That’s why we’re trying to understand what you want to achieve, so we can help you come up with the workflow that fits best.

In general, the Batch API can only take geometries (a bounding box, polygon, or multipolygon) as input. Its advantage is that it can handle a really large area at country or even continent level (e.g., Europe). The API then fetches the data within the selected time range (it is also possible to select the scenes of interest by timestamp or other metadata) and processes the data as you wish (for example, calculating indices). Note that the output of the Batch API is in UTM tiles and a UTM coordinate reference system, so if sticking to the original tiling is a must in your workflow, some post-processing is required to convert the result back to it.
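To make the inputs concrete, here is a rough sketch of what a batch request body could look like when built as a plain Python dictionary. The field names reflect my reading of the Batch API reference and should be verified there; the tiling grid ID, resolution, bucket name, time range, and geometry are all placeholder assumptions, and the NDVI evalscript is only one example of an index.

```python
# Evalscript (JavaScript, passed as a string) returning NDVI as one float band.
NDVI_EVALSCRIPT = """
//VERSION=3
function setup() {
  return {input: ["B04", "B08"], output: {bands: 1, sampleType: "FLOAT32"}};
}
function evaluatePixel(s) {
  return [(s.B08 - s.B04) / (s.B08 + s.B04)];
}
"""


def build_batch_payload(geometry, time_from, time_to, bucket_name):
    """Assemble a batch request body from a GeoJSON geometry and a time range."""
    process_request = {
        "input": {
            "bounds": {
                "geometry": geometry,
                "properties": {"crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"},
            },
            "data": [{
                "type": "sentinel-2-l2a",
                "dataFilter": {"timeRange": {"from": time_from, "to": time_to}},
            }],
        },
        "output": {
            "responses": [{"identifier": "default", "format": {"type": "image/tiff"}}]
        },
        "evalscript": NDVI_EVALSCRIPT,
    }
    return {
        "processRequest": process_request,
        "tilingGrid": {"id": 0, "resolution": 10.0},  # assumed grid ID and resolution
        "bucketName": bucket_name,  # placeholder: your own S3 bucket
        "description": "NDVI features for selected tiles",
    }


# Placeholder geometry: a small lon/lat square, not a real tile footprint.
square = {"type": "Polygon",
          "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}
payload = build_batch_payload(square, "2022-05-01T00:00:00Z",
                              "2022-06-01T00:00:00Z", "my-bucket")
print(sorted(payload))  # ['bucketName', 'description', 'processRequest', 'tilingGrid']
```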

If it is acceptable for you to have features in UTM tiling grids, a workflow could be the following:

1. Get the geometry of the tiles in your list. Here’s a useful resource on S2 tiles.
2. Put all geometries together as a multipolygon.
3. Use the multipolygon as the input of the Batch API and calculate all the features you need.
4. Apply some post-processing to make the output fit your ML model.
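Steps 1 and 2 above could be sketched in plain Python as follows. The footprints here are made-up placeholder rectangles keyed by hypothetical tile names, not real Sentinel-2 tile geometries; in practice they would come from the tile grid resource. `bbox_ring` and `tiles_to_multipolygon` are hypothetical helper names.

```python
def bbox_ring(min_lon, min_lat, max_lon, max_lat):
    """Return a closed, counterclockwise exterior ring for a lon/lat box."""
    return [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],
    ]


def tiles_to_multipolygon(footprints, tile_ids):
    """Combine per-tile exterior rings into one GeoJSON MultiPolygon."""
    missing = [t for t in tile_ids if t not in footprints]
    if missing:
        raise KeyError(f"No footprint for tiles: {missing}")
    return {
        "type": "MultiPolygon",
        # Each polygon is a list of rings; only an exterior ring is used here.
        "coordinates": [[footprints[t]] for t in tile_ids],
    }


# Placeholder footprints for two hypothetical tiles.
footprints = {
    "TILE_A": bbox_ring(0.0, 0.0, 1.0, 1.0),
    "TILE_B": bbox_ring(2.0, 0.0, 3.0, 1.0),
}
geometry = tiles_to_multipolygon(footprints, ["TILE_A", "TILE_B"])
print(geometry["type"], len(geometry["coordinates"]))  # MultiPolygon 2
```

Neighbouring Sentinel-2 tiles overlap each other, so for real footprints it may be necessary to dissolve them first (for example with a geometry library) to obtain a valid multipolygon.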

Let me know if the above workflow sounds good to you, and we can discuss it further.

Best Regards

mtulbure · May 6, 2022, 2:16am

Thank you for your answer. Too bad there is no code that supports querying a list of tiles rather than geometries. How would you suggest adding S1? In my workflow, I’m interested in S1 tiles that overlap with S2 both spatially and temporally.
Would you be available for a quick chat? That would make it easier. Thank you

chung.horng · May 6, 2022, 8:40am

In my opinion the workflow would be:

1. Obtain the overlapping area of your S1 and S2 tiles (either a polygon or a multipolygon) as the input geometry.
2. Use the Batch API to fetch the data that overlaps temporally and compute the indices.
3. Apply some post-processing to the output for your ML model.
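As a rough illustration of the spatial and temporal overlap in steps 1 and 2, the sketch below intersects two bounding boxes and pairs acquisitions that are close in time. It simplifies footprints to boxes, and the 3-day tolerance and the timestamps are made-up assumptions; `bbox_intersection` and `pair_acquisitions` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def bbox_intersection(a, b):
    """Intersect two (min_x, min_y, max_x, max_y) boxes; None if they are disjoint."""
    min_x, min_y = max(a[0], b[0]), max(a[1], b[1])
    max_x, max_y = min(a[2], b[2]), min(a[3], b[3])
    if min_x >= max_x or min_y >= max_y:
        return None
    return (min_x, min_y, max_x, max_y)


def pair_acquisitions(s1_times, s2_times, max_gap=timedelta(days=3)):
    """Pair each S2 acquisition with the closest S1 acquisition within max_gap."""
    pairs = []
    for t2 in s2_times:
        closest = min(s1_times, key=lambda t1: abs(t1 - t2), default=None)
        if closest is not None and abs(closest - t2) <= max_gap:
            pairs.append((closest, t2))
    return pairs


# Made-up timestamps, for illustration only.
s1_times = [datetime(2022, 5, 1, 10), datetime(2022, 5, 13, 10)]
s2_times = [datetime(2022, 5, 2, 15), datetime(2022, 5, 8, 15)]

print(bbox_intersection((0, 0, 2, 2), (1, 1, 3, 3)))  # (1, 1, 2, 2)
print(len(pair_acquisitions(s1_times, s2_times)))  # 1
```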