Best way to get raw data in Python

Hi everyone,

for my research I’m planning to collect raw data from open and commercial satellites, in particular Sentinel-2, PlanetScope and Pleiades.

I downloaded a few Sentinel-2 TIFF files from EO Browser, but I really want to code it in Python to automate the entire process.

I’m really lost in the documentation. Could you give me some advice on the best way to collect this data?

Sorry for the very open question. I’m quite a newbie in EO and everything related to it.

Thanks a lot for taking the time to help me.

Hi,

Yes, there is a dedicated Python library for requesting Sentinel Hub data, which can be found here.

Here you will find a fully worked example of how to request and download the raw data from Sentinel-2. It will be a little different for PlanetScope and Pleiades, as this is commercial data and therefore not free.

See how you get on with the Python documentation and examples, and if you have any questions, let us know here in the thread!
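
For orientation, the request in that tutorial looks roughly like the sketch below. It assumes the `sentinelhub` package is installed and that your Sentinel Hub credentials are already saved in `SHConfig`. The helper name `build_s2_request`, the download folder and the example coordinates and dates are placeholders made up for this post, not part of the library.

```python
import json

# The 13 Sentinel-2 L1C bands, in the order they will appear in the TIFF.
S2_BANDS = ["B01", "B02", "B03", "B04", "B05", "B06", "B07",
            "B08", "B8A", "B09", "B10", "B11", "B12"]

# Evalscript returning all bands as raw digital numbers (DN) in INT16.
EVALSCRIPT_ALL_BANDS = """
//VERSION=3
function setup() {
    return {
        input: [{ bands: %s, units: "DN" }],
        output: { bands: %d, sampleType: "INT16" }
    };
}
function evaluatePixel(sample) {
    return [%s];
}
""" % (json.dumps(S2_BANDS), len(S2_BANDS),
       ", ".join("sample." + band for band in S2_BANDS))


def build_s2_request(bbox_wgs84, time_interval, resolution=60,
                     data_folder="s2_downloads"):
    """Hypothetical helper: build a Process API request for one bounding box."""
    # Imported here so the evalscript can be inspected without the package.
    from sentinelhub import (
        CRS, BBox, DataCollection, MimeType, SentinelHubRequest, SHConfig,
        bbox_to_dimensions,
    )

    config = SHConfig()  # assumes credentials were saved beforehand
    bbox = BBox(bbox=bbox_wgs84, crs=CRS.WGS84)
    size = bbox_to_dimensions(bbox, resolution=resolution)  # (width, height) in px

    return SentinelHubRequest(
        data_folder=data_folder,
        evalscript=EVALSCRIPT_ALL_BANDS,
        input_data=[
            SentinelHubRequest.input_data(
                data_collection=DataCollection.SENTINEL2_L1C,
                time_interval=time_interval,
            )
        ],
        responses=[SentinelHubRequest.output_response("default", MimeType.TIFF)],
        bbox=bbox,
        size=size,
        config=config,
    )


# Example usage (needs valid credentials and network access):
# request = build_s2_request((46.16, -16.15, 46.51, -15.58),
#                            ("2020-06-01", "2020-06-30"))
# images = request.get_data(save_data=True)  # also writes the TIFF to data_folder
```

Looping over several bounding boxes or time intervals with a helper like this is one way to automate the downloads.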

Hi @william.ray,

thanks for pointing me to the SH Process API tutorial. I followed along and managed to obtain the same results. In fact, I now have a TIFF file containing all 13 bands (I checked this in QGIS, and every georeferenced pixel contains information on all the bands).

I have a few other questions. I hope they are not a bother.

  1. Is it more practical to have a file for each band? If so, how can I save each band to a different TIFF file?

  2. How can I extract information from a TIFF file? I’m reading more about the GDAL Python library. Am I on the right track? If not, could you advise me on the best way to manage this?

In the tutorial we are “downloading raw digital numbers in the INT16 format instead of reflectances in the FLOAT32 format” for performance purposes. Moreover, “the digital numbers are in the range from 0-10000, so we have to scale the downloaded data appropriately”.

  3. Does using integers affect accuracy that much? Is it better to use FLOAT32 for research purposes?

In the tutorial, there is a parameter “resolution” used to set the dimensions of the output bounding box (right?).

  4. How does it relate to the resolution of the different bands?

I mean, bands 2, 3, 4 and 8 have a resolution of 10 m, bands 1, 9 and 10 have a resolution of 60 m, and the remaining bands have a resolution of 20 m. I was also reading this topic in the forum (10 m/px resolution for all Sentinel-2 bands : how?), but I cannot fully make sense of it.

It’s a lot of stuff, I know. Sorry!!

Thanks in advance!

Hi,

I’ll answer your questions one by one:

  1. This is up to you. Personally, I think it is more practical to have all the bands in one file, but whether you want to separate them depends on the application.

  2. Yes, GDAL is the best way to read spatial raster data. There are many analytical libraries built upon GDAL. For example, you may also be interested in EO-Learn, developed by our research team to extract valuable information from satellite imagery. This should help you with your research project.

  3. Integer rasters hold only whole numbers, whilst float rasters hold decimals. This makes float rasters more precise but also slower to process. For research, I would use FLOAT32, but again it depends on the application. You can find out more here.

  4. Resolution refers to the pixel size, not the bounding box. There is a limit of 2500x2500 pixels in our Process API. For larger areas you will need to use the Batch Processing API.

In your request, if you set the resolution to 10 m, all bands will be output at 10 m. Sentinel Hub automatically resamples the coarser bands (for example, the 60 m ones) into 10 m pixels. This makes the data easier to process later on, because you cannot have multiple resolutions in one file, i.e. every band in the file has the same resolution. All bands are therefore resampled to a common resolution of your choice.
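
To make the relation between resolution and image size concrete, here is a small sketch in plain Python. It assumes the bounding box is given in a projected CRS in metres (for example UTM). `bbox_to_size` and `fits_process_api` are hypothetical helper names and the coordinates are made up; in practice `bbox_to_dimensions` from the Python library does this calculation for you.

```python
MAX_PIXELS = 2500  # per-side limit of the Process API mentioned above


def bbox_to_size(bbox_m, resolution_m):
    """Return (width, height) in pixels for a bbox (min_x, min_y, max_x, max_y) in metres."""
    min_x, min_y, max_x, max_y = bbox_m
    width = round((max_x - min_x) / resolution_m)
    height = round((max_y - min_y) / resolution_m)
    return width, height


def fits_process_api(size):
    """Check whether an image of this size stays within the 2500x2500 limit."""
    width, height = size
    return width <= MAX_PIXELS and height <= MAX_PIXELS


# Made-up bounding box covering 30 km x 20 km.
bbox = (500_000, 4_600_000, 530_000, 4_620_000)

for resolution in (10, 20, 60):
    size = bbox_to_size(bbox, resolution)
    print(resolution, size, fits_process_api(size))
```

The same area needs fewer pixels at a coarser resolution. At 10 m this box is 3000 pixels wide, which is over the limit, so it would have to be split into smaller boxes or sent to the Batch Processing API.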

Hi @william.ray,

thanks for all the answers!

I agree with you on having all the bands in one file. Can you tell me more about using GDAL for reading multiband spatial data? Is there any library you would recommend to get the work done in a simple way? I have only found tutorials and documentation referring to single-band spatial data.

Also, I agree with using FLOAT32. I tried to switch it in the evalscript, changing sampleType: "INT16" to sampleType: "FLOAT32". However, in QGIS I get exactly the same values. Is that because the bands have to be multiplied in evaluatePixel()? For example, UINT8 is multiplied by 255 and UINT16 by 65535… but the INT16 used in the first tutorial you linked was not multiplied by anything (even though, according to the documentation, INT16 returns the same as UINT16). Sorry, but I didn’t find anything relevant in the documentation you linked. What’s the right way to proceed?

Thanks a lot!!

There is plenty of documentation already for GDAL, for example here. I would suggest googling tutorials on how to handle multiband data. As already suggested, try EO-Learn.
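
As a starting point, a multiband GeoTIFF can be read with GDAL roughly as follows. This is only a sketch: it assumes the GDAL Python bindings (`osgeo`) are installed and that the bands in the file follow the order used in the evalscript. `read_bands` and the file name in the usage comment are names invented for this post.

```python
# Band order assumed to match the evalscript that produced the TIFF.
S2_BANDS = ["B01", "B02", "B03", "B04", "B05", "B06", "B07",
            "B08", "B8A", "B09", "B10", "B11", "B12"]


def read_bands(path, band_names=S2_BANDS):
    """Read every band of a GeoTIFF into a dict of {band name: 2D array}."""
    from osgeo import gdal  # imported lazily; needs the GDAL bindings

    dataset = gdal.Open(path)
    if dataset is None:
        raise RuntimeError(f"GDAL could not open {path}")
    if dataset.RasterCount != len(band_names):
        raise ValueError(
            f"expected {len(band_names)} bands, found {dataset.RasterCount}"
        )

    bands = {}
    # GDAL band indices start at 1, not 0.
    for index, name in enumerate(band_names, start=1):
        bands[name] = dataset.GetRasterBand(index).ReadAsArray()
    return bands


# Example usage (needs GDAL and a downloaded file):
# bands = read_bands("response.tiff")
# red, nir = bands["B04"], bands["B08"]
```

Each value in the returned dict is a 2D array with one entry per pixel, so single-band tutorials apply to each of them directly.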

As for the sampleType, if you look at the different sensors, you will notice they have different native sample types. For example, Sentinel-2 is UINT16, whilst Landsat 8 is FLOAT32.
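
On the numbers themselves: as far as I understand it, with units set to "DN" the values stay in the 0-10000 range whichever sampleType is chosen, which would explain seeing identical values in QGIS. Below is a small sketch in plain Python of the scaling quoted from the tutorial, and of how much precision the integer format loses. The sample value and the function names are made up for illustration.

```python
DN_SCALE = 10000  # digital numbers are reflectance multiplied by 10000


def dn_to_reflectance(dn):
    """Convert a digital number to reflectance."""
    return dn / DN_SCALE


def reflectance_to_dn(reflectance):
    """Convert reflectance to the nearest whole digital number."""
    return round(reflectance * DN_SCALE)


reflectance = 0.12347                # made-up "true" reflectance
dn = reflectance_to_dn(reflectance)  # what an integer raster would store
restored = dn_to_reflectance(dn)     # what you get back after scaling
error = abs(restored - reflectance)

print(dn, restored, error)
```

The rounding error can never exceed half a digital number, i.e. 0.00005 in reflectance. Whether that matters depends on the application.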

Hi @william.ray ,

following your advice, I managed to extract information from the TIFF into a pandas DataFrame using GDAL!

Going back to one of the first questions in my topic… Could you please explain in more detail how to get raw data from commercial satellites? I have access to that data thanks to NoR Sponsorship.

Thanks!

Hi @cesaredibiase2

To access raw data from commercial satellites through Sentinel Hub, you first need to order data for your AOI (if you have not ordered it yet). You can search, order and visualize your data directly in EO Browser in a few simple steps. This video will guide you through the ordering process.

Please also refer to the documentation pages for Pleiades and PlanetScope for information on bands, resolution, parameters, etc.

Once your order is complete, your data is accessible through Sentinel Hub, and you can visualize it directly in EO Browser.

To request and download the raw data through the Sentinel Hub Python library, you need to:

  1. Define a new data collection, as in this tutorial:

    from sentinelhub import DataCollection

    # Replace the placeholder with the collection ID from your order
    collection_id = 'xxxxxxxxxxxx'
    byoc = DataCollection.define_byoc(
        collection_id,
        name='pleiades data',
        is_timeless=True)

  • Note that collection_id is required. You can get it from your order details or from the dashboard.

  • The rest of the steps are similar to how you get Sentinel-2 raw data.
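
Put together, the remaining steps might look like the sketch below. It assumes the `sentinelhub` package is installed and credentials are saved in `SHConfig`. The band names in the usage comment, the UINT16 sample type, the collection name and the helper names are assumptions for illustration only, so please check them against the Pleiades or PlanetScope documentation pages and your own order.

```python
def build_evalscript(bands, sample_type="UINT16"):
    """Build an evalscript returning the given bands unchanged."""
    band_list = ", ".join(f'"{band}"' for band in bands)
    returns = ", ".join(f"sample.{band}" for band in bands)
    return f"""//VERSION=3
function setup() {{
    return {{
        input: [{{ bands: [{band_list}] }}],
        output: {{ bands: {len(bands)}, sampleType: "{sample_type}" }}
    }};
}}
function evaluatePixel(sample) {{
    return [{returns}];
}}
"""


def build_byoc_request(collection_id, bbox_wgs84, bands, size, time_interval=None):
    """Hypothetical helper: request raw bands from an ordered collection."""
    # Imported here so build_evalscript can be tried without the package.
    from sentinelhub import (
        CRS, BBox, DataCollection, MimeType, SentinelHubRequest, SHConfig,
    )

    # Same kind of definition as in the snippet above; the name is arbitrary.
    byoc = DataCollection.define_byoc(
        collection_id, name="commercial data", is_timeless=True
    )

    return SentinelHubRequest(
        evalscript=build_evalscript(bands),
        input_data=[
            SentinelHubRequest.input_data(
                data_collection=byoc, time_interval=time_interval
            )
        ],
        responses=[SentinelHubRequest.output_response("default", MimeType.TIFF)],
        bbox=BBox(bbox=bbox_wgs84, crs=CRS.WGS84),
        size=size,  # (width, height) in pixels
        config=SHConfig(),  # assumes credentials were saved beforehand
    )


# Example usage (band names are an assumption, check the documentation pages):
# request = build_byoc_request("xxxxxxxxxxxx", my_bbox, ["B0", "B1", "B2", "B3"], (512, 512))
# images = request.get_data()
```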

Hope that helps, and of course, if you have further questions, just post them here.
