Differences between data downloaded from Sentinel Hub and data obtained from the Copernicus Data Space ecosystem using Sentinel Hub APIs

Hello everyone,

I tried downloading the same data from both SentinelHub and the Copernicus Data Space Ecosystem (CDSE), using the same evalscript with SentinelHub APIs. Specifically, I downloaded bands B04, B08, and the average NDVI (calculated on the cloud) from Sentinel-2 L2A images. When comparing the data downloaded from the two platforms (CDSE and SentinelHub), I noticed differences in pixel values that led me to suspect a discrepancy in the georeferencing process. Additionally, I observed that the invalid pixels, identified through SCL bands, are not the same. Therefore, I would like to know if there are any differences in processing for the data downloaded from the two platforms, even when using the same APIs."

1 Like

Hi Anna,

Please can you provide the example code for both workflows. Then I can replicate your results and work out what maybe the cause of the difference between the two results.

I provide a simplified version of the two scripts that only download the B04 band of a Sentinel2L2A image. I’m using a shapefile with EPSG:3035, but I don’t know how to upload it. The shapefile refers to a small piece of the Campania region, and the test was carried out for an acquisition on 04 October 2020.

The script for downloading from CDSE is:

# --------------------------------------------------------------------------------
# ++++++++++++++++++++++ Download B04 (S2L2A) from CDSE ++++++++++++++++++++++++++
# --------------------------------------------------------------------------------
"""
This script uses the SentinelHub API to download the B04 band of a Sentinel2L2A image
from the Copernicus Data Space Ecosystem (CDSE). Pixels identified as clouds, shadows, 
snow, ice, and saturated are considered invalid.
"""
# --------------------------------------------------------------------------------
# ++++++++++++++++ Import required packages from Sentinel Hub +++++++++++++++
# --------------------------------------------------------------------------------
import os
import sys
import geopandas as gpd
import glob
from os.path import join, dirname
from os import rename
import numpy as np
from shutil import rmtree
from sentinelhub import (
    SentinelHubRequest, 
    SentinelHubDownloadClient,
    DownloadRequest,
    DataCollection, 
    MimeType, 
    MosaickingOrder,
    CRS, 
    BBox, 
    SHConfig, 
    Geometry,
    bbox_to_dimensions)

# --------------------------------------------------------------------------------
# ++++++++++++++++++++++++++ INPUT and OUTPUT Directories ++++++++++++++++++++++++
# --------------------------------------------------------------------------------

# Main directory where this Python script is located
base_dir = os.path.dirname(__file__)

# Directory where outputs are saved
output_image = os.path.join(base_dir, 'OUTPUT');

# Directory from which the shapefile of the region of interest is loaded
shapefiles_dir = os.path.join(base_dir, 'Shape_Regione_campania/splitted_grid3035')


# --------------------------------------------------------------------------------
# ++++++++++++++++++++++++++++++++ Authentication ++++++++++++++++++++++++++++++++
# --------------------------------------------------------------------------------

config = SHConfig()
config.sh_client_id = 'xxxxxxxxxx'
config.sh_client_secret = 'xxxxxxxxxxx'
config.sh_token_url = "https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/token"
config.sh_base_url = "https://sh.dataspace.copernicus.eu"
if not config.sh_client_id or not config.sh_client_secret:
    print("Warning! To use the Process API, please provide the credentials (OAuth client ID and client secret).")

# --------------------------------------------------------------------------------
# +++++++++++++++++++++++ Region of Interest Shapefile ++++++++++++++++++++++++++++
# --------------------------------------------------------------------------------

# List of paths to various shapefiles in the folder
shapefiles_paths = glob.glob(os.path.join(shapefiles_dir, '*.shp'))

# List to store the results of each shapefile
result_images = []

# Loop through all shapefiles
for shapefile_path in shapefiles_paths:

    # Load the shapefile
    gdf = gpd.read_file(shapefile_path)

    # Get the geometry of the polygon
    polygon_geometry = gdf.geometry.iloc[0]

    # Create a Geometry object from the polygon geometry
    geometry = Geometry(polygon_geometry, crs=gdf.crs)

    print('Shape CRS')
    print(gdf.crs)
    
    # --------------------------------------------------------------------------------
    # +++++++++++++++++++ Parameters related to the requested images +++++++++++++++++
    # --------------------------------------------------------------------------------

    if len(sys.argv) != 3:
        print("\n[ ERROR ] you must supply 2 arguments: Test_Download_B04_CDSE.py [start_date] [end_date]")
        print("                                 i.e: Test_Download_B04_CDSE.py  2020-10-04 2020-10-04")
        sys.exit(1)
    else:
        
        # Resolution
        ris = 10
        
        # Temporal interval
        start_date = sys.argv[1]
        end_date = sys.argv[2]


    # --------------------------------------------------------------------------------
    # +++++++++++++++++++++++++++++++++ Evalscript +++++++++++++++++++++++++
    # --------------------------------------------------------------------------------

    evalscriptB04_comp = """
    //VERSION=3

    // Initialization of the setup function
    function setup() {
      return {
        // List of all bands that will be used in the script
        input: [
        {
          bands: ["B04","SCL"],
          units: ["reflectance", "DN"]  // "DN" for Digital Numbers,  "REFLECTANCE" for reflectance
        }
       ],
        // Definition of the output band
        output: { 
            bands: 1,
            sampleType: "FLOAT32" // "UINT16" for "DN", "FLOAT32" for "reflectance" 
        }
        ,
        mosaicking: "ORBIT"
      };
    }
    
    // Function that returns only valid pixels
    function isValid(sample) {
    var scl = sample.SCL;
    if (scl === 3) { // SC_CLOUD_SHADOW
        return false;
      } else if (scl === 9) { // SC_CLOUD_HIGH_PROBA
        return false;
      } else if (scl === 8) { // SC_CLOUD_MEDIUM_PROBA
        return false;
      } else if (scl === 7) { // SC_CLOUD_LOW_PROBA / UNCLASSIFIED
        return false;
      } else if (scl === 10) { // SC_THIN_CIRRUS
        return false;
      } else if (scl === 11) { // SC_SNOW_ICE
        return false;
      } else if (scl === 1) { // SC_SATURATED_DEFECTIVE
        return false;
      } else if (scl === 2) { // SC_DARK_FEATURE_SHADOW
        return false;
      }
      return true;
    }

    // Function to create a composite with the most recent valid pixels
    function evaluatePixel(samples) {
      // Sort samples with timestamps in descending order
      samples.sort(function(a, b) {
        if (a.metadata && b.metadata && a.metadata.timestamp && b.metadata.timestamp) {
          return b.metadata.timestamp - a.metadata.timestamp;
        }
        return 0;
      });
      var composite;
      for (var i = 0; i < samples.length; i++) {
        var sample = samples[i];
        if (isValid(sample)) {
          composite = sample.B04;
          break; // Break the loop after finding the first valid sample
        }
      }
       return [composite];
    }
    """
    
    # --------------------------------------------------------------------------------
    # ++++++++++++++++++++++ Download and Save Data +++++++++++++++++++++++++++++
    # --------------------------------------------------------------------------------

    # Initialization of variables
    process_requests = []
    save_locations = []
    
    # Set resolution and image dimensions
    resolution = (ris, ris)  # Set resolution to 10 meters
    #print(geometry.bbox)
    
    # Set area of interest
    bbox_AOI_size = bbox_to_dimensions(geometry.bbox, resolution=resolution)
    print(f"Image shape at {ris} m resolution: {bbox_AOI_size} pixels")

   
    #********************** B04_comp Processing Request *****************************
    
    # Name of the B04_comp image
    my_string_B04_comp = "B04_comp"
    my_folder_B04_comp = join(output_image, my_string_B04_comp)

    request = SentinelHubRequest(
        data_folder = my_folder_B04_comp,
        evalscript=evalscriptB04_comp,
         input_data=[
            SentinelHubRequest.input_data(
                data_collection=DataCollection.SENTINEL2_L2A.define_from(
                    name="s2l2a", service_url="https://sh.dataspace.copernicus.eu"
                ),          
                time_interval=(start_date, end_date),
            ),
        ],
        responses=[
            SentinelHubRequest.output_response('default', MimeType.TIFF),
        ],

        geometry=geometry,
        resolution=resolution,
        bbox=geometry.bbox,
        config=config
    )

    process_requests.append(request)
    save_locations.append(join(my_folder_B04_comp, request.get_filename_list()[0]))


    #*********************** DOWNLOAD Composite B04_comp IMAGE ***************************

    # Image download request
    download_requests = [request.save_data(raise_download_errors=True) for request in process_requests]

    # Rename B04_comp image
    for file in save_locations:
        target_name = dirname(dirname(file))
        rename(file, target_name + ".tif")
        # Remove leftover folder
        rmtree(target_name)

The script for downloading from SentinelHub is:

# --------------------------------------------------------------------------------
# ++++++++++++++++++++++ Download B04 (S2L2A) from SentinelHub +++++++++++++++++++
# --------------------------------------------------------------------------------
"""
This script uses the SentinelHub API to download the B04 band of a Sentinel2L2A image
from SentinelHub. Pixels identified as clouds, shadows, snow, ice, and saturated are 
considered invalid.
"""
# --------------------------------------------------------------------------------
# ++++++++++++++++ Import required packages from Sentinel Hub +++++++++++++++
# --------------------------------------------------------------------------------
import os
import sys
import geopandas as gpd
import glob
from os.path import join, dirname
from os import rename
import numpy as np
from shutil import rmtree
from sentinelhub import (
    SentinelHubRequest, 
    SentinelHubDownloadClient,
    DownloadRequest,
    DataCollection, 
    MimeType, 
    MosaickingOrder,
    CRS, 
    BBox, 
    SHConfig, 
    Geometry,
    bbox_to_dimensions)

# --------------------------------------------------------------------------------
# ++++++++++++++++++++++++++ INPUT and OUTPUT Directories ++++++++++++++++++++++++
# --------------------------------------------------------------------------------

# Main directory where this Python script is located
base_dir = os.path.dirname(__file__)

# Directory where outputs are saved
output_image = os.path.join(base_dir, 'OUTPUT');

# Directory from which the shapefile of the region of interest is loaded
shapefiles_dir = os.path.join(base_dir, 'Shape_Regione_campania/splitted_grid3035')


# --------------------------------------------------------------------------------
# ++++++++++++++++++++++++++++++++ Authentication ++++++++++++++++++++++++++++++++
# --------------------------------------------------------------------------------

config = SHConfig()
config.sh_client_id = 'xxxxxxxxxx'
config.sh_client_secret = 'xxxxxxxxxxxx'
if not config.sh_client_id or not config.sh_client_secret:
    print("Warning! To use Process API, please provide the credentials (OAuth client ID and client secret).")

# --------------------------------------------------------------------------------
# +++++++++++++++++++++++ Region of Interest Shapefile ++++++++++++++++++++++++++++
# --------------------------------------------------------------------------------

# List of paths to various shapefiles in the folder
shapefiles_paths = glob.glob(os.path.join(shapefiles_dir, '*.shp'))

# List to store the results of each shapefile
result_images = []

# Loop through all shapefiles
for shapefile_path in shapefiles_paths:

    # Load the shapefile
    gdf = gpd.read_file(shapefile_path)

    # Get the geometry of the polygon
    polygon_geometry = gdf.geometry.iloc[0]

    # Create a Geometry object from the polygon geometry
    geometry = Geometry(polygon_geometry, crs=gdf.crs)

    print('Shape CRS')
    print(gdf.crs)
    
    # --------------------------------------------------------------------------------
    # +++++++++++++++++++ Parameters related to the requested images +++++++++++++++++
    # --------------------------------------------------------------------------------

    if len(sys.argv) != 3:
        print("\n[ ERROR ] you must supply 2 arguments: Test_Download_B04_CDSE.py [start_date] [end_date]")
        print("                                 i.e: Test_Download_B04_CDSE.py  2020-10-04 2020-10-04")
        sys.exit(1)
    else:
        
        # Resolution
        ris = 10
        
        # Temporal interval
        start_date = sys.argv[1]
        end_date = sys.argv[2]


    # --------------------------------------------------------------------------------
    # +++++++++++++++++++++++++++++++++ Evalscript +++++++++++++++++++++++++++++++++++
    # --------------------------------------------------------------------------------

    evalscriptB04_comp = """
    //VERSION=3

    // Initialization of the setup function
    function setup() {
      return {
        // List of all bands that will be used in the script
        input: [
        {
          bands: ["B04","SCL"],
          units: ["reflectance", "DN"]  // "DN" for Digital Numbers,  "REFLECTANCE" for reflectance
        }
       ],
        // Definition of the output band
        output: { 
            bands: 1,
            sampleType: "FLOAT32" // "UINT16" for "DN", "FLOAT32" for "reflectance" 
        }
        ,
        mosaicking: "ORBIT"
      };
    }
    
    // Function that returns only valid pixels
    function isValid(sample) {
    var scl = sample.SCL;
    if (scl === 3) { // SC_CLOUD_SHADOW
        return false;
      } else if (scl === 9) { // SC_CLOUD_HIGH_PROBA
        return false;
      } else if (scl === 8) { // SC_CLOUD_MEDIUM_PROBA
        return false;
      } else if (scl === 7) { // SC_CLOUD_LOW_PROBA / UNCLASSIFIED
        return false;
      } else if (scl === 10) { // SC_THIN_CIRRUS
        return false;
      } else if (scl === 11) { // SC_SNOW_ICE
        return false;
      } else if (scl === 1) { // SC_SATURATED_DEFECTIVE
        return false;
      } else if (scl === 2) { // SC_DARK_FEATURE_SHADOW
        return false;
      }
      return true;
    }

    // Function to create a composite with the most recent valid pixels
    function evaluatePixel(samples) {
      // Sort samples with timestamps in descending order
      samples.sort(function(a, b) {
        if (a.metadata && b.metadata && a.metadata.timestamp && b.metadata.timestamp) {
          return b.metadata.timestamp - a.metadata.timestamp;
        }
        return 0;
      });
      var composite;
      for (var i = 0; i < samples.length; i++) {
        var sample = samples[i];
        if (isValid(sample)) {
          composite = sample.B04;
          break; // Break the loop after finding the first valid sample
        }
      }
       return [composite];
    }
    """
    
    # --------------------------------------------------------------------------------
    # ++++++++++++++++++++++ Download and Save Data +++++++++++++++++++++++++++++
    # --------------------------------------------------------------------------------

    # Initialization of variables
    process_requests = []
    save_locations = []
    
    # Set resolution and image dimensions
    resolution = (ris, ris)  # Set resolution to 10 meters
    #print(geometry.bbox)
    
    # Set area of interest
    bbox_AOI_size = bbox_to_dimensions(geometry.bbox, resolution=resolution)
    print(f"Image shape at {ris} m resolution: {bbox_AOI_size} pixels")

   
    #********************** B04_comp Processing Request *****************************
    
    # Name of the B04_comp image
    my_string_B04_comp = "B04_comp"
    my_folder_B04_comp = join(output_image, my_string_B04_comp)

    request = SentinelHubRequest(
        data_folder = my_folder_B04_comp,
        evalscript=evalscriptB04_comp,
         input_data=[
            SentinelHubRequest.input_data(
                data_collection=DataCollection.SENTINEL2_L2A,          
                time_interval=(start_date, end_date),
            ),
        ],
        responses=[
            SentinelHubRequest.output_response('default', MimeType.TIFF),
        ],

        geometry=geometry,
        resolution=resolution,
        bbox=geometry.bbox,
        config=config
    )

    process_requests.append(request)
    save_locations.append(join(my_folder_B04_comp, request.get_filename_list()[0]))


    #*********************** DOWNLOAD Composite B04_comp IMAGE ***************************

    # Image download request
    download_requests = [request.save_data(raise_download_errors=True) for request in process_requests]

    # Rename B04_comp image
    for file in save_locations:
        target_name = dirname(dirname(file))
        rename(file, target_name + ".tif")
        # Remove leftover folder
        rmtree(target_name)
    

Hi William,

I hope you’ve had a chance to use the code I provided to replicate the results. Please let me know if everything worked smoothly or if you have any questions.

Hi Anna,

Apologies that this has not been solved yet. Please provide us with a directly comparable example. For simplicity, please provide the Sentinel Hub Request, which will contain the bounding box and time range used in your request. This means I can replicate your results exactly, rather than guessing where in Campania your AOI is. You can use Request Builder to build this request if that helps you.

Hi William.

This is the body of Sentinel Hub Request:

{
  "input": {
    "bounds": {
      "bbox": [
        4681903.305609,
        1981958.981233,
        4686951.734307,
        1986431.434369
      ],
      "properties": {
        "crs": "http://www.opengis.net/def/crs/EPSG/0/3035"
      }
    },
    "data": [
      {
        "dataFilter": {
          "timeRange": {
            "from": "2020-10-04T00:00:00Z",
            "to": "2020-10-04T23:59:59Z"
          }
        },
        "type": "sentinel-2-l2a"
      }
    ]
  },
  "output": {
    "width": 501.2608418544286,
    "height": 420.7542793512827,
    "responses": [
      {
        "identifier": "default",
        "format": {
          "type": "image/tiff"
        }
      }
    ]
  },
  "evalscript": "//VERSION=3\n\n    // Initialization of the setup function\n    function setup() {\n      return {\n        // List of all bands that will be used in the script\n        input: [\n        {\n          bands: [\"B04\",\"SCL\"],\n          units: [\"reflectance\", \"DN\"]  // \"DN\" for Digital Numbers,  \"REFLECTANCE\" for reflectance\n        }\n       ],\n        // Definition of the output band\n        output: { \n            bands: 1,\n            sampleType: \"FLOAT32\" // \"UINT16\" for \"DN\", \"FLOAT32\" for \"reflectance\" \n        },\n        mosaicking: \"ORBIT\"\n      };\n    }\n    \n    // Function that returns only valid pixels\n    function isValid(sample) {\n    var scl = sample.SCL;\n    if (scl === 3) { // SC_CLOUD_SHADOW\n        return false;\n      } else if (scl === 9) { // SC_CLOUD_HIGH_PROBA\n        return false;\n      } else if (scl === 8) { // SC_CLOUD_MEDIUM_PROBA\n        return false;\n      } else if (scl === 7) { // SC_CLOUD_LOW_PROBA / UNCLASSIFIED\n        return false;\n      } else if (scl === 10) { // SC_THIN_CIRRUS\n        return false;\n      } else if (scl === 11) { // SC_SNOW_ICE\n        return false;\n      } else if (scl === 1) { // SC_SATURATED_DEFECTIVE\n        return false;\n      } else if (scl === 2) { // SC_DARK_FEATURE_SHADOW\n        return false;\n      }\n      return true;\n    }\n\n    // Function to create a composite with the most recent valid pixels\n    function evaluatePixel(samples) {\n      // Sort samples with timestamps in descending order\n      samples.sort(function(a, b) {\n        if (a.metadata && b.metadata && a.metadata.timestamp && b.metadata.timestamp) {\n          return b.metadata.timestamp - a.metadata.timestamp;\n        }\n        return 0;\n      });\n      var composite;\n      for (var i = 0; i < samples.length; i++) {\n        var sample = samples[i];\n        if (isValid(sample)) {\n          composite = sample.B04;\n          break; // Break the loop after finding the first valid sample\n        }\n      }\n       return [composite];\n    }"
}

I am also attaching the images obtained both from SentinelHub and CDSE

SentinelHub_image.tiff (483.4 KB)

CDSE2.tiff (485.8 KB)

Thank you, I will get back to you once I have an answer for you.

Thanks for your patience: the reason for the difference between the data from the Sentinel Hub AWS endpoint and the Copernicus Data Space Ecosystem endpoint is that for data before 25th January 2022 the Sentinel Hub endpoint doesn’t yet have Collection-1 data. Therefore, anything processed before this on the Sentinel Hub AWS endpoint before this date will have been processed with the older baseline.

There is work to reprocess the data on AWS and therefore both endpoints will converge in the future, however, for now there is no confirmed ETA for this.

For more information about this I recommend reading this page.