Request for data fails with message "Precondition Failed": tiles changed since ingestion

lde · February 21, 2024, 4:10am

I have a labelling platform that reads BYOC image tiles over WMS. Recently, the team using it has reported that some images are not showing up. To investigate, I am using the Python API to build a SentinelHubRequest, fetch the data, and plot it. The message I get back is:

DownloadFailedException: Failed to download from:
https://services.sentinel-hub.com/api/v1/process
with HTTPError:
412 Client Error: Precondition Failed for url: https://services.sentinel-hub.com/api/v1/process
Server response: "{"status": 412, "reason": "Precondition Failed", "message": "Reingest tile 's3://<snip>cat2/911445_2017-08/(BAND).tif' (id = <snip>) because one of its files changed since ingestion. Reingest it using the endpoint https://docs.sentinel-hub.com/api/latest/reference/#operation/reingestByocCollectionTileById.", "code": "FILE_CHANGED"}"

So the problem appears to be that one of the tile's files changed after it was ingested.
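When only a few requests fail, the tile path and id can be pulled straight out of the error text. This is a minimal sketch that assumes the exception message has the `Server response: "{...}"` layout shown in the traceback above. `parse_file_changed` is a hypothetical helper, not part of sentinelhub-py.

```python
import json
import re


def parse_file_changed(message):
    """Extract tile path and id from a 412 FILE_CHANGED error message.

    Assumes the server's JSON body follows 'Server response: ' in quotes,
    as in the traceback above. Returns None if the text does not match.
    """
    match = re.search(r'Server response: "(\{.*\})"', message, re.DOTALL)
    if match is None:
        return None
    body = json.loads(match.group(1))
    if body.get("code") != "FILE_CHANGED":
        return None
    # The human-readable message names the tile as: tile '<path>' (id = <id>)
    detail = re.search(r"tile '([^']+)' \(id = ([^)]+)\)", body.get("message", ""))
    if detail is None:
        return None
    return {"path": detail.group(1), "tile_id": detail.group(2)}
```

In practice you would call `parse_file_changed(str(exc))` inside an `except DownloadFailedException as exc:` block. It only finds tiles that you happen to request, so it does not scale to checking a whole 30,000-tile collection.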

Is there an easy way to query the tiles in a collection to find the ones that are affected (i.e., changed since ingestion)? I tried the following:

from sentinelhub import SentinelHubBYOC, SHConfig

config = SHConfig()
byoc = SentinelHubBYOC(config=config)

collections_iterator = byoc.iter_collections()
collection_id = [collection["id"] for collection in collections_iterator
                 if "cat2" in collection["name"]]

collection = byoc.get_collection(collection_id[0])
tiles = list(byoc.iter_tiles(collection))

Looping through the tiles to check for relevant messages, however, doesn't turn up anything useful.

Since the collection has more than 30,000 tiles, I am hoping to find the affected tiles programmatically so I can reingest them.

For completeness, here is the function I use to look for the individual image:

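import dateutil.parser
import matplotlib.pyplot as plt
from sentinelhub import (BBox, CRS, MimeType, SentinelHubRequest,
                         bbox_to_dimensions)

def request_from_sentinelhub(df, diam, collection, config, color="rgb",
                             plot=True):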
    """Request tile from center point with specific diameter from Sentinel-Hub
    collection and plot
    """
    x, y, date = df.iloc[0][['x', 'y', "date"]]
    tile_time = dateutil.parser.parse(date)

    bbox = BBox((x-diam, y-diam, x+diam, y+diam), crs=CRS.WGS84)

    if color == "ngb":
        evalscript = """
        //VERSION=3
        function setup() {
          return {
            input: ["B2","B3","B4", "dataMask"],
            output: { bands: 4 }
          }
        }

        function evaluatePixel(sample) {
          return [sample.B4/255, sample.B3/255, 
                  sample.B2/255, sample.dataMask]
        }
        """
    elif color == "rgb":
        evalscript = """
        //VERSION=3
        function setup() {
          return {
            input: ["B1","B2","B3", "dataMask"],
            output: { bands: 4 }
          }
        }

        function evaluatePixel(sample) {
          return [sample.B3/255, sample.B2/255, 
                  sample.B1/255, sample.dataMask]
        }
        """
    else:
        raise ValueError(f"Unknown color option: {color!r}")

    request = SentinelHubRequest(
        evalscript=evalscript,
        input_data=[SentinelHubRequest.input_data(
            data_collection=collection, time_interval=tile_time
        )],
        responses=[
            SentinelHubRequest.output_response("default", MimeType.PNG)
        ],
        bbox=bbox,
        size=bbox_to_dimensions(bbox, 3),  # 3 m resolution
        config=config,
    )
    data = request.get_data()[0]

    if plot:
        fig, ax = plt.subplots(figsize=(15, 10))
        ax.imshow(data)
        ax.set_title(tile_time.date().isoformat(), fontsize=10)
        plt.tight_layout()
    else:
        return data
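The two evalscripts in the function differ only in which bands are mapped to the red, green and blue channels. A minimal sketch of generating both from one template follows; `make_evalscript` and `EVALSCRIPTS` are hypothetical names, and the band assignments simply copy the two variants above.

```python
def make_evalscript(red, green, blue, scale=255):
    """Build a VERSION=3 evalscript returning 3 scaled bands plus dataMask."""
    # Preserve order while dropping duplicates, then add the mask band
    unique = list(dict.fromkeys([red, green, blue]))
    inputs = ", ".join(f'"{b}"' for b in unique + ["dataMask"])
    outputs = ", ".join(f"sample.{b}/{scale}" for b in (red, green, blue))
    return f"""//VERSION=3
function setup() {{
  return {{
    input: [{inputs}],
    output: {{ bands: 4 }}
  }};
}}

function evaluatePixel(sample) {{
  return [{outputs}, sample.dataMask];
}}
"""


# Same band orders as the "rgb" and "ngb" branches in request_from_sentinelhub
EVALSCRIPTS = {
    "rgb": make_evalscript("B3", "B2", "B1"),
    "ngb": make_evalscript("B4", "B3", "B2"),
}
```

Inside the function, `evalscript = EVALSCRIPTS[color]` would then replace the if/elif chain, and an unknown `color` raises a `KeyError`.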

A DataFrame provides the dates and coordinates of the query:

my_byoc = DataCollection.define_byoc(collection['id'])
request_from_sentinelhub(catalog.iloc[0:1], 0.005, my_byoc, config)

Contents of catalog:

        name    tile        x        y        date
0  SD1965429  335309  33.9015  18.2675  2018-02-15
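In `request_from_sentinelhub`, `diam` is subtracted from and added to the centre on each side, so it acts as a half-width: the call above requests a box 0.01° on a side. The following is a rough sanity check of the box and image size for the catalog row shown. The helpers are hypothetical, and the metres-per-degree constants are standard approximations. `bbox_to_dimensions` computes its own dimensions from a projected box, so its numbers will differ slightly.

```python
import math


def center_bbox(x, y, half_width):
    """(min_x, min_y, max_x, max_y) of a square centred on (x, y), in degrees."""
    return (x - half_width, y - half_width, x + half_width, y + half_width)


def approx_pixels(bounds, resolution_m):
    """Approximate (width, height) in pixels of a WGS84 box.

    Uses ~111,320 m per degree of longitude at the equator, scaled by the
    cosine of the mid latitude, and ~110,574 m per degree of latitude.
    """
    min_x, min_y, max_x, max_y = bounds
    mid_lat = math.radians((min_y + max_y) / 2)
    width_m = (max_x - min_x) * 111_320 * math.cos(mid_lat)
    height_m = (max_y - min_y) * 110_574
    return round(width_m / resolution_m), round(height_m / resolution_m)


bounds = center_bbox(33.9015, 18.2675, 0.005)
print(bounds, approx_pixels(bounds, 3))
```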

Any pointers on how to find these problem tiles will be much appreciated!

tslijepcevic · February 21, 2024, 10:10am

Hi lde,

When did you start having this problem?

We only check for file changes when files are accessed, so with this many tiles, finding the affected ones by requesting each tile is not practical. Instead, you can compare the ETags we store and use for this check against the ETags in your storage. You can get ours by iterating over the tiles and reading additionalData.filesMetadata.<source>.etag.
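Below is a minimal sketch of the ETag comparison described above. It assumes that each tile has a single file whose name replaces the (BAND) placeholder in the tile path (this thread uses `image`, i.e. `image.tif`) and that tile paths are relative to the bucket root. `find_changed_tiles` and its arguments are hypothetical. `storage_etags` would be a plain dict built from a bucket listing, for example `{obj.key: obj.e_tag for obj in s3_bucket.objects.filter(Prefix=...)}` with boto3. Looking keys up in a dict keeps the check linear in the number of tiles.

```python
def _norm(etag):
    # S3 returns ETags wrapped in double quotes; strip them so both sides compare equal
    return (etag or "").strip('"')


def find_changed_tiles(tiles, storage_etags, file_name="image"):
    """Return (changed, missing) lists of tile ids.

    tiles: tile dicts as yielded by SentinelHubBYOC.iter_tiles, with the
        stored ETag under additionalData.filesMetadata.<file_name>.etag
    storage_etags: mapping of object key -> ETag from the bucket listing
    Assumption: the object key is the tile path with (BAND) -> file_name.
    """
    changed, missing = [], []
    for tile in tiles:
        files = tile.get("additionalData", {}).get("filesMetadata", {})
        stored = files.get(file_name, {}).get("etag")
        key = tile["path"].replace("(BAND)", file_name)
        current = storage_etags.get(key)
        if current is None:
            missing.append(tile["id"])      # file no longer in the bucket
        elif _norm(stored) != _norm(current):
            changed.append(tile["id"])      # candidate for reingestion
    return changed, missing
```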

lde · February 22, 2024, 4:39am

Thanks very much for the quick response and the suggestion. It started a week or two ago. I think the problem was that I added files to a bucket on S3, overwriting some COGs with newer versions (without realizing it), and then when I went to create a new tile in the collection, it didn't change, so the ETags no longer matched. So I did as you suggested: I compared the ETags on the bucket with those in the collection, found the mismatches, and reingested those tiles. For completeness, and in case it's of interest:

# for AWS resources I used https://github.com/agroimpacts/cloudtools;
# a plain boto3 S3 resource is enough for listing objects
import boto3
import pandas as pd
# Sentinel Hub (only the modules this example uses)
from sentinelhub import DataCollection, SentinelHubBYOC, SHConfig

config = SHConfig()
byoc = SentinelHubBYOC(config=config)

bucket_name = "<bucket>"  # placeholder: the bucket the collection reads from
s3resource = boto3.resource("s3")
s3_bucket = s3resource.Bucket(bucket_name)

# get list of tiles from several prefixes
key_tags_list = []
prefixes = [f"tiles_prod/{p}" for p in ["qc", "cat2", "cat4"]]
for prefix in prefixes:
    key_tags = []
    for obj in s3_bucket.objects.filter(Prefix=prefix):
        key_tags.append({"key": obj.key, "e-tag": obj.e_tag})
    
    key_tags_list.append(pd.DataFrame(key_tags))
    
key_tags_df = pd.concat(key_tags_list)

# convert to df that additionally extracts some information
# (tile ID, date, category) unique to these images, e.g. 
# here is one key: tiles_prod/qc/1001785_2020-07/image.tif
cogs_df = (
    key_tags_df.copy()
    .assign(clas = lambda x: x.key.str.split("/").str[1])
    .assign(fulltile = lambda x: x.key.str.split("/").str[2])
    .assign(tile = lambda x: x.fulltile.str.split("_").str[0])
    .reset_index(drop=True)
)

# get SHUB collection (checking one called "cat2")
collections_iterator = byoc.iter_collections()
collection_id = [collection["id"] for collection in collections_iterator 
                 if "cat2" in collection["name"]]

collection = byoc.get_collection(collection_id[0])
print(f"name: {collection['name']}")
cat2_byoc = DataCollection.define_byoc(collection['id'])

# get tiles in collection
tiles = list(byoc.iter_tiles(collection))

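# find tiles in collection that have mismatched ETags with tiles on bucket
different_tags = []
for tile in tiles:
    shid = tile["id"]
    etag = tile["additionalData"]["filesMetadata"]["image"]["etag"]
    components = tile["path"].split("/")
    cat = components[1]
    tile_id = components[2].split("_")[0]
    shtile = components[2]

    # match on category too, since the same tile/date folder name may
    # exist under more than one prefix
    s3cog = cogs_df.query("clas == @cat and fulltile == @shtile")
    if s3cog.empty:
        print(f"not found on bucket: {tile['path']}")
        continue
    s3etag = s3cog["e-tag"].iloc[0]
    # S3 wraps ETags in double quotes; strip them on both sides before comparing
    if s3etag.strip('"') != etag.strip('"'):
        different_tags.append(
            {"shid": shid, "tile": tile_id, "path": tile["path"],
             "e-tag": etag, "s3etag": s3etag,
             "s3path": s3cog.key.iloc[0]}
        )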

to_reingest = pd.DataFrame(different_tags)

# reingest
for i, row in to_reingest.iterrows():
    print(i, row.shid)
    byoc.reingest_tile(collection["id"], row.shid)

That seemed to fix it. Thanks very much for the suggestion!
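With thousands of tiles to reingest, one failed call stops the plain loop above partway through. The following is a hypothetical wrapper that records failures and keeps going. It assumes the callable takes `(collection_id, tile_id)`, the same way `byoc.reingest_tile` is called in the loop. The optional `pause` is an arbitrary courtesy delay, not a documented rate limit.

```python
import time


def reingest_many(reingest, collection_id, tile_ids, pause=0.0):
    """Call reingest(collection_id, tile_id) for each id; collect failures.

    reingest: any callable with that signature, e.g. byoc.reingest_tile.
    Returns (done, failed) where failed maps tile id -> error message.
    """
    done, failed = [], {}
    for tile_id in tile_ids:
        try:
            reingest(collection_id, tile_id)
            done.append(tile_id)
        except Exception as exc:  # record the error and move on to the next tile
            failed[tile_id] = str(exc)
        if pause:
            time.sleep(pause)
    return done, failed
```

Usage would be along the lines of `done, failed = reingest_many(byoc.reingest_tile, collection["id"], to_reingest.shid)`. You can then retry only the keys of `failed`.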

system · April 22, 2024, 4:40am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.