Metadata population delay from SNS Topic Subscription


I currently subscribe to the following SNS Topic to trigger my data pipeline.


However, I’ve noticed that when we read the generated metadata/info link immediately, 50-150 of the 1200-1600 messages each hour do not yet have tileDataGeometry available, so my Lambda fails. To avoid this, I have set up a delay on our SQS queue so that the metadata is ingested 10 minutes after the SNS message arrives; on failure we wait another 30 minutes. With this in place we have not had an error for the past two weeks.

I was wondering whether there’s any information about this delay: is it intentional, how long does the data usually take to populate, and could the SNS message be sent only once the data is populated? I’d also like to hear how others consuming similar data have handled this.

I am extracting data from these links.

Can you provide more details on your observation that some tileInfo.json files do not have tileDataGeometry available?

So this is our pipeline.

The SNS topic subscription pushes a message to our SQS queue, and the SQS queue immediately triggers a Lambda that reads tileDataGeometry and datastrip["id"] from the path link given in the SNS message.
[for some reason I can’t post a link :(]
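For readers following along, here is a minimal sketch of how a Lambda triggered by SQS might unwrap the SNS notification to get at the tile paths. With the default (non-raw) SNS-to-SQS delivery, each SQS record body is a JSON envelope whose "Message" field holds the payload as a string. The field names "tiles" and "path" and the function name are assumptions for illustration, not confirmed by this thread; check them against the messages you actually receive.

```python
import json


def tile_paths_from_sqs_event(event):
    """Return the tile paths found in an SQS-triggered Lambda event.

    Assumption: the SNS payload is JSON with a "tiles" list whose entries
    carry a "path" key. Adjust to the payload you actually receive.
    """
    paths = []
    for record in event.get("Records", []):
        # The SQS body holds the SNS envelope as a JSON string.
        envelope = json.loads(record["body"])
        # Without raw message delivery the payload is a JSON string under
        # "Message"; with raw delivery the body is already the payload.
        payload = envelope.get("Message", envelope)
        message = json.loads(payload) if isinstance(payload, str) else payload
        for tile in message.get("tiles", []):
            if "path" in tile:
                paths.append(tile["path"])
    return paths
```

Each returned path can then be used to locate the corresponding tileInfo.json.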

We noticed that our pipeline was producing a lot of Lambda errors. When we debugged them, it appeared that parsing tileDataGeometry from sentinel-hub failed because the field did not exist. However, when we checked by hand (hours later), the data was there, so we were uncertain why the Lambda had failed.

We then added a progressively larger delay, in ten-minute steps, between the SNS subscription message and the Lambda trigger, and the errors lessened. We also log when the data isn’t there and send the message back to the queue to wait again.
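The check-and-requeue step described above could be sketched roughly as follows. This is not the poster's actual code: `GeometryNotReady`, `extract_metadata`, `process_or_requeue`, and the `requeue` callback are hypothetical names, and the 30-minute retry delay is taken from the first post.

```python
RETRY_DELAY_S = 1800  # 30 minutes, the retry wait mentioned earlier in the thread


class GeometryNotReady(Exception):
    """Raised when tileInfo.json lacks the fields the pipeline needs."""


def extract_metadata(tile_info):
    """Pull out tileDataGeometry and datastrip["id"], or raise if missing."""
    geometry = tile_info.get("tileDataGeometry")
    datastrip_id = (tile_info.get("datastrip") or {}).get("id")
    if geometry is None or datastrip_id is None:
        raise GeometryNotReady("tileDataGeometry or datastrip id missing")
    return {"tileDataGeometry": geometry, "datastripId": datastrip_id}


def process_or_requeue(tile_info, requeue, attempt=0):
    """Return the metadata, or schedule a retry and return None.

    `requeue` is a hypothetical callback taking (delay_seconds, attempt);
    how it puts the message back on the queue is left to the caller.
    """
    try:
        return extract_metadata(tile_info)
    except GeometryNotReady:
        requeue(RETRY_DELAY_S, attempt + 1)
        return None
```

One caveat if you implement `requeue` with SQS: the per-message and per-queue delay is capped at 900 seconds, so a 30-minute wait has to come from something else, such as the visibility timeout. The thread does not say which mechanism the original pipeline uses.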

As far as I know, the SNS message is sent at the end of the process, and productInfo.json is sent as part of the SNS notification and is not changed later on.
Might it be that there is a bug in your code?

I think the issue is that the SNS message contains productInfo, not tileInfo, which is where the geometry data is located. Upon receiving the SNS message, a request to the <>tileInfo.json endpoint can sometimes fail to provide the geometry data. There seems to be a non-deterministic delay between the SNS message being sent and the geometry data becoming available.

A similar concern was brought up here: SNS message format? However, no suggestion or resolution was proposed.

Is this intended/expected behavior, or are there best practices for minimizing or avoiding calls to the RODA endpoint so that they don’t result in a failure to find the geometry data?

[edit] added link to similar post

RODA is using CloudFront API and it might be that this delays the availability a bit. Can you perhaps check the S3 file directly, rather than through RODA? I am not sure this would solve the problem, but it should be an easy fix if it does…

Interesting, we can definitely give that a shot. We will experiment with that and report back for future reference.

Thanks for the suggestion.

For those who may try this in the future: getting the tileInfo.json file directly from S3 seems to be the lowest-latency and most robust way to acquire the associated metadata. The RODA API is helpful in certain situations but isn’t suited to reacting rapidly to an SNS notification to obtain tile metadata.
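A minimal sketch of the direct S3 read, for anyone who wants a starting point. The bucket name, the requester-pays setting, and the function name `fetch_tile_info` are assumptions not stated in this thread; the S3 client is passed in so the function can be used with `boto3.client("s3")` or a stub.

```python
import json

# Assumed bucket name; replace it with the bucket your SNS topic refers to.
BUCKET = "sentinel-s2-l1c"


def fetch_tile_info(s3_client, tile_path, bucket=BUCKET):
    """Read <tile_path>/tileInfo.json straight from S3 and parse it."""
    key = tile_path.strip("/") + "/tileInfo.json"
    response = s3_client.get_object(
        Bucket=bucket,
        Key=key,
        RequestPayer="requester",  # assumption: the bucket is requester-pays
    )
    # The response body is a stream of bytes; json.loads accepts bytes.
    return json.loads(response["Body"].read())
```

In a Lambda this would be called as `fetch_tile_info(boto3.client("s3"), path)`, where `path` is the tile path taken from the SNS message. It may still be worth checking for tileDataGeometry in the result before using it.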