I’m working on a pipeline to ingest the stream of SNS notifications generated by the Sentinel-1 open data registry, and I noticed something odd: multiple observations that appeared to have the same footprint. I checked the GeoJSON files, and sure enough, they are identical.
Here’s an example:
S1B_EW_GRDH_1SDH_20200330T214541_20200330T214641_020927_027B16_DD6E
and
S1B_EW_GRDH_1SDH_20200330T214541_20200330T214641_020927_027B16_CE5C
I downloaded the quick-look.png files, and they look exactly the same. The productinfo.json files are also almost identical. However, they have different SciHub IDs:
DD6E has “sciHubId” : “a5164b21-08e6-4ca0-92ac-836fdd4c4bbf”
CE5C has “sciHubId” : “2f0705f0-06e3-4bfc-8957-c4a494c782ab”
So I dug into SciHub, but I can find only one ingested product matching “S1B_EW_GRDH_1SDH_20200330T214541*”, and that is DD6E. I can’t find CE5C at all.
Does anyone know what’s going on, and how I can avoid these duplicates? I’m seeing roughly 50 pairs a day!
Hi @jona,
Sentinel products are processed by various ground segment entities, which is why duplicates are sometimes available. It might also be (I’m not sure) that _CE5C is a so-called NRT (Near Real Time) product.
We pull the data from various sources to ensure a complete archive, which is why you cannot find this product on SciHub.
What we do in such cases is ingest both products and then use the one with the later ingestion timestamp, as it is (or should be) better in terms of quality.
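A minimal Python sketch of the “keep the later ingestion” rule. It assumes each notification has already been parsed into a dict with the product name under a hypothetical `"id"` key and the ingestion time under `"sciHubIngestion"` (the field named later in this thread), stored as an ISO 8601 string (also an assumption). Products are grouped on the name minus its final field: in Sentinel-1 names that last four-character field is the product unique identifier, which is the only part that differs between DD6E and CE5C.

```python
from datetime import datetime
from typing import Dict, Iterable, List


def acquisition_key(product_name: str) -> str:
    # Sentinel-1 names end with "_<product unique ID>"; dropping that last
    # field leaves a key shared by duplicate products of one acquisition.
    return product_name.rsplit("_", 1)[0]


def parse_ingestion(value: str) -> datetime:
    # Assumes an ISO 8601 string such as "2020-03-30T22:20:00.000Z";
    # "Z" is rewritten so that older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def keep_latest(products: Iterable[dict]) -> List[dict]:
    """Return one product per acquisition: the one ingested last."""
    latest: Dict[str, dict] = {}
    for product in products:
        key = acquisition_key(product["id"])
        current = latest.get(key)
        if current is None or (
            parse_ingestion(product["sciHubIngestion"])
            > parse_ingestion(current["sciHubIngestion"])
        ):
            latest[key] = product
    return list(latest.values())


if __name__ == "__main__":
    # Product names are from this thread; the timestamps are invented.
    products = [
        {"id": "S1B_EW_GRDH_1SDH_20200330T214541_20200330T214641_020927_027B16_CE5C",
         "sciHubIngestion": "2020-03-30T22:20:00.000Z"},
        {"id": "S1B_EW_GRDH_1SDH_20200330T214541_20200330T214641_020927_027B16_DD6E",
         "sciHubIngestion": "2020-03-31T01:10:00.000Z"},
    ]
    for p in keep_latest(products):
        print(p["id"])
```

With the invented timestamps above, only the DD6E product is kept. The result does not depend on the order in which notifications arrive.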
Best,
Grega
Hi Grega,
Thank you so much for your quick reply.
I have gone ahead and downloaded both TIFF measurements, and can now confirm they are identical in size and resolution.
Can you explain what other sources you use to ensure a complete archive? Both of these files were given a sciHubId in the SNS notification, so I would have expected both of them to be in SciHub. Is it possible for you to look up observations by their sciHubId? I tried and could not figure it out.
What is an NRT product? And how is it different from a GRD?
Sorry for so many questions; I’m trying to learn as fast as I can!
We got confirmation from Copernicus Support that these products are indeed Near Real Time products, which are available on CopHub only for a month and are then deleted.
NRT products are available sooner but are usually of somewhat lower quality.
We’ve implemented some checks in our system to make sure the files we receive SNS notifications for are also available on SciHub, and it’s surprising how many of the products reported by Earth on AWS are not available on SciHub!
Over the course of a few minutes today, our system caught these exceptions as they came in from Sinergise’s SNS:
Our application handles this fine, because it discards these SNS alerts, but we are curious:
Why do so many measurements disappear within 3 hours of being captured?! (This makes it seem like they are intended to be temporary, not NRT.)
What percentage of SNS alerts are for products that don’t exist?
Why does Earth on AWS store so many files that SciHub thinks should be deleted?
It’s not that these measurements disappear. They are replaced[1] (on SciHub) by a non-NRT version, which goes through more thorough post-processing and QA.
Rather than throwing away these data, embrace them: NRT comes a couple of hours earlier, and time to service is important for various applications (ice breakers, oil spills, etc.). Then, once the “proper” product becomes available, switch to it. It’s not too difficult to do this: simply use the product with the later sciHubIngestion timestamp.
[1] On AWS we do not physically delete NRT products, for a couple of reasons:
people ingesting these data into their catalogues would have trouble keeping their catalogues in sync;
implementing such a check simply takes time on our side as well; since our work is done on a voluntary basis, we have to manage our priorities.
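The “serve NRT first, then switch to the proper product” approach can be sketched as an event-driven catalogue that is updated on every SNS notification. This is a minimal sketch, not Sinergise’s implementation: it assumes the same parsed dicts as before (a hypothetical `"id"` key holding the product name, and `"sciHubIngestion"` as an ISO 8601 string), and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict


def acquisition_key(product_name: str) -> str:
    # Drop the trailing product unique identifier (e.g. "_DD6E").
    return product_name.rsplit("_", 1)[0]


def parse_ingestion(value: str) -> datetime:
    # Assumed ISO 8601 timestamp, e.g. "2020-03-30T22:20:00.000Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class LatestProductCatalog:
    """Serve whatever arrives first, then switch to later ingestions."""

    by_acquisition: Dict[str, dict] = field(default_factory=dict)

    def on_notification(self, product: dict) -> str:
        key = acquisition_key(product["id"])
        current = self.by_acquisition.get(key)
        if current is None:
            self.by_acquisition[key] = product
            return "added"  # first (possibly NRT) version: serve it right away
        if parse_ingestion(product["sciHubIngestion"]) > parse_ingestion(
            current["sciHubIngestion"]
        ):
            self.by_acquisition[key] = product
            return "replaced"  # a later ingestion supersedes the earlier one
        return "ignored"  # older or equally old duplicate

    def current(self, product_name: str) -> dict:
        # Look up the product currently served for this acquisition.
        return self.by_acquisition[acquisition_key(product_name)]
```

A `"replaced"` result marks the point at which anything derived from the NRT version could be regenerated from the newer product.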
Sorry to bring this old thread back but I have a very relevant question:
Is there a way to determine that a product is NRT and that a non-NRT product will soon follow? For my use case, I could just ignore the NRT version and wait. I’ve combed through all the metadata in the full product details. The only thing I found that seemed relevant is the Fast-24h value, but that same value appears on both products.
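This thread does not identify a metadata field that reliably marks a product as NRT. Without one, a rough way to “ignore the NRT version and wait” is to hold each acquisition for a fixed window after its first notification, then release only its latest ingestion. The sketch below assumes this approach; `window` is a hypothetical tuning parameter (the roughly 3 hours mentioned earlier in the thread is an observation, not a guarantee), and the dict keys are the same assumptions as above.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def acquisition_key(product_name: str) -> str:
    # Drop the trailing product unique identifier (e.g. "_CE5C").
    return product_name.rsplit("_", 1)[0]


def parse_ingestion(value: str) -> datetime:
    # Assumed ISO 8601 timestamp, e.g. "2020-03-30T22:20:00.000Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


class GracePeriodBuffer:
    """Hold each acquisition for `window`, then emit its latest ingestion."""

    def __init__(self, window: timedelta) -> None:
        self.window = window
        # key -> (time the first notification arrived, best product so far)
        self.pending: Dict[str, Tuple[datetime, dict]] = {}

    def add(self, product: dict, now: datetime) -> None:
        key = acquisition_key(product["id"])
        if key not in self.pending:
            self.pending[key] = (now, product)
            return
        first_seen, best = self.pending[key]
        if parse_ingestion(product["sciHubIngestion"]) > parse_ingestion(
            best["sciHubIngestion"]
        ):
            self.pending[key] = (first_seen, product)  # keep the original clock

    def release_ready(self, now: datetime) -> List[dict]:
        # Emit and forget every acquisition whose window has elapsed.
        ready = [
            key
            for key, (first_seen, _) in self.pending.items()
            if now - first_seen >= self.window
        ]
        return [self.pending.pop(key)[1] for key in ready]
```

The trade-off is latency: every product, NRT or not, is delayed by the full window. A replacement that arrives after its acquisition has been released starts a new pending entry, so late non-NRT products are still surfaced rather than lost.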