Duplicate observations

Hi everyone,

I’m working on a pipeline to ingest the stream of SNS notifications generated by the Sentinel-1 open data registry, and I thought I saw something odd… multiple observations that appeared to have the same footprint. I checked the geojsons, and sure enough, they are identical.

Here’s an example:
S1B_EW_GRDH_1SDH_20200330T214541_20200330T214641_020927_027B16_DD6E
and
S1B_EW_GRDH_1SDH_20200330T214541_20200330T214641_020927_027B16_CE5C

I downloaded the quick-look.png files, and they look exactly the same, and the productinfo.json files are also almost identical. However, they have different SciHub ingestion data:
DD6E has “sciHubId” : “a5164b21-08e6-4ca0-92ac-836fdd4c4bbf”
CE5C has “sciHubId” : “2f0705f0-06e3-4bfc-8957-c4a494c782ab”

So then I dug into SciHub, BUT I can only find one object ingested that starts “S1B_EW_GRDH_1SDH_20200330T214541*”, and that is DD6E.
I can’t find CE5C at all.

Does anyone know what’s going on, and how I can avoid these duplicates? I’m seeing roughly 50 pairs a day!
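For anyone wanting to spot these pairs in their own pipeline, here is a minimal sketch. It assumes, based only on the two example names above, that duplicates share everything except the final 4-character suffix after the last underscore. The function names are made up for illustration.

```python
from collections import defaultdict


def product_stem(product_name: str) -> str:
    """Return the product name without its trailing suffix (e.g. '_DD6E')."""
    stem, _, _suffix = product_name.rpartition("_")
    return stem


def find_duplicates(product_names):
    """Group names by stem and keep only stems that occur more than once."""
    groups = defaultdict(list)
    for name in product_names:
        groups[product_stem(name)].append(name)
    return {stem: names for stem, names in groups.items() if len(names) > 1}


names = [
    "S1B_EW_GRDH_1SDH_20200330T214541_20200330T214641_020927_027B16_DD6E",
    "S1B_EW_GRDH_1SDH_20200330T214541_20200330T214641_020927_027B16_CE5C",
    "S1A_IW_GRDH_1SDV_20200406T043036_20200406T043101_032002_03B24F_ACCA",
]
duplicates = find_duplicates(names)
for stem, group in duplicates.items():
    print(stem, "->", group)
```

This only flags candidates by name; it does not tell you which copy to keep.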

Hi @jona,
Sentinel products are processed in various ground segment entities, which is why there are sometimes duplicates available. It might also be (not sure) that _CE5C is a so-called NRT product.
We pull the data from various sources in order to ensure a complete archive, which is why you do not find the product on SciHub.

What we do in such a case is ingest both products and then use the one with the later ingestion timestamp, as it is (should be) better in terms of quality.
Best,
Grega

Hi Grega,
Thank you so much for your quick reply.
I have gone ahead and downloaded both tiff measurements, and can now confirm they are identical in size and resolution.

Can you explain what other sources you use to ensure a complete archive? Both of these files were given a SciHubID in the SNS, so I would have expected them both to be in SciHub. Is it possible for you to look up observations by their SciHubID? I tried and could not figure it out.

What is an NRT product? And how is it different from a GRD?

Sorry for so many questions–trying to learn as fast as I can!

NRT = near real time, see
https://sentinel.esa.int/web/sentinel/missions/sentinel-1/data-products

We have contacted Copernicus Support desk with this specific request and will come back once we get clarification from them.


Additional information:

I learned the SciHub API, and tried to directly download the two products by their SciHub IDs.

  • DD6E downloaded just fine
  • CE5C gave the following error

https://scihub.copernicus.eu/dhus/odata/v1/Products('2f0705f0-06e3-4bfc-8957-c4a494c782ab')/$value

Invalid key (2f0705f0-06e3-4bfc-8957-c4a494c782ab) to access Products
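In case it helps someone reproduce the check, here is a rough sketch of testing whether a SciHub ID resolves. The URL pattern is the one shown above, minus `/$value`. I am assuming that the shorter URL returns the product's metadata entry rather than the file, and that any HTTP error (including the "Invalid key" response) means the product cannot be retrieved under that ID. The function names are hypothetical.

```python
import urllib.error
import urllib.request

ODATA_ROOT = "https://scihub.copernicus.eu/dhus/odata/v1"


def product_url(scihub_id: str) -> str:
    """Build the OData URL for a product ID, using plain ASCII quotes."""
    return f"{ODATA_ROOT}/Products('{scihub_id}')"


def is_on_scihub(scihub_id: str, open_url=urllib.request.urlopen) -> bool:
    """Return True if the product entry can be fetched, False on an HTTP error.

    Assumption: every HTTP error is treated as "not available", which also
    hides authentication failures, so check credentials separately.
    """
    try:
        with open_url(product_url(scihub_id)) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False
```

SciHub requires a login, so in practice `open_url` would be the `open` method of an opener built with `urllib.request.build_opener` and an `HTTPBasicAuthHandler`. Passing the opener in as a parameter also makes the function easy to test without network access.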

We got confirmation from Copernicus Support that these products are indeed Near Real Time products, which are available on CopHub only, for a month, then they are deleted.
NRT products arrive faster but are usually of somewhat lower quality.

@gmilcinski, thanks again for your reply!

We’ve implemented some checks on the system to make sure the files we are getting SNS notifications for are also available on SciHub, and it’s surprising how many of those reported by Earth on AWS are not available on SciHub!

Over the course of a few minutes today, our system caught these exceptions as they came in from Sinergise’s SNS:

S1A_IW_GRDH_1SDV_20200406T043036_20200406T043101_032002_03B24F_ACCA
S1A_IW_GRDH_1SDV_20200406T043241_20200406T043306_032002_03B24F_BACC
S1A_IW_GRDH_1SDV_20200406T043306_20200406T043322_032002_03B24F_D5B5
S1B_EW_GRDM_1SDH_20200406T033540_20200406T033645_021018_027DE6_B9BA
S1B_EW_GRDM_1SDH_20200406T033645_20200406T033736_021018_027DE6_B944
S1B_EW_GRDM_1SDH_20200406T033843_20200406T033947_021018_027DE8_665C
S1B_EW_GRDM_1SDH_20200406T033947_20200406T034047_021018_027DE8_AB18
S1B_EW_GRDM_1SDH_20200406T034047_20200406T034147_021018_027DE8_6D7C
S1B_EW_GRDM_1SDH_20200406T034147_20200406T034255_021018_027DE8_E3AE
S1B_IW_GRDH_1SDV_20200406T034747_20200406T034816_021018_027DEA_85DC
S1B_IW_GRDH_1SDV_20200406T034816_20200406T034841_021018_027DEA_5042

If you try to search on the SciHub API for any of these, you will find no products from any missions with these names, and not even with these timestamps:
https://scihub.copernicus.eu/apihub/search?q=(filename:*20200406T034816*)

Our application will do fine, because it will throw away these SNS alerts, but we are curious…

  • Why are there so many measurements that disappear within 3 hours of being captured?! (This makes it seem like they are intended to be temporary, not NRT)
  • What percent of SNS alerts are for products that don’t exist?
  • Why does Earth on AWS store so many files that SciHub thinks should be deleted?

It’s not that these measurements disappear. They are replaced[1] (on SciHub) with a non-NRT version, which has more thorough post-processing and QA in place.
Rather than throwing these data away, embrace them - NRT comes a couple of hours earlier, and time to service is important for various applications (ice breakers, oil spills, etc.). Then, once the “proper” product becomes available, switch to that one. It’s not too difficult to do this - simply use the product with the later sciHubIngestion timestamp.

[1] On AWS we are not physically deleting NRT products for a couple of reasons:

  • people ingesting these data in their catalogues would have a hassle ensuring their catalogue is in sync;
  • implementing such a check simply takes time on our side as well… with our work being done on a voluntary basis, we have to manage our priorities.

Best,
Grega
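A minimal sketch of the "switch to the later one" rule, for a catalogue that is updated as notifications arrive. Only the `sciHubIngestion` field name comes from this thread. The `name` key, the ISO 8601 timestamp format, the placeholder suffixes `AAAA`/`BBBB`, and the timestamps themselves are assumptions made up for illustration.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def update_catalog(catalog: dict, product: dict) -> bool:
    """Store the product unless one with the same stem was ingested later.

    Returns True if the catalogue entry was created or replaced.
    """
    stem = product["name"].rpartition("_")[0]  # name without the final suffix
    current = catalog.get(stem)
    if current is None or (
        parse_timestamp(product["sciHubIngestion"])
        > parse_timestamp(current["sciHubIngestion"])
    ):
        catalog[stem] = product
        return True
    return False


# Hypothetical records: suffixes and timestamps are invented.
stem = "S1B_EW_GRDH_1SDH_20200330T214541_20200330T214641_020927_027B16"
earlier = {"name": stem + "_AAAA", "sciHubIngestion": "2020-03-30T23:10:00Z"}
later = {"name": stem + "_BBBB", "sciHubIngestion": "2020-03-31T02:45:00Z"}

catalog = {}
update_catalog(catalog, earlier)  # first arrival is used right away
update_catalog(catalog, later)    # later ingestion replaces it
```

Keying the catalogue on the name without its suffix means the early product is usable immediately and is replaced in place once the later one arrives, regardless of the order in which the notifications come in.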

Sorry to bring this old thread back but I have a very relevant question:

Is there a way to determine that a product is NRT and there will be a subsequent non-NRT product soon to follow? For my use case, I could just ignore the NRT version and wait. I’ve combed through all the metadata on the full product details. All I could see that I thought was relevant is the Fast-24h value – but that same value exists on both products.