Non-NRT duplicate observations

I am working on a workflow that pulls every S1 image of a certain type (specifically IW, GRD, VV) in a certain region, in a given timeframe.

The current system queries the API through Scihub or APIHub, gets all of the matching product names, and downloads them from AWS.

Problem is, there seem to be duplicates: same capture date, same content, different IDs. For the sake of example, I’ll talk specifically about the capture S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E, but others are affected.

In this case, the duplicates are:

  • S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_083F
    • with ID 4ed64f35-b0ba-4a15-83a1-d7e9e3e5b5d2
  • S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_5146
    • with ID 0bbde3de-099c-498b-bf6b-7e9e7e9bebcb
  • S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_BA14
    • with ID ffac2db7-89ac-440a-8a7c-50ced39bccd0
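
The three names above share everything up to the mission datatake ID (050A5E) and differ only in the trailing four-hex-digit product unique identifier. As a sketch (the field names follow the Sentinel-1 naming convention; the `parse_s1_name` helper and its `capture_key` field are my own), splitting on underscores makes the shared capture key explicit:

```python
def parse_s1_name(name: str) -> dict:
    """Split a Sentinel-1 product name into its underscore-separated fields."""
    parts = name.split("_")
    fields = dict(zip(
        ["mission", "mode", "type_res", "level_class_pol",
         "start", "stop", "orbit", "datatake", "crc"],
        parts,
    ))
    # Everything except the trailing unique identifier names the capture itself
    fields["capture_key"] = "_".join(parts[:-1])
    return fields

a = parse_s1_name("S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_083F")
b = parse_s1_name("S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_5146")
print(a["capture_key"] == b["capture_key"])  # True: same capture
print(a["crc"], b["crc"])                    # 083F 5146: different product IDs
```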

I’ve read this other thread that concludes the duplicates are NRT products, but it doesn’t exactly match my symptoms. In particular:

  • Months-old duplicates are still listed in apihub (query: *1SDV_20220312T110121*)
  • From my understanding, non-NRT products are supposed to replace NRT products within 24 hours in the API (while staying on AWS). But these duplicates have ingestion dates up to 4 days after the capture:
    • 2022-03-12T12:15:12.011Z (1–2 hours after the capture’s beginPosition)
    • 2022-03-16T00:57:24.183Z
    • 2022-03-16T14:45:34.97Z
  • On the graphical interface, the two “newer” versions are flagged as “offline”, available only for asynchronous access, while the original product is available for synchronous access.

All three of them have “Status: ARCHIVED” and “Timeliness: Fast-24h” in their metadata. Nor is it obvious that the first product is of “lesser quality”.

Is this intended? If yes, is there a way to know which products will get “improved” versions? While this is relatively trivial for older (2+ months) products, I don’t see a clean way to do it for more recent captures.

Off the top of my head, the two solutions for my workflow are either

  1. Introduce a long delay (like 7 days) to ensure every product has had all of its versions released
  2. Ignore newer (and better?) releases of the same capture, if they eventually come out.
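
Either option ultimately reduces to grouping products by their capture key (the name minus the trailing unique suffix) and keeping one per group. A minimal sketch of option 2, assuming you have each product’s name and ingestion date available (the `dedupe_by_capture` helper and the tuple layout are illustrative, not an existing API):

```python
from collections import defaultdict

def dedupe_by_capture(products):
    """products: iterable of (name, ingestion_date) tuples.

    Keeps the earliest-ingested version of each capture, i.e. ignores
    later re-releases of the same acquisition (option 2 above).
    """
    groups = defaultdict(list)
    for name, ingested in products:
        capture_key = name.rsplit("_", 1)[0]  # drop the 4-char unique suffix
        groups[capture_key].append((ingested, name))
    # ISO-8601 timestamps in the same format sort lexicographically,
    # so min() picks the earliest ingestion per capture
    return sorted(min(g)[1] for g in groups.values())

products = [
    ("S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_083F",
     "2022-03-12T12:15:12.011Z"),
    ("S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_5146",
     "2022-03-16T00:57:24.183Z"),
    ("S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E_BA14",
     "2022-03-16T14:45:34.97Z"),
]
print(dedupe_by_capture(products))  # only the _083F product survives
```

Swapping `min` for `max` would instead keep the latest re-release per capture, if option 1 (waiting for the final version) is preferred.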

Thank you in advance for your help and insight.

(sorry for non-hypertext links, can’t post more than 2)

Hi @Oliver,

In terms of the data catalog, I suggest you contact Copernicus support, as they have control over this:

We are “simply” downloading all the data from Copernicus Hubs to AWS. There is no clean-up process, so you should expect the duplicates on AWS as well. But with the use of metadata you should be able to find the latest one.
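
As an illustration of the metadata route, a hedged sketch that builds an apihub OpenSearch query returning all versions of a capture ordered by ingestion date, newest first (executing it requires apihub credentials; `latest_version_query` is my own helper name, not an existing function):

```python
from urllib.parse import urlencode

APIHUB_SEARCH = "https://apihub.copernicus.eu/apihub/search"

def latest_version_query(capture_key: str) -> str:
    """Build an OpenSearch URL whose first result is the most recently
    ingested product matching the capture (wildcard on the unique suffix)."""
    params = urlencode({
        "q": f"filename:{capture_key}*",
        "orderby": "ingestiondate desc",  # newest ingestion first
        "rows": 10,
    })
    return f"{APIHUB_SEARCH}?{params}"

url = latest_version_query(
    "S1A_IW_GRDH_1SDV_20220312T110121_20220312T110149_042287_050A5E")
print(url)
```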


Thank you @gmilcinski. I contacted Copernicus about the issue, and after a few back-and-forths, it boils down to the following:

The technical team informed us that there is an already known and open issue causing the duplication of products; investigation is ongoing, but this issue could happen again.

For other duplicated products you have to follow the below two options:

  • For data acquired before 6 April 2022, you can consider the last one generated
  • For data acquired after 6 April 2022, please let us know so that we can contact the technical team, which will check them and decide which one is to be kept.

So there definitely is a bug somewhere that generates duplicate products, and it’s not always clear which one is “better”. I have not found these duplicates to be significantly different from one another, and any of them works just fine for our application, so I won’t bother too much with that and will simply ignore the duplicates. From my point of view, this issue is resolved.

For other people concerned about this, there you have your answer: ignore duplicates, or contact support for specific cases, bearing in mind it may take some days to get confirmation.

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.