Thanks for the reply. This was a higher-level question, not related to the software used in each scenario. So it would be quite interesting to know the file storage architecture of the EDC.
As a general rule, we try not to replicate any data if it is already stored in the cloud and feasible for cloud-native processing.
E.g. core mission data (Sentinel-1, 2, 3, …) is stored in its original formats. We make use of COG-ified Sentinel-1 products (internal tiling, index, etc.); the rest is not changed at all.
In general, we have found Cloud Optimized GeoTIFF (COG), JP2, Zarr, and HDF5 to be the most practical data formats to work with.
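What these formats have in common is internal chunking/tiling, which lets a reader fetch only the tiles it needs rather than the whole file. A minimal sketch of that idea, using a chunked HDF5 dataset via h5py (the filename, dataset name, and array contents below are purely illustrative, not EDC specifics):

```python
import numpy as np
import h5py

# Illustrative only: a 1024x1024 raster stored in 256x256 chunks,
# mirroring the internal tiling that makes COG/Zarr/HDF5 friendly to
# partial, per-tile reads instead of whole-file downloads.
with h5py.File("example_scene.h5", "w") as f:
    f.create_dataset(
        "band1",
        data=np.arange(1024 * 1024, dtype="f4").reshape(1024, 1024),
        chunks=(256, 256),   # one chunk = one independently readable tile
        compression="gzip",  # chunks also compress independently
    )

# A reader that only needs one tile touches only the chunks it overlaps.
with h5py.File("example_scene.h5", "r") as f:
    tile = f["band1"][0:256, 0:256]

print(tile.shape)  # (256, 256)
```

The same access pattern is what COG's internal tiling and Zarr's chunked stores enable over HTTP range requests on object storage.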
Depending on what the follow-up processes require, it can sometimes be useful to pre-process the data into e.g. xcube or eo-patch formats, but these are typically only stored for the duration of the analysis.
Vector data are stored in geoDB, which is essentially a cloud-hosted PostgreSQL/PostGIS database.
In terms of cloud storage, object storage such as S3 or Swift works best with regard to scalability, based on our experience.
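Part of why object stores scale so well is their flat key-to-bytes namespace: there are no real directories, only key prefixes. A toy pure-Python illustration of that model (this is not a real S3/Swift client; the keys and the `list_prefix` helper are made up for illustration):

```python
# Toy illustration (no real S3/Swift API): an object store is a flat
# mapping of keys to bytes; "folders" are just shared key prefixes.
store = {
    "S2A/2023/06/01/B04.jp2": b"...",
    "S2A/2023/06/01/B08.jp2": b"...",
    "S2A/2023/06/02/B04.jp2": b"...",
}

def list_prefix(store, prefix):
    """Emulate a ListObjects-style prefix query over the flat keyspace."""
    return sorted(k for k in store if k.startswith(prefix))

keys = list_prefix(store, "S2A/2023/06/01/")
print(keys)  # the two objects "under" the 2023/06/01 prefix
```

Because every object is addressed independently by key, the store can spread objects across many nodes without any directory-tree bottleneck, which is the scalability property mentioned above.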