Data Cube and file system

Good morning,

as we are trying to set up a data cube based on the Open Data Cube software, I’d like to ask which file system you use to store your raster data so that the load process is as efficient as possible.
Are you

  1. ingesting the files in a geo-db?
  2. parking the files in a simple NFS?
  3. parking the files in a distributed file system such as HDFS?
  4. parking the files in buckets?

Thanks in advance!

Hi @tdrivas,
note that we are not working with Open Data Cube software.
I suggest you contact them directly:
https://www.opendatacube.org/contacts
Best,
Grega

Dear Grega,
thanks for the reply. My question was meant at a higher level and is not tied to the software used in each scenario. So it would be quite interesting to know how file storage is architected in the EDC.
Thanks,
Thanassis

As a general rule, we try not to replicate any data that is already stored in the cloud and suitable for cloud-native processing.
E.g. core mission data (Sentinel-1,2,3,…) is stored in its original formats. We make use of COG-ified Sentinel-1 products (internal tiling, index, etc.); the rest is not changed at all.
In general, we found Cloud Optimized GeoTIFF, JP2, Zarr and HDF5 to be the most feasible data formats to work with.
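To illustrate why internal tiling matters for loading, here is a minimal sketch of how a reader can work out which internal tiles overlap a requested pixel window, so that only those tiles need to be fetched. The function name `tiles_for_window` and the 512-pixel tile size are assumptions for illustration, not something taken from our setup.

```python
def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_row, tile_col) indices of internal tiles overlapping a pixel window.

    Hypothetical helper: tile_size=512 is an assumed value; real files
    declare their own tile dimensions in the header.
    """
    if width <= 0 or height <= 0:
        return []
    # Index of the first and last tile touched along each axis.
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (r, c)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


# A 600 x 600 pixel window starting at pixel (1000, 1000) touches 3 x 3 tiles.
needed = tiles_for_window(1000, 1000, 600, 600)
print(len(needed))  # 9
```

With a tile index in the file header, each of those tiles maps to a byte range, which is what lets a client read a small area without downloading the whole file.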

Depending on what the follow-up processes require, it can sometimes be useful to pre-process the data into e.g. xcube or eo-patch, but these are typically only stored for the duration of the analysis.

Vector data are stored in geoDB, which is essentially a cloud-hosted PostgreSQL/PostGIS database.

In terms of cloud storage, object storage such as S3 or Swift works best for scalability, in our experience.
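As a rough sketch of what reading in place from object storage can look like: GDAL-based tools address S3 objects through the `/vsis3/` virtual file system, so a raster can be opened where it sits instead of being copied to a local or network file system first. The helper name `to_vsis3` and the bucket and key below are made up for illustration.

```python
from urllib.parse import urlparse


def to_vsis3(uri):
    """Convert an s3://bucket/key URI into a GDAL /vsis3/bucket/key path.

    Hypothetical helper for illustration; only the /vsis3/ prefix itself
    is GDAL's convention.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # netloc is the bucket name, path is the object key with a leading slash.
    return f"/vsis3/{parsed.netloc}{parsed.path}"


# Placeholder bucket and key, not a real dataset.
print(to_vsis3("s3://example-bucket/sentinel/scene.tif"))
# /vsis3/example-bucket/sentinel/scene.tif
```

The resulting path can then be handed to a GDAL-based reader, assuming credentials and region are configured in the environment.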