Hi all, S3 Inventory has been turned on for all the Sentinel buckets available via the AWS Public Datasets program. This includes sentinel-s1-l1c, sentinel-s2-l1c, sentinel-s2-l2a. The S3 bucket that contains these inventory files is sentinel-inventory and is in the eu-central-1 region.
These inventory files provide a daily listing of all files in the buckets, along with each file's size and last-modified time. More information is available in the S3 Inventory documentation.
@jflasher I’m trying to use the S3 Inventory files to catalog the archive, and I have downloaded the manifest.json. How do I know which csv.gz is the latest file to iterate over? Is each csv.gz an entire listing of the archive or just a diff? How would I iterate over the entire archive?
@jflasher yep, I see all the CSVs for that day. If I iterate over the CSVs listed under sentinel-s2-l1c/sentinel-s2-l1c-inventory/2018-09-11T08-00Z/manifest.json, would I get the entire archive? In particular, I’m looking for all the productInfo.json files.
You see, our catalog is missing the productPath on each tile record.
For example (u’products/2016/10/18/S2A_OPER_PRD_MSIL1C_PDMC_20161019T103543_R050_V20161018T095052_20161018T095052’)
and I’m looking to regenerate all of them. The productInfo.json contains the information I need; I just wanted to confirm that one manifest covers the entire archive.
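For anyone following along, here is a rough Python sketch of walking one manifest to collect the productInfo.json keys. It relies on what the S3 Inventory documentation describes: each delivered report is a full listing rather than a diff, and manifest.json carries a "files" list (the csv.gz data files) plus a "fileSchema" string naming the CSV columns. The function names are made up for illustration, and the exact columns configured for these buckets are an assumption, so check the fileSchema in your own manifest. Downloading the objects from the sentinel-inventory bucket is left to the caller.

```python
import csv
import gzip
import io


def manifest_columns(manifest):
    """Column names from the manifest's comma-separated fileSchema string."""
    return [name.strip() for name in manifest["fileSchema"].split(",")]


def data_file_keys(manifest):
    """Keys of every csv.gz data file that makes up this inventory report."""
    return [entry["key"] for entry in manifest["files"]]


def iter_matching_keys(gz_bytes, columns, suffix="productInfo.json"):
    """Yield object keys from one downloaded csv.gz that end with suffix.

    gz_bytes is the raw content of a single data file; columns is the
    list returned by manifest_columns().
    """
    key_index = columns.index("Key")
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", newline="") as handle:
        for row in csv.reader(handle):
            if row[key_index].endswith(suffix):
                yield row[key_index]
```

To cover the whole archive you would loop over data_file_keys(manifest), download each file, and feed its bytes to iter_matching_keys.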
How do I know what the latest inventory file is? I tried to request sentinel-s2-l1c/sentinel-s2-l1c-inventory/2020-22-01T08-00Z/manifest.json, but the response says this key doesn’t exist.
I noticed the HH part of the date changes. How do you know the hour at which the manifest.json file was created?
Hi, your key format should look like sentinel-inventory/sentinel-s2-l1c/sentinel-s2-l1c-inventory/2020-01-22T04-00Z/manifest.json rather than what you posted (the date is YYYY-MM-DD, not YYYY-DD-MM). However, your point about the HH changing in the key is correct. I would recommend listing all the keys under the prefix sentinel-inventory/sentinel-s2-l1c/sentinel-s2-l1c-inventory/, which will get you a list of date keys, and then sorting them client side to find the latest (via JS, Python, some gnarly bash scripting, etc.).
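A minimal Python sketch of that list-and-sort approach, in case it helps. The function names are hypothetical. list_dated_prefixes expects a boto3 S3 client passed in by the caller and uses the list_objects_v2 paginator with a "/" delimiter, so only the folder-level prefixes come back. latest_dated_prefix parses the YYYY-MM-DDTHH-MMZ folder name and skips anything that does not parse, such as non-date folders or a malformed date.

```python
import re
from datetime import datetime

# Inventory folders are named like 2020-01-22T04-00Z (YYYY-MM-DDTHH-MMZ).
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}-\d{2}Z)/$")


def latest_dated_prefix(prefixes):
    """Return the prefix with the newest valid timestamp, or None."""
    dated = []
    for prefix in prefixes:
        match = STAMP.search(prefix)
        if not match:
            continue  # not a dated folder
        try:
            when = datetime.strptime(match.group(1), "%Y-%m-%dT%H-%MZ")
        except ValueError:
            continue  # looks like a date but is not a valid one
        dated.append((when, prefix))
    return max(dated)[1] if dated else None


def list_dated_prefixes(s3_client, bucket, prefix):
    """Yield the immediate sub-prefixes ("folders") under prefix.

    s3_client is assumed to be a boto3 S3 client created by the caller.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for item in page.get("CommonPrefixes", []):
            yield item["Prefix"]
```

Usage would be something like latest_dated_prefix(list_dated_prefixes(client, "sentinel-inventory", "sentinel-s2-l1c/sentinel-s2-l1c-inventory/")), then appending manifest.json to the result. Depending on how the bucket is configured, the client may need credentials or extra request parameters.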