I have been working through the LULC Python pipeline that uses a combination of eo-learn and sentinelhub.
Does anyone have recommendations for reducing the number of processing units used while testing the code?
While learning these tools I chose an area of about 1000 square km, and after two attempts with the pipeline code I have used 20,000 processing units and have no idea where the EOPatches are being saved. Something is being processed and it takes about 30 minutes to run, but afterwards there are no ‘results’ and the output folder that was created remains empty. In the error report, all executions failed with:
KeyError: "During execution of task VectorToRaster: 'lulcid'"
I am looking to use Google Colab, so that may be an additional hurdle.
I am not concerned about fixing a specific problem, but rather generally avoiding burning through a month of processing units with mistakes when implementing on a large scale.
Is there a way to make a workflow, such as the land use example for Slovenia or other repetitive tasks that use large amounts of processing units, stop and prompt the user after an error in the first execution batch? Requests and mosaicking work fine for my AOI. Are significantly more processing units needed when the same area is first split with the BBoxSplitter?
You are raising a couple of questions.
To reduce the consumption you can simply choose a smaller area or a shorter time period. You could also lower the allowed cloud coverage a bit, to filter out the cloudiest scenes.
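On the question of stopping after an error in the first batch: one option is to run a single patch on its own first and only continue with the rest if it succeeds. Below is a minimal sketch of that idea in plain Python. The names `run_patch` and `run_batch` are hypothetical stand-ins for whatever executes your workflow on one patch and on the remaining patches; they are not eo-learn functions, and the sketch assumes `run_patch` raises an exception on failure.

```python
from typing import Callable, Iterable, List


def run_with_smoke_test(
    patch_ids: Iterable[int],
    run_patch: Callable[[int], object],          # hypothetical: runs the workflow on one patch
    run_batch: Callable[[List[int]], object],    # hypothetical: runs the workflow on many patches
) -> List[int]:
    """Run the first patch alone and abort before the rest if it fails."""
    ids = list(patch_ids)
    if not ids:
        return []

    first, rest = ids[0], ids[1:]
    try:
        # Only one patch worth of requests is spent if something is misconfigured.
        run_patch(first)
    except Exception as exc:
        raise RuntimeError(
            f"Smoke test failed on patch {first!r}; "
            f"aborting before the remaining {len(rest)} patches."
        ) from exc

    # The first patch worked, so hand the remaining patches to the full run.
    run_batch(rest)
    return ids
```

With this pattern, an error like the `KeyError` above would surface after one patch instead of after the whole area has been requested.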
That being said, if you want to run larger ML loads, we recommend using Batch processing, which was designed for this purpose and is optimised for it: you will consume about a third of the processing units. See an example eo-learn workflow with Batch described in this blog post:
Note that you will need an enterprise package to use Batch processing, but if you want to give it a try, send us an e-mail and we will temporarily configure your account accordingly.
Most of the example eo-learn workflows first fetch EOPatches and store them on your local disk, then reuse the stored copies every time you run the code. And no, no more processing units should be consumed whether you split the area or not. Processing units are simply calculated based on the total volume of data you consume. See this doc for definition:
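As a rough illustration of why splitting does not change the total, here is a simplified estimate in Python. It assumes one processing unit corresponds to a 512 x 512 pixel output with 3 input bands, one data sample and an output of at most 16 bits, and that the cost scales linearly with area, band count and number of samples, doubling for 32-bit float output. These multipliers are assumptions taken from my reading of the processing unit definition; the sketch ignores any per-request minimum and other multipliers, so treat the documentation as the authority.

```python
def estimate_pu(
    width_px: int,
    height_px: int,
    n_bands: int = 3,
    n_samples: int = 1,
    float32_output: bool = False,
) -> float:
    """Simplified processing unit estimate for one request (assumed multipliers)."""
    area_factor = (width_px * height_px) / (512 * 512)  # relative to a 512 x 512 output
    band_factor = n_bands / 3                           # relative to 3 input bands
    format_factor = 2.0 if float32_output else 1.0      # assumed doubling for 32-bit float
    return area_factor * band_factor * format_factor * n_samples


# One 2048 x 2048 request versus the same area split into four 1024 x 1024 tiles,
# each with 6 bands and 10 acquisition dates.
whole = estimate_pu(2048, 2048, n_bands=6, n_samples=10)
tiles = sum(estimate_pu(1024, 1024, n_bands=6, n_samples=10) for _ in range(4))

print(whole, tiles)  # both are 320.0 under these assumptions
```

Under this model the cost depends only on the total number of pixels, bands and dates, which is why a smaller area or a shorter time period is the most direct way to use fewer units.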
Thank you for the response… I am going to keep getting to know the tools on a basic level before moving towards larger tasks.
As for processing units, it looks like Google Colab deletes the files stored in a session (response.tiff), so the sentinelhub request starts again from zero when the notebook is reopened. I am going through the documentation now to modify the pipeline to restart from the tiff and will repost if I find a solution.
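A minimal sketch of that restart-from-disk idea in plain Python: reuse the file if it already exists and only download when it does not. `load_or_download` and the `download` callable are hypothetical names used for illustration; `download` stands in for whatever call actually fetches the response bytes.

```python
from pathlib import Path
from typing import Callable


def load_or_download(path: Path, download: Callable[[], bytes]) -> bytes:
    """Return cached bytes from `path`, calling `download` only on a cache miss."""
    path = Path(path)
    if path.exists():
        # Cache hit: no new request is made, so no processing units are spent.
        return path.read_bytes()

    data = download()  # hypothetical callable that performs the real request
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data
```

On Colab the cache folder would need to live somewhere that survives the session, for example a mounted Google Drive folder rather than the session's local disk. The sentinelhub request classes also accept a `data_folder` argument together with `get_data(save_data=True)`, which may already cover this without custom code, so it is worth checking the documentation first.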