What is the best method to retrieve statistics from different indices for a big table?

Hello,

I have a table with ~3000 polygons. For each polygon I need to retrieve different vegetation indices over ~6 months, where each polygon has its own dates, and the area/bounding box covering all the polygons is very large (more than 2500x2500). So the table looks similar to this:

polygon                name     start_date    end_date
POLYGON ((....         plot1    2020-09-01    2021-02-18
POLYGON ((.....        plot2    2021-10-05    2022-03-13

My original methodology was to iterate through the rows of my dataframe: each time take one polygon, get its specific dates, request the NDVI statistics, save the result, and repeat for the next row, etc. But this is super heavy and takes between 4 and 12 hours.
Combining the polygons into a single multi-request does not seem applicable, since each polygon has its own dates.
My questions are: why does it run so slowly? What is the best methodology to use, given that the dates change for each polygon? And is batch processing using AWS the only solution?
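For context, the sequential approach described above looks roughly like this. This is a minimal sketch: the row data and the `build_request` helper are illustrative, not the real Statistical API client, but it shows why the per-polygon date ranges force one request per row.

```python
# Sketch of the sequential approach: one Statistical API request per
# polygon, each with its own time range. All names here (rows,
# build_request) are illustrative, not the real sentinelhub client.

rows = [
    {"polygon": "POLYGON ((...))", "name": "plot1",
     "start_date": "2020-09-01", "end_date": "2021-02-18"},
    {"polygon": "POLYGON ((...))", "name": "plot2",
     "start_date": "2021-10-05", "end_date": "2022-03-13"},
]

def build_request(row):
    """Assemble one statistics payload for a single polygon.

    Each polygon carries its own timeRange, which is why the rows
    cannot simply be merged into one request.
    """
    return {
        "input": {"geometry": row["polygon"]},
        "aggregation": {
            "timeRange": {
                "from": row["start_date"] + "T00:00:00Z",
                "to": row["end_date"] + "T23:59:59Z",
            },
        },
    }

# With ~3000 rows this means ~3000 sequential HTTP round trips
# (request + server-side processing + response, one after another),
# which is the main reason the loop takes hours.
requests = [build_request(r) for r in rows]
```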

Hi Reut,

I assume that you are using the Statistical API in your methodology? I'm not exactly sure what your setup is, but if I understand correctly, are you already using the Batch Statistical API?

I’m using the Statistical API; I haven’t used the Batch Statistical API for this.

Then I suggest trying the Batch Statistical API. It was developed exactly for this type of application, with thousands of polygons!
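The rough workflow is: upload your polygons as a GeoPackage to an S3 bucket, submit one batch request, and the service writes the per-polygon statistics back to S3. The request body looks roughly like the sketch below. This is from memory and the exact field names, bucket paths, and evalscript are placeholders, so please check the Batch Statistical API reference before using it:

```json
{
  "input": {
    "features": {
      "s3": {
        "url": "s3://my-bucket/polygons.gpkg",
        "accessKey": "...",
        "secretKey": "..."
      }
    },
    "data": [{ "type": "sentinel-2-l2a" }]
  },
  "aggregation": {
    "evalscript": "<your NDVI evalscript>",
    "timeRange": {
      "from": "2020-09-01T00:00:00Z",
      "to": "2022-03-13T23:59:59Z"
    },
    "aggregationInterval": { "of": "P1D" }
  },
  "output": {
    "s3": {
      "url": "s3://my-bucket/results/",
      "accessKey": "...",
      "secretKey": "..."
    }
  }
}
```

Because the whole job is processed server-side in parallel, you avoid the thousands of sequential round trips that make the row-by-row loop so slow; with daily aggregation over a wide time range you can then filter each polygon's results down to its own dates afterwards.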