Linear Interpolation Problem in overlapping DataTake areas using WCS requests

wouellette · December 17, 2018, 11:25am

Hi,

Using the eo-learn framework, I came across some issues when sampling Sentinel-2 L1C data into eopatches.

Image 1 in overlap area - 2018-11-21 07:22:24:

Image 2 in overlap area - 2018-11-21 07:22:39:

As you can see above, it picks up two images on the same day, with a few seconds acquisition time difference. That is not surprising as it is an area where the two Data Takes overlap. Scihub illustrates the overlap:

So far, this is all expected. I expected the gaps present at certain timestamps to be filled in during interpolation, because I am interpolating 36 timestamps to 10 output dates.

However, that is not what happens, and two issues arise:

1. The first and last interpolated dates contain no values.
2. The second-to-last date still contains no-data values in the horizontal strips at the bottom of the image.

Gif illustrating this point:

My take on this is that, when performing WCS requests in these overlap areas and storing the results in eopatches, there should be an automatic procedure that stitches the two acquisitions together into a single timestamp. This would avoid these gaps, since both come from the same swath (i.e. they are redundant, showing the exact same data).

It would also avoid downstream errors when interpolating, because two almost identical timestamps will bias the interpolator.

In my particular use case of land cover classification, this creates inconsistencies in the classification output because key time steps are missing from the time series:

The background land cover is interpreted as water in that specific strip only, and it is the only location where such a systematic bias occurs in the entire extraction extent (Qatar in this case):

Let me know if you’ve come across such an issue. I’d be keen to know how we can get this fixed!

P.S.: Why are we limited in the number of links and images we can post? A point needs illustrating! Anyway, I put the links in there so you can take a look.

anze.zupanc · December 17, 2018, 12:05pm

Hi William,

thanks for describing the issue in such detail.

At the moment I don’t have any explanation for the two interpolation issues that you mention. It requires more investigation. Would you be willing to share the meta info of the patch where you observe this behavior (bbox, time_interval, …)?

As for the overlaps and the automatic stitching procedure: this is already provided by Sentinel Hub services and exposed by sentinelhub-py. Basically, if you specify the time_difference parameter in the SentinelHub*Input task, then all acquisitions whose timestamps are less than this time difference apart will be considered the same, and you should get only one frame back. We usually set this parameter to 2 hours. Can you try setting this parameter and check whether it solves your problems?
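With eo-learn you only need to pass time_difference to the input task; Sentinel Hub does the grouping and mosaicking on the server. Purely to illustrate the idea, here is a rough NumPy sketch of what "acquisitions closer than time_difference count as one frame" means. `merge_close_acquisitions` is a hypothetical helper, not the sentinelhub-py implementation, and the "first valid pixel wins" mosaicking rule is an assumption. The two 2018-11-21 times are the ones listed above; the third acquisition and all pixel values are invented.

```python
from datetime import datetime, timedelta

import numpy as np


def merge_close_acquisitions(timestamps, frames, time_difference=timedelta(hours=2)):
    """Merge acquisitions closer in time than time_difference into one frame.

    Hypothetical helper for illustration only. frames has shape (T, H, W, C)
    and holds NaN where a data take has no coverage. An acquisition joins the
    current group if it is less than time_difference after the previous one;
    within a group each pixel keeps the first valid value (simple mosaic).
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])

    groups = []  # each group is a list of frame indices
    for i in order:
        if groups and timestamps[i] - timestamps[groups[-1][-1]] < time_difference:
            groups[-1].append(i)
        else:
            groups.append([i])

    merged_times, merged_frames = [], []
    for group in groups:
        mosaic = frames[group[0]].copy()
        for j in group[1:]:
            missing = np.isnan(mosaic)  # pixels not yet covered by the group
            mosaic[missing] = frames[j][missing]
        merged_times.append(timestamps[group[0]])
        merged_frames.append(mosaic)
    return merged_times, np.stack(merged_frames)


times = [datetime(2018, 11, 21, 7, 22, 24),
         datetime(2018, 11, 21, 7, 22, 39),
         datetime(2018, 11, 26, 7, 22, 30)]  # third date is invented
frames = np.full((3, 2, 2, 1), np.nan)
frames[0, 0] = 1.0  # first data take covers only the top row
frames[1, 1] = 2.0  # second data take covers only the bottom row
frames[2] = 3.0

merged_times, merged = merge_close_acquisitions(times, frames)
print(len(merged_times), np.isnan(merged).any())  # 2 False
```

The two same-day data takes collapse into one timestamp with no remaining gaps, so the interpolator no longer sees two near-duplicate, partially empty frames.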

wouellette · December 17, 2018, 2:43pm

Hi Anze,

Thanks for the swift reply!

Here are the relevant meta details for you to further investigate:
time interval = ['2018-06-01','2018-12-01']

Three eopatches are affected:

EOPatch(
  data: {
    FEATURES: <class 'numpy.ndarray'>, shape=(10, 1339, 1368, 9), dtype=float32
    FEATURES_SAMPLED: <class 'numpy.ndarray'>, shape=(10, 1000, 1, 9), dtype=float32
  }
  mask: {
    IS_VALID: <class 'numpy.ndarray'>, shape=(27, 1339, 1368, 1), dtype=bool
    IS_VALID_SAMPLED: <class 'numpy.ndarray'>, shape=(27, 1000, 1, 1), dtype=bool
  }
  scalar: {}
  label: {}
  vector: {}
  data_timeless: {}
  mask_timeless: {
    LULC: <class 'numpy.ndarray'>, shape=(1339, 1368, 1), dtype=uint8
    LULC_SAMPLED: <class 'numpy.ndarray'>, shape=(1000, 1, 1), dtype=uint8
  }
  scalar_timeless: {}
  label_timeless: {}
  vector_timeless: {}
  meta_info: {}
  bbox: BBox(((527838.0784336411, 2789900.2618496483), (541513.1712241322, 2803295.097495106)), crs=EPSG:32639)
  timestamp: <class 'list'>, length=10
)

EOPatch(
  data: {
    FEATURES: <class 'numpy.ndarray'>, shape=(10, 1339, 1368, 9), dtype=float32
    FEATURES_SAMPLED: <class 'numpy.ndarray'>, shape=(10, 1000, 1, 9), dtype=float32
  }
  mask: {
    IS_VALID: <class 'numpy.ndarray'>, shape=(30, 1339, 1368, 1), dtype=bool
    IS_VALID_SAMPLED: <class 'numpy.ndarray'>, shape=(30, 1000, 1, 1), dtype=bool
  }
  scalar: {}
  label: {}
  vector: {}
  data_timeless: {}
  mask_timeless: {
    LULC: <class 'numpy.ndarray'>, shape=(1339, 1368, 1), dtype=uint8
    LULC_SAMPLED: <class 'numpy.ndarray'>, shape=(1000, 1, 1), dtype=uint8
  }
  scalar_timeless: {}
  label_timeless: {}
  vector_timeless: {}
  meta_info: {}
  bbox: BBox(((541513.1712241322, 2789900.2618496483), (555188.2640146231, 2803295.097495106)), crs=EPSG:32639)
  timestamp: <class 'list'>, length=10
)

EOPatch(
  data: {
    FEATURES: <class 'numpy.ndarray'>, shape=(10, 1339, 1368, 9), dtype=float32
    FEATURES_SAMPLED: <class 'numpy.ndarray'>, shape=(10, 1000, 1, 9), dtype=float32
  }
  mask: {
    IS_VALID: <class 'numpy.ndarray'>, shape=(30, 1339, 1368, 1), dtype=bool
    IS_VALID_SAMPLED: <class 'numpy.ndarray'>, shape=(30, 1000, 1, 1), dtype=bool
  }
  scalar: {}
  label: {}
  vector: {}
  data_timeless: {}
  mask_timeless: {
    LULC: <class 'numpy.ndarray'>, shape=(1339, 1368, 1), dtype=uint8
    LULC_SAMPLED: <class 'numpy.ndarray'>, shape=(1000, 1, 1), dtype=uint8
  }
  scalar_timeless: {}
  label_timeless: {}
  vector_timeless: {}
  meta_info: {}
  bbox: BBox(((541513.1712241322, 2789900.2618496483), (555188.2640146231, 2803295.097495106)), crs=EPSG:32639)
  timestamp: <class 'list'>, length=10
)

anze.zupanc · December 17, 2018, 8:06pm

Thanks for the extra info. BTW, have you tried using the time_difference parameter in the input task?

anze.zupanc · December 18, 2018, 9:05am

Can you send the configuration details of the interpolation task as well (start date, end date)? The first and last interpolated frames are all NaN because those interpolation dates fall before the first or after the last valid observation.
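A small NumPy sketch of this effect, assuming the interpolator does not extrapolate beyond the observed dates (modelled here with `np.interp(..., left=np.nan, right=np.nan)`; eo-learn's own implementation may differ). The resampling range reuses the WCS time_interval, while the observation dates, values and the 18-day step are invented.

```python
from datetime import date

import numpy as np

# Resampling range taken from the WCS time_interval; step is an example value
start, end, step_days = date(2018, 6, 1), date(2018, 12, 1), 18

# Invented valid observations for a single pixel
obs_dates = [date(2018, 6, 4), date(2018, 7, 14), date(2018, 9, 2), date(2018, 11, 26)]
obs_values = [0.20, 0.35, 0.30, 0.25]

# Express all dates as day offsets from the start of the resampling range
x_obs = np.array([(d - start).days for d in obs_dates], dtype=float)
x_new = np.arange(0, (end - start).days + 1, step_days, dtype=float)

# No extrapolation: resampled dates before the first or after the last
# valid observation become NaN
interpolated = np.interp(x_new, x_obs, obs_values, left=np.nan, right=np.nan)
print(interpolated[0], interpolated[-1])  # nan nan
```

The first resampled date (June 1) precedes the first observation and the last one (November 28) follows the last observation, so only those two frames end up empty.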

wouellette · December 18, 2018, 9:39am

Hi Anze,

The second problem, the interpolation gap, is fixed by using time_difference. Thanks for that one!

The first and last interpolated dates were still missing because I had used the same start_date and end_date as for the WCS request, and of course it is highly unlikely that an image is available on exactly the first and last day of the time_interval. So I fixed that by making the interpolation time interval match the first and last entries of patch.timestamp.

Thanks for the support!

wouellette · December 18, 2018, 10:04am

Actually, here is a pull request that does this as part of interpolation.py, because I do not foresee anyone needing empty frames at the start/end of the interpolation range they specified.

The pull request: start_date/end_date matched to timestamp #44

anze.zupanc · December 18, 2018, 10:31am

Hi William,

I’m glad that the issue with interpolation gap is fixed now.

Regarding the issue with the first/last interpolated timestamp being NaN: I agree that your PR circumvents the problem technically, but conceptually it may not be the best solution. The problem is that each patch will now have its first/last frame interpolated to a different date. When you then combine multiple patches, the first/last frames in the eopatches don’t conceptually represent the same information. I’m not sure how large this effect is in practice, but in certain cases it may lead to large differences.
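To make the concern concrete, here is a sketch of per-patch resampling ranges, where each patch's grid starts and ends at its own first/last acquisition (the idea behind the PR). `resample_dates` is a hypothetical helper, and its rule of stepping from the start date while not exceeding the end date is an assumption, not necessarily eo-learn's exact rule. The acquisition times and the 16-day step are invented.

```python
from datetime import datetime, timedelta


def resample_dates(start, end, step_days):
    """Return start, start + step, ... while the date does not exceed end.

    Hypothetical helper; the end-point rule in eo-learn may differ.
    """
    step = timedelta(days=step_days)
    dates, current = [], start
    while current <= end:
        dates.append(current)
        current += step
    return dates


# Invented first/last acquisition times of two neighbouring patches
patch_a = (datetime(2018, 6, 3, 7, 22), datetime(2018, 11, 28, 7, 22))
patch_b = (datetime(2018, 6, 5, 7, 32), datetime(2018, 11, 25, 7, 32))

# Per-patch ranges: each grid starts at that patch's own first acquisition
grid_a = resample_dates(*patch_a, step_days=16)
grid_b = resample_dates(*patch_b, step_days=16)

# Every frame is offset by the same amount, and here even the frame counts differ
print(grid_b[0] - grid_a[0], len(grid_a), len(grid_b))  # 2 days, 0:10:00 12 11
```

In this example, frame k of one patch and frame k of its neighbour refer to different dates, so stacking them as features mixes slightly different points in time.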

Therefore, I would suggest adjusting the start/end dates in the interpolation task to dates that you know fall after the first observation and before the last observation in the majority (if not all) of the eopatches.

wouellette · December 18, 2018, 11:31am

You’re absolutely right. The problem will be especially pronounced for very large AOIs covering many different swaths.

For the sake of sharing the solution I found outside of eo-learn, in the hope that it is useful to someone: I simply initialize start_date_interp and end_date_interp with the start_date and end_date used for Sentinelhub*WCSInput(), and at each eopatch iteration I overwrite them if that eopatch’s first timestamp is later than the stored start date or its last timestamp is earlier than the stored end date. This crops my time interval to the common interval covered by all eopatches in the AOI.

The solution looks something like this, given that my eopatches are stored in a list and start_date and end_date are the dates specified in the time_interval for Sentinelhub*WCSInput():

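from dateutil import parser

# start_date / end_date: the time_interval strings used for Sentinelhub*WCSInput()
# interp_interval: the resampling step in days
start_date_interp = parser.parse(start_date)
end_date_interp = parser.parse(end_date)

# Shrink the interval to the latest first acquisition and the earliest
# last acquisition over all eopatches (the common observed interval)
for eopatch in eopatches:
    if start_date_interp < eopatch.timestamp[0]:
        start_date_interp = eopatch.timestamp[0]
    if end_date_interp > eopatch.timestamp[-1]:
        end_date_interp = eopatch.timestamp[-1]

resample_range = (start_date_interp, end_date_interp, interp_interval)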

wouellette · March 22, 2019, 8:34am

Another issue closely linked to this thread: when I perform interpolation with resample_range as a tuple of the form (start_date, end_date, step_days) over many eopatches spanning a very large geographic area, the length of the resampled time dimension varies, I guess due to the varying availability of imagery at a given location.

Out of 4909 patches, 5 have been resampled to 23 dates and the rest to 21. That is very few, but enough to cause issues in further processing. Is there any way the procedure could force the time dimension to be the same length for all patches in bbox_list?

devis.peressutti · March 28, 2019, 8:46am

Hi William,

Sorry for the late response. It does seem odd that the interpolation returns a different number of time frames. In general, it should pre-compute the new number of time frames and fill dates where no images are available with NaNs. If you could provide us with more details (the timestamps of a 23-date and a 21-date patch) and, if possible, the BBoxes, we can try to debug the task.

Thanks.

wouellette · March 28, 2019, 4:18pm

It’s an artefact of the method I am using in this post:
https://shforum.sinergise.com/t/linear-interpolation-problem-in-overlapping-datatake-areas-using-wcs-requests/698/9?u=wouellette

Because I simply store the dates in variables, if I rerun on a subset rather than on the entire AOI, start_date_interp and end_date_interp will be fitted to that subset of eopatches and not to ALL eopatches, which may result in a different date range.
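A quick sketch of why a different date range changes the length of the time dimension: with a fixed step, the number of resampled dates depends only on the start and end dates. `resample_dates` is a hypothetical helper using an assumed "step from start while not past end" rule, and both ranges and the 8-day step are invented. A common interval computed over a subset is at least as wide as the one computed over the full AOI, so it can only yield the same number of dates or more.

```python
from datetime import date, timedelta


def resample_dates(start, end, step_days):
    """Hypothetical helper: start, start + step, ... up to and including end."""
    step = timedelta(days=step_days)
    dates, current = [], start
    while current <= end:
        dates.append(current)
        current += step
    return dates


# Invented common intervals: one fitted to ALL patches, one fitted to a subset
range_full_aoi = (date(2018, 6, 10), date(2018, 11, 20))
range_subset = (date(2018, 6, 3), date(2018, 11, 28))

print(len(resample_dates(*range_full_aoi, 8)))  # 21
print(len(resample_dates(*range_subset, 8)))    # 23
```

Using one fixed range for every run over the AOI removes this source of variation.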

I am logging those dates now in an AOI metadata file so that I can re-use the very same bounded dates on any subsequent runs over parts of the AOI.
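One way to log those bounded dates is sketched below. The JSON layout, the `aoi_metadata.json` file name and both helper functions are assumptions for illustration, not an eo-learn feature; convert the loaded values to whatever type your interpolation task expects for resample_range.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def save_resample_range(path, start, end, step_days):
    """Write the AOI-wide interpolation range to a small JSON metadata file."""
    meta = {"start": start.isoformat(), "end": end.isoformat(), "step_days": step_days}
    Path(path).write_text(json.dumps(meta, indent=2))


def load_resample_range(path):
    """Read the range back as a (start, end, step_days) tuple."""
    meta = json.loads(Path(path).read_text())
    return (datetime.fromisoformat(meta["start"]),
            datetime.fromisoformat(meta["end"]),
            meta["step_days"])


# Compute the range once over ALL patches, then reuse it for any subset run
with tempfile.TemporaryDirectory() as tmp:
    meta_path = Path(tmp) / "aoi_metadata.json"  # hypothetical file name
    save_resample_range(meta_path, datetime(2018, 6, 10, 7, 22),
                        datetime(2018, 11, 20, 7, 32), 8)
    resample_range = load_resample_range(meta_path)
```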

Sorry for raising this non-issue.

William