VectorToRasterTask polygon border values

riyazi · January 31, 2022, 2:22pm

Hello,

I am attempting to construct a crop type classification pipeline based on label data from the Zindi farm pin crop detection challenge.

I am able to download the necessary data for my area of interest (AOI) and to rasterize the label data. The problem is that the rasterized labels have incorrect values along the polygon borders, as shown in the image below (the border values of every polygon appear random):

I have tried buffering, erosion, etc., with no success. Simon Grest’s write-up (GitHub - simongrest/farm-pin-crop-detection-challenge: 8th place solution to Zindi's FarmPin Crop Detection Challenge) reports no issues with rasterization, although he used eo-learn v0.6.

The land cover data is a vector train.shp file and I rasterize it as follows:

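import geopandas as gpd
import numpy as np
from eolearn.core import FeatureType
from eolearn.geometry import VectorToRasterTask

# Load the label polygons, drop incomplete rows and reproject to EPSG:32734
land_cover_path = './train/test.shp'
land_cover_data = gpd.read_file(land_cover_path)
land_cover_data.dropna(inplace=True)
land_cover_data.to_crs('EPSG:32734', inplace=True)
land_cover_data['Crop_Id_Ne'] = land_cover_data.Crop_Id_Ne.astype('uint8')

# Burn the crop IDs into a timeless mask shaped like the IS_DATA mask
rasterization_task = VectorToRasterTask(land_cover_data, (FeatureType.MASK_TIMELESS, 'LULC'),
                                        values_column='Crop_Id_Ne', raster_shape=(FeatureType.MASK, 'IS_DATA'),
                                        raster_dtype=np.uint8)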

I am using eo-learn v0.10.1 installed using conda from the conda-forge channel.

I have no problems with the SI_LULC_pipeline example notebook so I’m not sure what could be wrong.

Has anyone run into this when rasterizing vector data? Any help resolving this issue would be greatly appreciated.

Many thanks.

maleksandrov · February 7, 2022, 7:40pm

Hi @riyazi,

I investigated your problem but was not able to reproduce it. I see that the example comes from the upper-left corner of the bounding box of the dataset ./data/train/train.shp from the GitHub repository. I ran the rasterization and there were no issues on the borders.

Is it possible that the way you plot the rasterized array produces errors and that the array itself is ok?

If this isn’t the case, then please provide more information that would allow us to fully reproduce the problem:

the exact bounding box you are using,
the exact size of raster image you produce,
which OS you are using,
whether you are using a pip- or conda-based environment.
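
The environment details in the list above can be gathered in one go. The following is only a sketch: the helper name `describe_environment` and the default package list are hypothetical, the distribution names may differ depending on how the packages were installed, and checking `CONDA_PREFIX` is a heuristic for an activated conda environment rather than a definitive test.

```python
import os
import platform
import sys
from importlib import metadata


def describe_environment(packages=("eo-learn", "geopandas", "matplotlib", "numpy")):
    """Collect OS, Python and package versions for a bug report."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        # Heuristic: an activated conda environment sets CONDA_PREFIX.
        "conda_prefix": os.environ.get("CONDA_PREFIX", "not set"),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution may be missing or registered under another name.
            info[name] = "not found"
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

The printed lines can be pasted directly into a reply.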

riyazi · February 8, 2022, 1:26pm

Hi Matej,

Many thanks for taking a look into this issue.

Yes, I reduced my AOI to only the top-left portion of the original dataset, although I have also tried other parts of the original shapefile and get similar outcomes. I haven’t been able to try the entire dataset, as it is quite large for the tests I am currently running.

I tried following a workflow similar to the example in this notebook (https://github.com/sentinel-hub/eo-learn/tree/master/examples/crop-type-classification), but I still get the same raster border problem.

I plot the raster field as follows:

fig, ax = plt.subplots(figsize=(20, 12))
im = ax.imshow(eopatch.mask_timeless['lulc_eroded'].squeeze(), cmap=lulc_cmap, norm=lulc_norm)
ax.set_aspect('auto')
ax.set_xticks(ticks=[])
ax.set_yticks(ticks=[])
cb = fig.colorbar(im, ax=fig.axes, orientation='horizontal', pad=0.03, aspect=100)
cb.ax.tick_params(labelsize=20) 
cb.set_ticks([entry.id for entry in LULC])
cb.ax.set_xticklabels([entry.class_name for entry in LULC], rotation=45, fontsize=15)

The exact bounding box for my most recent AOI is: 603390.7041615268,6808384.23351457,609319.2351862685,6811495.355191282
The size of the raster: I am not sure of the exact size; I split the AOI into a 3x3 grid to get 9 patches.
The OS I am using is: Ubuntu 20.04
I am using a conda-based environment: python-3.9.9 and eolearn-0.10.1 installed from conda-forge.

Thanks again for your help.

maleksandrov · February 9, 2022, 9:25pm

Thanks for providing more info. I can now confirm that the problem is in the plotting code and not in the rasterization process. An easy way to see this is to check the unique values in the array you are plotting:

import numpy as np

np.unique(eopatch.mask_timeless['lulc_eroded'])

For your example above it should return only the values array([0, 6, 8], dtype=uint8). Any other values you see in the plot were therefore introduced by matplotlib, which by default applies interpolation when rendering an image. You can fix this by disabling the interpolation:

im = ax.imshow(
    eopatch.mask_timeless['lulc_eroded'].squeeze(),
    cmap=lulc_cmap,
    norm=lulc_norm,
    interpolation='none'
)
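
To illustrate why a smoothing resampler invents values along polygon borders, here is a minimal NumPy-only sketch. It does not reproduce matplotlib's actual resampling filter: the 2x2 block average is an assumed stand-in for any resampler that blends neighbouring pixels, and the toy array `labels` and the helpers `downsample_mean` and `downsample_nearest` are hypothetical. Only the class IDs 0, 6 and 8 are taken from the example above.

```python
import numpy as np

# Toy label raster using the three class IDs from the example above:
# background 0, one field labelled 6 and one field labelled 8.
labels = np.zeros((6, 6), dtype=np.uint8)
labels[0:3, 1:4] = 6
labels[3:6, 3:6] = 8


def downsample_mean(arr):
    """Halve the resolution by averaging each 2x2 block (blends neighbours)."""
    h, w = arr.shape
    blocks = arr.astype(float).reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))


def downsample_nearest(arr):
    """Halve the resolution by keeping one pixel per 2x2 block (no blending)."""
    return arr[::2, ::2]


smoothed = downsample_mean(labels)
nearest = downsample_nearest(labels)

print(np.unique(labels))    # only the original class IDs
print(np.unique(smoothed))  # in-between values appear where blocks straddle a border
print(np.unique(nearest))   # still only original class IDs
```

Blocks that lie entirely inside one polygon keep their label under both methods; only blocks that straddle a border get blended values, and a categorical colormap then draws those as unrelated classes. Picking a single source pixel per output pixel, as in the second helper, avoids the blending.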

riyazi · February 10, 2022, 9:14am

Hi Matej,

Thank you so much for the solution, it was the interpolation that was causing the border value changes.