CLM-error in eoLearn-slovenia Land Cover Classification script

bgumwelt · June 7, 2020, 5:45pm

Hi SenHub-Team,
I tried the eoLearn-slovenia Land Cover Classification script.
I have a problem with downloading the patches. The data for Slovenia downloaded once. After that I get only error messages, both for Slovenia and, in this case, for my own area:
It has problems with the CLM.

Has something changed since this announcement?
https://shforum.sinergise.com/t/cloud-masks-available-as-clm-clp-band/2113

What can I do?

Thank you very much for your work.

Greetz,
Kevin

max.kampen · June 10, 2020, 7:28am

Hi Kevin (@bgumwelt),

we are really happy to see that you are using our eo-learn tutorial. Just to clarify, you experienced the reported CLM problem when running the tutorial unchanged as well as with a different area of interest?
I will check in with our research team to figure out what is going wrong here.

Best, Max

max.kampen · June 10, 2020, 8:34am

Hi again Kevin (@bgumwelt),

Could you please make sure to use our latest LULC example notebook? @matic.lubej made some changes a month ago, and it should be working.

From the error itself it seems that at some point in your workflow the CLM mask is not available. It doesn’t look that way from the screenshot, but perhaps the order of your tasks is not correct (and one task is trying to use a feature that doesn’t exist yet).

Please check your code against the latest eo-learn version. If you continue to have issues, please let us know.
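
One way to narrow this down is to check which mask features are actually present before the valid-data predicate runs. The sketch below is only an illustration: a plain dictionary stands in for `eopatch.mask`, and `require_features` is a hypothetical helper, not part of eo-learn.

```python
def require_features(features, required):
    """Raise a descriptive KeyError if any required feature name is missing.

    `features` is any mapping of feature name -> array (a stand-in for
    eopatch.mask); `required` is an iterable of feature names.
    """
    missing = [name for name in required if name not in features]
    if missing:
        raise KeyError(
            f"Missing features {missing}; available features: {sorted(features)}"
        )


# Stand-in for a patch that was downloaded without the cloud mask
mask_features = {"IS_DATA": [[1, 1], [1, 0]]}

try:
    require_features(mask_features, ["IS_DATA", "CLM"])
except KeyError as err:
    # The message names the missing feature and lists what is available
    print(err)
```

A check like this at the top of the predicate would turn the bare `KeyError: 'CLM'` into a message that also shows which features the download step actually produced.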

Best regards,
Max

bgumwelt · June 14, 2020, 6:24pm

Hi Max,

thank you for your reply.
I used your link for the latest LULC example notebook (https://github.com/sentinel-hub/eo-learn/blob/master/examples/land-cover-map/SI_LULC_pipeline.ipynb) and tried it again. The error is the same:
KeyError: "During execution of task AddValidDataMaskTask: 'CLM'"
The Jupyter notebook and the error file follow:

**Jupyter-Notebook:**


# Firstly, some necessary imports

# Jupyter notebook related
%reload_ext autoreload
%autoreload 2
%matplotlib inline

# Built-in modules
import pickle
import sys
import os
import datetime
import itertools
from aenum import MultiValueEnum

# Basics of Python data handling and visualization
import numpy as np
np.random.seed(42)
import geopandas as gpd
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from matplotlib.colors import ListedColormap, BoundaryNorm
from mpl_toolkits.axes_grid1 import make_axes_locatable
from shapely.geometry import Polygon
from tqdm.auto import tqdm

# Machine learning 
import lightgbm as lgb
#from sklearn.externals import joblib
#from sklearn import metrics
#from sklearn import preprocessing

# Imports from eo-learn and sentinelhub-py
from eolearn.core import EOTask, EOPatch, LinearWorkflow, FeatureType, OverwritePermission, \
    LoadTask, SaveTask, EOExecutor, ExtractBandsTask, MergeFeatureTask
from eolearn.io import SentinelHubInputTask, ExportToTiff
from eolearn.mask import AddMultiCloudMaskTask, AddValidDataMaskTask
from eolearn.geometry import VectorToRaster, PointSamplingTask, ErosionTask
from eolearn.features import LinearInterpolation, SimpleFilterTask, NormalizedDifferenceIndexTask
from sentinelhub import UtmZoneSplitter, BBox, CRS, DataSource,SentinelHubRequest

# Folder where data for running the notebook is stored
#DATA_FOLDER = os.path.join('..', '..', 'example_data')
DATA_FOLDER = os.path.join('D:/eoTest/example_data')
print('DATA_FOLDER: ',DATA_FOLDER)
# Load geojson file
#country = gpd.read_file(os.path.join(DATA_FOLDER, 'svn.geojson'))#svn_utm_33N
#country = gpd.read_file(os.path.join(DATA_FOLDER, 'svn_3857.geojson'))#svn_utm_33N
country = gpd.read_file(os.path.join(DATA_FOLDER, 'svn_utm_33N.geojson'))#svn_utm_33N

country = country.buffer(500)

# Get the country's shape in polygon format
country_shape = country.geometry.values[-1]

# Plot country
country.plot()
plt.axis('off');

# Print size 
print('Dimension of the area is {0:.0f} x {1:.0f} m2'.format(country_shape.bounds[2] - country_shape.bounds[0],
                                                             country_shape.bounds[3] - country_shape.bounds[1]))

DATA_FOLDER:  D:/eoTest/example_data
Dimension of the area is 243184 x 161584 m2

# Create the splitter to obtain a list of bboxes
bbox_splitter = UtmZoneSplitter([country_shape], country.crs, 5000)

bbox_list = np.array(bbox_splitter.get_bbox_list())
info_list = np.array(bbox_splitter.get_info_list())

# Prepare info of selected EOPatches
geometry = [Polygon(bbox.get_polygon()) for bbox in bbox_list]
idxs = [info['index'] for info in info_list]
idxs_x = [info['index_x'] for info in info_list]
idxs_y = [info['index_y'] for info in info_list]

gdf = gpd.GeoDataFrame({'index': idxs, 'index_x': idxs_x, 'index_y': idxs_y}, 
                           crs=country.crs, 
                           geometry=geometry)

# select a 5x5 area (id of center patch)
ID = 616

# Obtain surrounding 5x5 patches
patchIDs = [616]
'''
for idx, [bbox, info] in enumerate(zip(bbox_list, info_list)):
    if (abs(info['index_x'] - info_list[ID]['index_x']) <= 2 and
        abs(info['index_y'] - info_list[ID]['index_y']) <= 2):
        patchIDs.append(idx)

# Check if final size is 5x5
if len(patchIDs) != 5*5:
    print('Warning! Use a different central patch ID, this one is on the border.')
    
# Change the order of the patches (used for plotting later)
patchIDs = np.transpose(np.fliplr(np.array(patchIDs).reshape(5, 5))).ravel()
'''

# save to shapefile
shapefile_name = (os.path.join(DATA_FOLDER, 'grid_slovenia_500x500.gpkg'))
# save to shapefile
#shapefile_name = './grid_slovenia_500x500.gpkg'
gdf.to_file(shapefile_name, driver='GPKG')

# figure
fig, ax = plt.subplots(figsize=(30, 30))
gdf.plot(ax=ax,facecolor='w',edgecolor='r',alpha=0.5)
country.plot(ax=ax, facecolor='w',edgecolor='b',alpha=0.5)
ax.set_title('Selected 5x5  tiles from Slovenia', fontsize=25);
for bbox, info in zip(bbox_list, info_list):
    geo = bbox.geometry
    ax.text(geo.centroid.x, geo.centroid.y, info['index'], ha='center', va='center')
    
gdf[gdf.index.isin(patchIDs)].plot(ax=ax,facecolor='g',edgecolor='r',alpha=0.5)

plt.axis('off');

class SentinelHubValidData:
    """
    Combine Sen2Cor's classification map with `IS_DATA` to define a `VALID_DATA_SH` mask
    The SentinelHub's cloud mask is assumed to be found in eopatch.mask['CLM']
    """
    def __call__(self, eopatch):        
        return np.logical_and(eopatch.mask['IS_DATA'].astype(np.bool), 
                              np.logical_not(eopatch.mask['CLM'].astype(np.bool)))
    
class CountValid(EOTask):   
    """
    The task counts number of valid observations in time-series and stores the results in the timeless mask.
    """
    def __init__(self, count_what, feature_name):
        self.what = count_what
        self.name = feature_name
        
    def execute(self, eopatch):
        eopatch.add_feature(FeatureType.MASK_TIMELESS, self.name, np.count_nonzero(eopatch.mask[self.what],axis=0))
        
        return eopatch

# TASK FOR BAND DATA
# add a request for S2 bands
# Here we also do a simple filter of cloudy scenes (on tile level)
# s2cloudless masks and probabilities are requested via additional data
band_names = ['B02', 'B03', 'B04', 'B08', 'B11', 'B12']
add_data = SentinelHubInputTask(
    bands_feature=(FeatureType.DATA, 'BANDS'),
    bands = band_names,
    resolution=10,
    maxcc=0.8,
    time_difference=datetime.timedelta(minutes=120),
    data_source=DataSource.SENTINEL2_L1C,
    additional_data=[(FeatureType.MASK, 'dataMask', 'IS_DATA'),
                     (FeatureType.MASK, 'CLM'),
                     (FeatureType.DATA, 'CLP')])


# TASKS FOR CALCULATING NEW FEATURES
# NDVI: (B08 - B04)/(B08 + B04)
# NDWI: (B03 - B08)/(B03 + B08)
# NDBI: (B11 - B08)/(B11 + B08)
ndvi = NormalizedDifferenceIndexTask((FeatureType.DATA, 'BANDS'), (FeatureType.DATA, 'NDVI'), 
                                     [band_names.index('B08'), band_names.index('B04')])
ndwi = NormalizedDifferenceIndexTask((FeatureType.DATA, 'BANDS'), (FeatureType.DATA, 'NDWI'), 
                                     [band_names.index('B03'), band_names.index('B08')])
ndbi = NormalizedDifferenceIndexTask((FeatureType.DATA, 'BANDS'), (FeatureType.DATA, 'NDBI'), 
                                     [band_names.index('B11'), band_names.index('B08')])



# TASK FOR VALID MASK
# validate pixels using SentinelHub's cloud detection mask and region of acquisition 
add_sh_valmask = AddValidDataMaskTask(SentinelHubValidData(), 
                                      'IS_VALID' # name of output mask
                                     )

# TASK FOR COUNTING VALID PIXELS
# count number of valid observations per pixel using valid data mask 
count_val_sh = CountValid('IS_VALID', # name of existing mask
                          'VALID_COUNT' # name of output scalar
                         )
#path_out = DATA_FOLDER
path_out = os.path.join('D:/eoTest/example_data/test')
# TASK FOR SAVING TO OUTPUT (if needed)
path_out = './eopatches/'
if not os.path.isdir(path_out):
    os.makedirs(path_out)
save = SaveTask(path_out, overwrite_permission=OverwritePermission.OVERWRITE_PATCH)

class LULC(MultiValueEnum):
    """ Enum class containing basic LULC types
    """
    NO_DATA            = 'No Data',            0,  '#ffffff'
    CULTIVATED_LAND    = 'Cultivated Land',    1,  '#ffff00'
    FOREST             = 'Forest',             2,  '#054907'
    GRASSLAND          = 'Grassland',          3,  '#ffa500'
    SHRUBLAND          = 'Shrubland',          4,  '#806000'
    WATER              = 'Water',              5,  '#069af3'
    WETLAND            = 'Wetlands',           6,  '#95d0fc'
    TUNDRA             = 'Tundra',             7,  '#967bb6'
    ARTIFICIAL_SURFACE = 'Artificial Surface', 8,  '#dc143c'
    BARELAND           = 'Bareland',           9,  '#a6a6a6'
    SNOW_AND_ICE       = 'Snow and Ice',       10, '#000000'
    
    @property
    def id(self):
        """ Returns an ID of an enum type

        :return: An ID
        :rtype: int
        """
        return self.values[1]

    @property
    def color(self):
        """ Returns class color

        :return: A color in hexadecimal representation
        :rtype: str
        """
        return self.values[2]


def get_bounds_from_ids(ids):
    bounds = []
    for i in range(len(ids)):
        if i < len(ids) - 1:
            if i == 0:
                diff = (ids[i + 1] - ids[i]) / 2
                bounds.append(ids[i] - diff)
            diff = (ids[i + 1] - ids[i]) / 2
            bounds.append(ids[i] + diff)
        else:
            diff = (ids[i] - ids[i - 1]) / 2
            bounds.append(ids[i] + diff)
    return bounds
    

# Reference colormap things
lulc_bounds = get_bounds_from_ids([x.id for x in LULC])
lulc_cmap = ListedColormap([x.color for x in LULC], name="lulc_cmap")
lulc_norm = BoundaryNorm(lulc_bounds, lulc_cmap.N)

# takes some time due to the large size of the reference data
land_use_ref_path = os.path.join(DATA_FOLDER, 'land_use_10class_reference_slovenia_partial.gpkg')
land_use_ref = gpd.read_file(land_use_ref_path)

rasterization_task = VectorToRaster(land_use_ref, (FeatureType.MASK_TIMELESS, 'LULC'),
                                    values_column='lulcid', raster_shape=(FeatureType.MASK, 'IS_DATA'),
                                    raster_dtype=np.uint8)

# Define the workflow
workflow = LinearWorkflow(
    add_data,
    ndvi,
    ndwi,
    ndbi,
    add_sh_valmask,
    count_val_sh,
    rasterization_task,
    save
)

# Let's visualize it
workflow.dependency_graph()

SentinelHubInputTask NormalizedDifferenceIndexTask NormalizedDifferenceIndexTask_1 NormalizedDifferenceIndexTask_2 AddValidDataMaskTask CountValid VectorToRaster SaveTask

%%time

# Execute the workflow
time_interval = ['2019-01-01', '2019-12-31'] # time interval for the SH request

# define additional parameters of the workflow
execution_args = []
for idx, bbox in enumerate(bbox_list[patchIDs]):
    execution_args.append({
        add_data:{'bbox': bbox, 'time_interval': time_interval},
        save: {'eopatch_folder': f'eopatch_{idx}'}
    })
    
executor = EOExecutor(workflow, execution_args, save_logs=True)
executor.run(workers=12, multiprocess=False)

executor.make_report()

C:\Users\BGU_Admin\Anaconda3\lib\site-packages\eolearn\features\bands_extraction.py:86: RuntimeWarning: invalid value encountered in true_divide
  ndi = (band_a - band_b + self.acorvi_constant) / (band_a + band_b + self.acorvi_constant)

Wall time: 39.6 s

 

**Error-file:**
 Execution status

    Start time: 19:33:33 06/14/20
    End time: 19:34:08 06/14/20
    Duration: 0:00:35.149022
    Number of finished executions: 0
    Number of failed executions: 1
    Processing type: multithreading
    Number of workers: 12

 ... Execution successfully finished
 ... Execution failed because of an error
EOTasks
Initialization parameters
SentinelHubInputTask (SentinelHubInputTask_e35078)

     data_source = <DataSource.SENTINEL2_L1C: (<_Source.SENTINEL2: 'Sentinel-2'>, <_ProcessingLevel.L1C: 'L1C'>)> 

     resolution = 10 

     bands_feature = (<FeatureType.DATA: 'data'>, 'BANDS') 

     bands = ['B02', 'B03', 'B04', 'B08', 'B11', 'B12'] 

     additional_data = [(<FeatureType.MASK: 'mask'>, 'dataMask', 'IS_DATA'), (<FeatureType.MASK: 'mask'>, 'CLM'), (<FeatureType.DATA: 'data'>, 'CLP')] 

     maxcc = 0.8 

     time_difference = datetime.timedelta(seconds=7200) 

NormalizedDifferenceIndexTask (NormalizedDifferenceIndexTask_1e4fb1)

     input_feature = (<FeatureType.DATA: 'data'>, 'BANDS') 

     output_feature = (<FeatureType.DATA: 'data'>, 'NDVI') 

     bands = [3, 2] 

NormalizedDifferenceIndexTask_1 (NormalizedDifferenceIndexTask_c0c30a)

     input_feature = (<FeatureType.DATA: 'data'>, 'BANDS') 

     output_feature = (<FeatureType.DATA: 'data'>, 'NDWI') 

     bands = [1, 3] 

NormalizedDifferenceIndexTask_2 (NormalizedDifferenceIndexTask_f962b4)

     input_feature = (<FeatureType.DATA: 'data'>, 'BANDS') 

     output_feature = (<FeatureType.DATA: 'data'>, 'NDBI') 

     bands = [4, 3] 

AddValidDataMaskTask (AddValidDataMaskTask_79928d)

     predicate = <__main__.SentinelHubValidData object at 0x000001EA03892AC8> 

     valid_data_feature = 'IS_VALID' 

CountValid (CountValid_4ed3eb)

     count_what = 'IS_VALID' 

     feature_name = 'VALID_COUNT' 

VectorToRaster (VectorToRaster_55e961)

     vector_input =           RABA_PID  RABA_ID         VIR         AREA STATUS        D_OD  \
    0        4943120.0     1100        Dof5     438.1625      P  2018-01-09   
    1        4943179.0     1222        Dof5     990.8498      P  2018-01-31   
    2        1089657.0     2000        Dof5    3177.1290      P  2018-01-16   
    3        4943187.0     1410        Dof5    4982.0694      P  2017-09-11   
    4        4943231.0     3000        Dof5     195.1148      P  2017-09-11   
    ...            ...      ...         ...          ...    ...         ...   
    1569740  1621084.0     3000  Baseline_2  142552.0158      P  2018-01-21   
    1569741  5844925.0     1221        Dof5   27128.4164      P  2018-02-22   
    1569742  4993833.0     2000        Dof5  335540.0330      P  2016-02-04   
    1569743  5480086.0     1300        Dof5   35382.2010      P  2018-05-14   
    1569744  1590504.0     5000  Baseline_2   13160.2100      P  2018-02-01   

             lulcid            lulcname  \
    0             1     cultivated land   
    1             1     cultivated land   
    2             2              forest   
    3             4          schrubland   
    4             8  artificial surface   
    ...         ...                 ...   
    1569740       8  artificial surface   
    1569741       1     cultivated land   
    1569742       2              forest   
    1569743       3           grassland   
    1569744       4          schrubland   

                                                      geometry  
    0        MULTIPOLYGON (((394793.882 5040217.190, 394792...  
    1        MULTIPOLYGON (((394572.984 5040401.611, 394568...  
    2        MULTIPOLYGON (((417562.359 5124368.001, 417559...  
    3        MULTIPOLYGON (((394595.552 5040470.227, 394598...  
    4        MULTIPOLYGON (((394591.353 5040468.660, 394595...  
    ...                                                    ...  
    1569740  MULTIPOLYGON (((437398.343 5131241.527, 437398...  
    1569741  MULTIPOLYGON (((446875.197 5073635.445, 446846...  
    1569742  MULTIPOLYGON (((540867.718 5095500.025, 540865...  
    1569743  MULTIPOLYGON (((528341.088 5138758.297, 528325...  
    1569744  MULTIPOLYGON (((400184.696 5147192.187, 400159...  

    [1569745 rows x 9 columns] 

     raster_feature = (<FeatureType.MASK_TIMELESS: 'mask_timeless'>, 'LULC') 

SaveTask (SaveTask_15bd1e)

     path = './eopatches/' 

Source code of custom tasks
CountValid (__main__)

Cannot collect source code of a task which is not defined in a .py file
Execution details
Execution 1
Statistics

    Start time: 19:33:33 06/14/20
    End time: 19:34:08 06/14/20
    Duration: 0:00:35.140987

Error


Traceback (most recent call last):
  File "C:\Users\BGU_Admin\Anaconda3\lib\site-packages\eolearn\core\eotask.py", line 72, in _execute_handling
    return_value = self.execute(*eopatches, **kwargs)
  File "C:\Users\BGU_Admin\Anaconda3\lib\site-packages\eolearn\mask\masking.py", line 46, in execute
    eopatch[feature_type][feature_name] = self.predicate(eopatch)
  File "<ipython-input-36-58e490e3527e>", line 8, in __call__
    np.logical_not(eopatch.mask['CLM'].astype(np.bool)))
  File "C:\Users\BGU_Admin\Anaconda3\lib\site-packages\eolearn\core\eodata.py", line 664, in __getitem__
    value = super().__getitem__(feature_name)
KeyError: 'CLM'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\BGU_Admin\Anaconda3\lib\site-packages\eolearn\core\eoexecution.py", line 192, in _execute_workflow
    results = workflow.execute(input_args, monitor=True)
  File "C:\Users\BGU_Admin\Anaconda3\lib\site-packages\eolearn\core\eoworkflow.py", line 172, in execute
    results = WorkflowResults(self._execute_tasks(input_args=input_args, out_degs=out_degs, monitor=monitor))
  File "C:\Users\BGU_Admin\Anaconda3\lib\site-packages\eolearn\core\eoworkflow.py", line 210, in _execute_tasks
    monitor=monitor)
  File "C:\Users\BGU_Admin\Anaconda3\lib\site-packages\eolearn\core\eoworkflow.py", line 243, in _execute_task
    return task(*inputs, **kw_inputs, monitor=monitor)
  File "C:\Users\BGU_Admin\Anaconda3\lib\site-packages\eolearn\core\eotask.py", line 59, in __call__
    return self._execute_handling(*eopatches, **kwargs)
  File "C:\Users\BGU_Admin\Anaconda3\lib\site-packages\eolearn\core\eotask.py", line 85, in _execute_handling
    raise extended_exception.with_traceback(traceback)
  File "C:\Users\BGU_Admin\Anaconda3\lib\site-packages\eolearn\core\eotask.py", line 72, in _execute_handling
    return_value = self.execute(*eopatches, **kwargs)
  File "C:\Users\BGU_Admin\Anaconda3\lib\site-packages\eolearn\mask\masking.py", line 46, in execute
    eopatch[feature_type][feature_name] = self.predicate(eopatch)
  File "<ipython-input-36-58e490e3527e>", line 8, in __call__
    np.logical_not(eopatch.mask['CLM'].astype(np.bool)))
  File "C:\Users\BGU_Admin\Anaconda3\lib\site-packages\eolearn\core\eodata.py", line 664, in __getitem__
    value = super().__getitem__(feature_name)
KeyError: "During execution of task AddValidDataMaskTask: 'CLM'"

matic.lubej · June 15, 2020, 7:45am

Hi @bgumwelt!

Thanks for your feedback. Now we know that the notebook isn’t the problem. I ran the notebook just yesterday and things worked as expected, so the problem might lie somewhere else.

Unfortunately, it is not enough that the notebook is up to date; the whole eolearn package should be updated. From your output, I can see that you are running on Windows and via Anaconda. May I ask how you installed eolearn?

If you have downloaded it via git, then please go to your local clone of the eolearn repository and execute git pull in order to update eolearn.
If you didn’t use git, you probably just downloaded the package straight from our GitHub. In that case, please download the latest version from there again.

Then you just have to reinstall the package in Anaconda.

Unfortunately, this code is under active development and needs to be updated/reinstalled quite often when changes take place. In case you are still having problems after that, we can resort to some remote help, if necessary.

Cheers,
Matic

bgumwelt · June 15, 2020, 7:27pm

Hi Matic,

great reply!
Thank you very much. The new installation worked fine!
Now there is an eo-learn version mix, 0.7.4 & 0.7.3.
Is this correct?

bgumwelt · June 16, 2020, 5:56am

Hi Matic,
what does this error mean, and how can I solve it?

Greetz,
Kevin

matic.lubej · June 16, 2020, 7:48am

Hi Kevin,

I’m glad that we have found the solution. The mix of packages is expected: the subpackages are updated separately between releases and get synced when we do a release.
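
To see which versions are installed side by side, something along these lines could be used. It is a small sketch that relies only on the standard library (Python 3.8+); the distribution names in `PACKAGES` are an assumption about what is installed and may need adjusting.

```python
from importlib import metadata

# Assumed distribution names; extend or trim the list to match your environment
PACKAGES = [
    "eo-learn",
    "eo-learn-core",
    "eo-learn-io",
    "eo-learn-mask",
    "eo-learn-geometry",
    "eo-learn-features",
    "sentinelhub",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:20s} {version or 'not installed'}")
```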

Regarding your second comment:

“A Jupyter widget could not be displayed …”

Here a progress bar should be shown, but it seems that your installation is missing the widgets. Are you using Jupyter Notebook or Jupyter Lab? This is not problematic, just a nuisance.

And the last one:

RuntimeWarning: invalid value encountered in true_divide

This is just a warning, not an error. It happens because some pixels can lie outside the borders of the satellite swath. There the band values can be 0 or NaN, and the normalized index calculation produces a warning, but the calculation should go through normally and you should expect results as usual.

You can then later mask these cases out by using the CLM or the IS_DATA bands.
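
As a rough illustration with made-up reflectance values (plain NumPy arrays, not an actual EOPatch): the pixel outside the swath has zeros in both bands, so the index becomes 0/0 = NaN and NumPy emits the warning, while the valid pixels are unaffected and can be selected with the data mask.

```python
import numpy as np

# Two bands for three pixels; the last pixel lies outside the swath (all zeros)
b08 = np.array([0.40, 0.30, 0.0])
b04 = np.array([0.10, 0.20, 0.0])
is_data = np.array([True, True, False])

# 0 / 0 gives NaN and triggers the "invalid value encountered" RuntimeWarning
ndvi = (b08 - b04) / (b08 + b04)

# Keep only the pixels flagged as valid data
valid_ndvi = ndvi[is_data]

print(ndvi)
print(valid_ndvi)
```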

Cheers,
Matic

bgumwelt · June 23, 2020, 6:49pm

Hello Matic,

thank you very much for your helpful replies.

I have a new error when downloading the patches. Some patches have no problems and some fail with an error:

sentinelhub.exceptions.DownloadFailedException: During execution of task SentinelHubInputTask: Failed to download from:
https://services.sentinel-hub.com/api/v1/process
with ConnectionError:
HTTPSConnectionPool(host='services.sentinel-hub.com', port=443): Max retries exceeded with url: /api/v1/process (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001775AF334C8>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
Please check your internet connection and try again.

eoexecution-report-2020_06_22-23_00_01/report.html:

matic.lubej · June 24, 2020, 7:33am

Hi Kevin,

I think that my choice of some of the parameters in the example notebook was a bit too extreme, as it tries to download the data with too many instances in parallel. I already fixed this in the develop version on the repository, but you can just change the parameters yourself.

The data which you downloaded should be fine and in principle you could rerun the download and after some iterations you would have the whole dataset. Unfortunately the choice of these parameters is not optimal for all machines.
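
The idea of rerunning until everything is downloaded could look roughly like the sketch below. It is independent of eo-learn: `run_until_complete` and `flaky_download` are hypothetical names, and the fake download function only simulates a connection that fails on the first attempt.

```python
def run_until_complete(patch_ids, process, max_rounds=3):
    """Process every patch, retrying only the ones that failed.

    `process` is any callable that takes a patch id and raises on failure.
    Returns the ids that still failed after `max_rounds` passes.
    """
    pending = list(patch_ids)
    for _ in range(max_rounds):
        failed = []
        for patch_id in pending:
            try:
                process(patch_id)
            except Exception as err:  # e.g. a dropped connection
                print(f"patch {patch_id} failed: {err}")
                failed.append(patch_id)
        pending = failed
        if not pending:
            break
    return pending


attempts = {}


def flaky_download(patch_id):
    # Hypothetical stand-in for the real download: odd ids fail on the first try
    attempts[patch_id] = attempts.get(patch_id, 0) + 1
    if patch_id % 2 and attempts[patch_id] == 1:
        raise ConnectionError("timed out")


still_failed = run_until_complete(range(4), flaky_download)
print("still failed:", still_failed)
```

Only the failed patches are retried, so patches that already succeeded are not downloaded again.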

In the SentinelHubInputTask, try setting the max_threads parameter to 5 or less, e.g.:

add_data = SentinelHubInputTask(
    bands_feature=(FeatureType.DATA, 'BANDS'),
    bands = band_names,
    resolution=10,
    maxcc=0.8,
    time_difference=datetime.timedelta(minutes=120),
    data_source=DataSource.SENTINEL2_L1C,
    additional_data=[(FeatureType.MASK, 'dataMask', 'IS_DATA'),
                     (FeatureType.MASK, 'CLM'),
                     (FeatureType.DATA, 'CLP')],
    max_threads=5
)

And later, when you run the workflow with the EOExecutor, again change the number of workers to 5 or less and also set the multiprocess parameter to True (otherwise it uses multithreading), e.g.:

executor = EOExecutor(workflow, execution_args, save_logs=True)
executor.run(workers=5, multiprocess=True)

Hopefully these settings will be friendlier to your machine.

Cheers,
Matic

rim.sleimi · June 24, 2020, 9:39am

Then wouldn’t it make more sense to delete those NaN/0 values before proceeding with any type of processing?
Also, I noticed in the notebooks that NDVI, NDWI… are calculated before cloud masking is applied. When visualizing some of the Sentinel-2A images (true color) I noticed the presence of clouds (after masking), which basically means that NDVI and NDWI were calculated from the cloudy images. In that case the reflectance values of vegetation are not captured in cloudy areas and thus the NDVI values are not the real ones, right?

matic.lubej · June 24, 2020, 10:15am

Hi @rim.sleimi,

I don’t think this has anything to do with this issue. @bgumwelt had some issues downloading the data, not with data being NaN/0. Or am I missing something here?

Also, I noticed in the notebooks that NDVI, NDWI… are calculated before cloud masking is applied. When visualizing some of the Sentinel-2A images (true color) I noticed the presence of clouds (after masking), which basically means that NDVI and NDWI were calculated from the cloudy images. In that case the reflectance values of vegetation are not captured in cloudy areas and thus the NDVI values are not the real ones, right?

This all just depends on your workflow. You can calculate NDVI on the whole image, clouds included, but the NDVI values there will not be valid. This is why you can then apply the mask to select the valid values where there are no clouds.

On the other hand, you can first use the cloud mask to set the data values to NaN where there are clouds, then you can just filter these values out after calculating the NDVI values.

There should be no difference between the two approaches.
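
A small check with made-up values (plain NumPy arrays standing in for the bands and the CLM mask) illustrates why the order does not matter: NaN propagates through the index calculation, so both approaches end up with NaN at the cloudy pixel and identical values elsewhere.

```python
import numpy as np

b08 = np.array([0.40, 0.30, 0.50])
b04 = np.array([0.10, 0.20, 0.45])
clm = np.array([False, False, True])  # True marks a cloudy pixel


def ndvi(nir, red):
    """Normalized difference of the two bands."""
    return (nir - red) / (nir + red)


# Approach 1: compute NDVI everywhere, then blank out the cloudy pixels
masked_after = np.where(clm, np.nan, ndvi(b08, b04))

# Approach 2: blank out the cloudy pixels in the bands, then compute NDVI
masked_before = ndvi(np.where(clm, np.nan, b08), np.where(clm, np.nan, b04))

print(np.allclose(masked_after, masked_before, equal_nan=True))
```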

rim.sleimi · June 24, 2020, 10:33am

Thanks for the quick reply.
I mentioned this here because I saw that both of you commented on this:

RuntimeWarning: invalid value encountered in true_divide

Therefore, is it possible to first apply the cloud mask, then interpolate the missing values, and then calculate NDVI?

matic.lubej · June 24, 2020, 10:46am

If you first apply the cloud mask, you will likely put in 0 or NaN, which will result in the same warning.

Even if you interpolate, there can always be some NANs at the beginning/end of the time series.

As mentioned above, this is just a warning, everything works as it should. If you are annoyed by the warnings though, it is possible to turn them off in the environment that you’re working in. Otherwise you can safely ignore it because it’s just for notification purposes.
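
One possible way to silence only this warning, sketched with the standard `warnings` module; `run_quietly` is a hypothetical helper name. NumPy reports these cases through the regular Python warning machinery by default, so a message filter like this should apply to them as well.

```python
import warnings


def run_quietly(func, *args, **kwargs):
    """Call func while ignoring 'invalid value encountered' RuntimeWarnings."""
    with warnings.catch_warnings():
        # The message is matched as a regex against the start of the warning text
        warnings.filterwarnings(
            "ignore",
            message="invalid value encountered",
            category=RuntimeWarning,
        )
        return func(*args, **kwargs)
```

The filter is scoped to the `with` block, so other warnings and the rest of the notebook keep their normal behaviour.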

rim.sleimi · June 24, 2020, 10:54am

The warning doesn’t concern me, only the interpretation of such values from a geoscientist’s point of view. I want to understand as much as possible how each element works and how that translates to the remote sensing world, because dealing with mere numbers feels easier than taking the context of the data into account.

there can always be some NANs at the beginning/end of the time series.

Could you please elaborate on that? When I displayed the NDVI values in a dataframe I noticed that the NaNs are at the beginning and at the end of the dataframe. Is this the same thing that you mentioned?

bgumwelt · June 24, 2020, 10:58am

Hi Matic,

with the changes I get an error:

My settings of the sentinelhub package:

Processing entities in the Sentinel Hub Dashboard:

matic.lubej · June 24, 2020, 2:33pm

Perhaps it’s a problem with windows and parallelization.

a) try restarting the notebook server
b) try putting the multiprocessing back to False
c) try just 1 worker in the executor

A few things to try; c) should be slower, but should work. If it doesn’t, something else might be wrong.

Let me know!

matic.lubej · June 24, 2020, 2:37pm

Ah OK, I understand. Of course, it’s better to understand the context.

Could you please elaborate on that? When I displayed the NDVI values in a dataframe I noticed that the NaNs are at the beginning and at the end of the dataframe. Is this the same thing that you mentioned?

Yes, this is most likely it. When you perform the interpolation, the values are inferred from the values which are available before and after a specific point. If the values at the beginning or at the end are NaNs, then the values there cannot be inferred, since this would then be extrapolation, not interpolation. Since this is not done, the values are NaN until the first and after the last valid observation of each pixel.
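
The behaviour can be reproduced with a toy time series. This sketch uses `np.interp` as a stand-in, not the LinearInterpolation task from the notebook; `interpolate_inside` is a hypothetical helper, and it assumes the time stamps are in increasing order.

```python
import numpy as np


def interpolate_inside(times, values):
    """Linearly fill NaN gaps, leaving NaN before the first and after the last valid value."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    if not valid.any():
        return values.copy()
    # left/right=NaN stops np.interp from repeating the edge values outward
    return np.interp(times, times[valid], values[valid], left=np.nan, right=np.nan)


# NDVI of one pixel over five dates; the first and last observations are missing
ndvi_series = [np.nan, 0.2, np.nan, 0.6, np.nan]
print(interpolate_inside(range(5), ndvi_series))
```

The gap in the middle is filled from its neighbours, while the first and last entries stay NaN because there is no valid observation on one side of them.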

Cheers,
Matic

bgumwelt · June 24, 2020, 7:31pm

Hello Matic,

thank you very much, it is working now.

But now there is the next error:

Do I need a special version of the scikit-learn package?
Maybe something changed? Or is it a Windows problem?

Good night,
Kevin

matic.lubej · June 24, 2020, 8:23pm

Hi Kevin,

thanks for pointing this out. It seems that you are using a more up to date version of scikit-learn, which is in fact preferred. I will update the code in our example, but in order to make it work in your code, just replace from sklearn.externals import joblib with import joblib. If you are getting errors regarding joblib, you need to install it via pip install joblib. Then it should work fine.
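
If the same notebook has to run on both older and newer scikit-learn installations, a fallback import along these lines might help. It is a sketch, and `load_joblib` is a hypothetical helper name.

```python
import importlib


def load_joblib():
    """Return the joblib module, preferring the standalone package."""
    # Newer scikit-learn versions no longer ship sklearn.externals.joblib,
    # so the standalone package is tried first.
    for name in ("joblib", "sklearn.externals.joblib"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("joblib is not installed; try `pip install joblib`")
```

Calling `joblib = load_joblib()` once near the other imports would then leave the rest of the notebook unchanged.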

Hopefully we will get to the bottom of all these errors. Looks like a perfect storm.

Cheers,
Matic