CLM-error in eoLearn-slovenia Land Cover Classification script

bgumwelt · June 23, 2020, 6:49pm

Hello Matic,

thank you very much for your helpful replies.

I have a new error when downloading the patches. Some patches download without problems, while others fail with this error:

sentinelhub.exceptions.DownloadFailedException: During execution of task SentinelHubInputTask: Failed to download from:
https://services.sentinel-hub.com/api/v1/process
with ConnectionError:
HTTPSConnectionPool(host='services.sentinel-hub.com', port=443): Max retries exceeded with url: /api/v1/process (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001775AF334C8>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
Please check your internet connection and try again.

eoexecution-report-2020_06_22-23_00_01/report.html:

matic.lubej · June 24, 2020, 7:33am

Hi Kevin,

I think that my choice of some of the parameters in the example notebook was a bit too extreme, as it tries to download the data with too many instances in parallel. I already fixed this in the develop version on the repository, but you can just change the parameters yourself.

The data which you downloaded should be fine and in principle you could rerun the download and after some iterations you would have the whole dataset. Unfortunately the choice of these parameters is not optimal for all machines.

In the SentinelHubInputTask, try setting the max_threads parameter to 5 or less, e.g.:

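import datetime

from eolearn.core import FeatureType
from eolearn.io import SentinelHubInputTask
from sentinelhub import DataSource

# band_names is assumed to be the list of band names defined earlier in the notebook
add_data = SentinelHubInputTask(
    bands_feature=(FeatureType.DATA, 'BANDS'),
    bands=band_names,
    resolution=10,
    maxcc=0.8,
    time_difference=datetime.timedelta(minutes=120),
    data_source=DataSource.SENTINEL2_L1C,
    additional_data=[(FeatureType.MASK, 'dataMask', 'IS_DATA'),
                     (FeatureType.MASK, 'CLM'),
                     (FeatureType.DATA, 'CLP')],
    max_threads=5  # limit the number of parallel download threads
)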

Later, when you run the workflow with the EOExecutor, again change the number of workers to 5 or less and also set the multiprocess parameter to True (otherwise it uses multithreading), e.g.:

executor = EOExecutor(workflow, execution_args, save_logs=True)
executor.run(workers=5, multiprocess=True)

Hopefully these settings will be friendlier to your machine.
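
For illustration, the idea of using only a few parallel workers and simply re-running whatever failed can be sketched with the standard library alone. This is not how EOExecutor or SentinelHubInputTask work internally: download_all, flaky_download and the max_rounds parameter are hypothetical names invented for this sketch, and the download itself is simulated.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(patch_ids, download_patch, max_workers=5, max_rounds=3):
    """Run download_patch for every id, retrying only the ids that failed."""
    remaining = list(patch_ids)
    done = {}
    for _ in range(max_rounds):
        if not remaining:
            break
        # A small pool keeps the number of simultaneous connections low.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = {pid: pool.submit(download_patch, pid) for pid in remaining}
        failed = []
        for pid, future in futures.items():
            try:
                done[pid] = future.result()
            except ConnectionError:
                failed.append(pid)  # keep it for the next round
        remaining = failed
    return done, remaining


# Simulated download (hypothetical): odd patch ids time out on the first attempt.
attempts = {}


def flaky_download(pid):
    attempts[pid] = attempts.get(pid, 0) + 1
    if pid % 2 == 1 and attempts[pid] == 1:
        raise ConnectionError(f"simulated timeout for patch {pid}")
    return f"patch-{pid}"


done, remaining = download_all(range(6), flaky_download)
```

Patches that downloaded successfully in an earlier round are left untouched, which mirrors the point above that the data already downloaded should be fine.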

Cheers,
Matic

rim.sleimi · June 24, 2020, 9:39am

Then wouldn’t it make more sense to delete those NaN/0 values before proceeding with any type of processing?
Also, I noticed in the notebooks that NDVI, NDWI… are calculated before cloud masking is applied. However, when visualizing some of the Sentinel-2A images (true color), I noticed the presence of clouds (after masking), which basically means that NDVI and NDWI were calculated from the cloudy images. In that case the reflectance values of vegetation are not captured in cloudy areas, and thus the NDVI values are not the real ones, right?

matic.lubej · June 24, 2020, 10:15am

Hi @rim.sleimi,

I don’t think this has anything to do with this issue. @bgumwelt had some issues downloading the data, not with data being NaN/0. Or am I missing something here?

Also, I noticed in the notebooks that NDVI, NDWI… are calculated before cloud masking is applied. However, when visualizing some of the Sentinel-2A images (true color), I noticed the presence of clouds (after masking), which basically means that NDVI and NDWI were calculated from the cloudy images. In that case the reflectance values of vegetation are not captured in cloudy areas, and thus the NDVI values are not the real ones, right?

This all just depends on your workflow. You can calculate NDVI on the whole image, clouds included, but the NDVI values there will not be valid. This is why you can then apply the mask to select the valid values where there are no clouds.

On the other hand, you can first use the cloud mask to set the data values to NaN where there are clouds, then you can just filter these values out after calculating the NDVI values.

There should be no difference between the two approaches.
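
A minimal sketch of why the order does not matter, using plain NumPy on made-up reflectance values rather than an actual EOPatch. The arrays red, nir and cloud and the helper ndvi are assumptions for this illustration only.

```python
import numpy as np

# Made-up reflectances for a 2x2 image; True in `cloud` marks a cloudy pixel.
red = np.array([[0.10, 0.30], [0.05, 0.20]])
nir = np.array([[0.50, 0.35], [0.45, 0.25]])
cloud = np.array([[False, True], [False, False]])


def ndvi(nir, red):
    return (nir - red) / (nir + red)


# Approach 1: compute NDVI everywhere, then invalidate the cloudy pixels.
ndvi_then_mask = np.where(cloud, np.nan, ndvi(nir, red))

# Approach 2: set cloudy pixels to NaN first, then compute NDVI.
red_masked = np.where(cloud, np.nan, red)
nir_masked = np.where(cloud, np.nan, nir)
mask_then_ndvi = ndvi(nir_masked, red_masked)

# Both approaches give the same values, with NaN at the cloudy pixel.
same = np.allclose(ndvi_then_mask, mask_then_ndvi, equal_nan=True)
```

NDVI is computed pixel by pixel, so a cloudy pixel cannot contaminate its neighbours; it only needs to be excluded at some point before the values are used.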

rim.sleimi · June 24, 2020, 10:33am

Thanks for the quick reply.
I mentioned this here because I saw that both of you commented on this:

RuntimeWarning: invalid value encountered in true_divide

So, is it possible to first apply the cloud mask, then interpolate the missing values, and then calculate NDVI?

matic.lubej · June 24, 2020, 10:46am

If you first apply the cloud mask, you will likely put in 0 or NaN, which will result in the same warning.

Even if you interpolate, there can always be some NaNs at the beginning/end of the time series.

As mentioned above, this is just a warning, everything works as it should. If you are annoyed by the warnings though, it is possible to turn them off in the environment that you’re working in. Otherwise you can safely ignore it because it’s just for notification purposes.
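
One way to silence the warning locally, sketched with plain NumPy on made-up values (the arrays below are assumptions, not data from the notebook), is to wrap the division in np.errstate:

```python
import numpy as np

# Made-up values; the first pixel is 0 in both bands, e.g. a no-data pixel.
red = np.array([0.0, 0.1, 0.2])
nir = np.array([0.0, 0.5, 0.2])

# Inside this block, 0/0 and x/0 do not emit a RuntimeWarning.
with np.errstate(divide="ignore", invalid="ignore"):
    ndvi = (nir - red) / (nir + red)

# The result is unchanged: the 0/0 pixel is still NaN, it is just not reported.
valid = ~np.isnan(ndvi)
```

The context manager only affects the code inside the with block, so warnings elsewhere in the notebook stay visible.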

rim.sleimi · June 24, 2020, 10:54am

The warning doesn’t concern me, only the interpretation of such values from a geoscientist’s point of view. I want to understand as much as possible how each element works and how that translates into the remote sensing world, because I feel that dealing with mere numbers is easier than dealing with the data in its context.

there can always be some NaNs at the beginning/end of the time series.

Could you please elaborate on that? When I displayed the NDVI values in a dataframe, I noticed that the NaNs are at the beginning and at the end of the dataframe. Is that the same thing you mentioned?

bgumwelt · June 24, 2020, 10:58am

Hi Matic,

with the changes I get an error:

My settings of the sentinelhub package:

Processing entities in the Sentinel Hub Dashboard:

matic.lubej · June 24, 2020, 2:33pm

Perhaps it’s a problem with Windows and parallelization.

a) try restarting the notebook server
b) try putting the multiprocessing back to False
c) try just 1 worker in the executor

These are a few things to try. Option c) will be slower, but it should work. If it doesn’t, something else might be wrong.

Let me know!

matic.lubej · June 24, 2020, 2:37pm

Ah OK, I understand. Of course, it’s better to understand the context.

Could you please elaborate on that? When I displayed the NDVI values in a dataframe, I noticed that the NaNs are at the beginning and at the end of the dataframe. Is that the same thing you mentioned?

Yes, this is most likely it. When you perform the interpolation, the values are inferred from the values which are available before and after a specific point. If the values at the beginning or at the end are NaNs, then the values there cannot be inferred, since this would then be extrapolation, not interpolation. Since this is not done, the values are NaN until the first and after the last valid observation of each pixel.
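
A small illustration of this behaviour for a single pixel, using np.interp rather than the interpolation task from the notebook (the helper interpolate_inside and the values are made up for this sketch):

```python
import numpy as np


def interpolate_inside(values):
    """Linearly fill NaN gaps between valid observations, without extrapolating."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(values.size)
    valid = ~np.isnan(values)
    if not valid.any():
        return values.copy()
    # left/right=np.nan keeps everything outside the valid range as NaN.
    return np.interp(idx, idx[valid], values[valid], left=np.nan, right=np.nan)


# Made-up NDVI time series: cloudy at the start, in the middle and at the end.
ndvi_series = np.array([np.nan, np.nan, 0.2, np.nan, 0.6, 0.7, np.nan])
filled = interpolate_inside(ndvi_series)
```

The gap in the middle is filled from its two neighbours, while the leading and trailing NaNs remain because there is no valid observation on one side of them.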

Cheers,
Matic

bgumwelt · June 24, 2020, 7:31pm

Hello Matic,

thank you very much, it is working with:
[screenshot]

But now I get the next error:

Do I need a specific version of the scikit-learn package?
Maybe something has changed? Or is it a Windows problem?

Good night,
Kevin

matic.lubej · June 24, 2020, 8:23pm

Hi Kevin,

Thanks for pointing this out. It seems that you are using a more up-to-date version of scikit-learn, which is in fact preferred. I will update the code in our example, but to make it work in your code, just replace "from sklearn.externals import joblib" with "import joblib". If you get errors regarding joblib, you need to install it via "pip install joblib". Then it should work fine.
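
If the same notebook has to run with both older and newer scikit-learn versions, one option is to try the standalone module first and fall back to the old location. This is only a sketch; import_first_available is a hypothetical helper, not part of joblib or scikit-learn.

```python
import importlib


def import_first_available(module_names):
    """Return the first module from module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed or no longer provided, try the next one
    raise ImportError(f"none of these modules could be imported: {module_names}")


# Intended usage (assumes at least one of the two is installed):
# joblib = import_first_available(["joblib", "sklearn.externals.joblib"])
```

With a recent scikit-learn the first name is found and the second is never tried.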

Hopefully we will get to the bottom of all these errors. Looks like a perfect storm.

Cheers,
Matic

rim.sleimi · June 24, 2020, 8:53pm

I don’t think I get what you are trying to say.
So following your explanations on GitHub and here, I tried to further investigate the source of the NaN values. So I did the following:

# Checking if there are NaN values in the Red, NIR, and NDVI multidimensional arrays
Red = eo.data['BANDS'][0][..., [3]]
NIR = eo.data['BANDS'][0][..., [7]]
ndvi = eo.data['NDVI'][0]
Denom = NIR + Red
np.isnan(Red).any(), np.isnan(NIR).any(), np.isnan(Denom).any(), np.isnan(ndvi).any()  # (False, False, False, False)

# Checking if there are zeros in the Red, NIR, Denom, and NDVI arrays
0 in Red, 0 in NIR, 0 in Denom, 0 in ndvi  # (False, False, False, True)

eo: the patch
Denom: denominator
If Denom is neither zero nor NaN, then why does the warning say
RuntimeWarning: invalid value encountered in true_divide
I really can’t wrap my head around this.

matic.lubej · June 24, 2020, 9:16pm

Hi @rim.sleimi,

So far I was talking about the general case, where values can be NaN or 0. In your case, since you check for NaN and 0 at the beginning, I agree that it sounds fishy that this warning would occur. Unfortunately, I cannot say anything more at this point.

Could you prepare a minimal working example that reproduces this warning, and provide the data where it happens?
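
One thing such an example could check: the snippet earlier in this thread only inspects the first time frame (index 0) of the patch. Whether that explains the warning here is not confirmed in this thread. The sketch below uses synthetic data as a stand-in for eo.data['BANDS'], with made-up band positions, to show how a zero denominator in a later frame can be located.

```python
import numpy as np

# Synthetic stand-in for eo.data['BANDS'], shaped (time, height, width, bands).
rng = np.random.default_rng(0)
bands = rng.uniform(0.05, 0.5, size=(3, 2, 2, 2))
bands[2, 1, 0, :] = 0.0  # simulate one no-data pixel in the last time frame

red, nir = bands[..., 0], bands[..., 1]
denom = nir + red

# Looking only at the first frame misses the problematic pixel.
first_frame_has_zero = bool((denom[0] == 0).any())

# Looking at all frames finds it and reports its (time, row, column) position.
any_frame_has_zero = bool((denom == 0).any())
bad_positions = np.argwhere(denom == 0)
```

Running the same kind of check on the real patch, over all time indices, would show whether any frame contains a zero or NaN denominator.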

rim.sleimi · June 24, 2020, 9:44pm

Sure. Would the notebook that I am currently working on be fine?

matic.lubej · June 24, 2020, 9:51pm

Of course. Could I please ask you to create a new ticket for this? Either here or on the eo-learn GitHub is fine.

Thanks!

rim.sleimi · June 24, 2020, 10:02pm

New ticket = new issue?
How should I share the data (i.e., the country boundary map and the already downloaded EOPatches)?

matic.lubej · July 7, 2020, 7:56am

Sorry, I missed this. Did you already create a new issue?

rim.sleimi · July 8, 2020, 7:05am

No, I haven’t. How can I do that?

matic.lubej · July 8, 2020, 9:42am

Hi @rim.sleimi

You can also open a new GitHub issue on eo-learn, where you provide a minimal working example of the issues that you are having. Just go here and click on “New issue”.