EO-Flow example notebook assistance

Hello!

I am trying to work through the example notebook from the EO-Flow repository, but I have run into an issue that prevents me from getting through the entire notebook. At the point where we begin to prepare the input data, there are some import statements from EO-Flow, but I keep getting a module not found error. I have tried most of what I could think of and what I found online, but I cannot get past this point. If anyone can provide some assistance, I would greatly appreciate it.

Hi @ncouch,

Could you please provide additional information? For a start, the notebook where this happened and the complete error message. Additional details like the OS you are running this on, etc. might also be needed.

Hello!

Yes, so I am working with the eo-flow repository on GitHub from sentinelhub; I posted the link in the second message I sent. The example notebook is the one titled notebook.ipynb. I was actually able to move past the module not found error, but I am now running into two separate errors. The first has to do with marshmallow and happens in the code block where we import from eoflow. This is the error:

/cm/shared/apps/jupyter/12.0.0/lib/python3.7/site-packages/marshmallow/fields.py:198: RemovedInMarshmallow4Warning: Passing field metadata as a keyword arg is deprecated. Use the explicit metadata=... argument instead.
RemovedInMarshmallow4Warning,

The second error occurs when I actually try to train the model (a friend kept rerunning the imports and eventually got past the first error), and it says the model expects 25 timestamps but got 23. This is concerning because we have checked all our patches and each of them has 25 timestamps in total, so we are unsure where this issue is coming from.

I know there is a lot here, so let me know if I can clarify anything that is confusing. Thank you!

I also need to pose an additional question regarding the EOPatches and how exactly the data is processed, because I am getting a little confused and I feel this would help my understanding. From what the notebook shows, the timestamp for the patches is 15. That means there is a total of 25 timepoints per EOPatch. When looking at the pixel count, however, it seems too low for there to be (250,000 pixels x 25 for each timepoint) x 25 EOPatches. So is there, for example, a specific date selected within the timestep from which we get our 250,000 pixels? I guess what I am asking is: how do the EOPatches get processed in these examples?

Hi @ncouch

From the notebook it looks like the EOPatch has 23 timestamps, as shown in cell 4: FEATURES is an array of shape timestamps=23, height=1010, width=999, n_channels=9.

If your EOPatches have 25 timestamps, you need to change the shape of the read features in the build_dataset function to [25, None, None, 9].
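For example, the feature specification inside build_dataset would end up looking roughly like the sketch below. This is from memory of the example notebook, so treat the exact tuple layout, the fill_na argument, and the feature names as assumptions and adjust them to your own data:

import numpy as np
from eolearn.core import FeatureType
from eoflow.input.eopatch import eopatch_dataset

# Features to read from each EOPatch:
# (feature_type, feature_name, output_name, dtype, shape)
features_data = [
    # was [23, None, None, 9]; changed to 25 to match your EOPatches
    (FeatureType.DATA, 'FEATURES', 'features', np.float32, [25, None, None, 9]),
    (FeatureType.MASK_TIMELESS, 'LULC', 'labels', np.int64, [None, None, 1])
]

# Build a tf.data dataset from the EOPatches; the notebook fills NaNs with -2,
# shown here via a fill_na argument (assumption)
dataset = eopatch_dataset('path/to/eopatches', features_data, fill_na=-2)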

Thank you, I completely overlooked this in the notebook. The patches I was trying to use do in fact have 25 timepoints. When you were running the notebook, did you run into the marshmallow error during the imports? This is the final hurdle I need to cross to run this notebook; I detailed the error in the 4th post.

You mean you got the error while running this cell?

import tensorflow as tf
import json

from eoflow.models import TFCNModel
from eoflow.input.eopatch import eopatch_dataset
from eoflow.input.operations import augment_data, cache_dataset, extract_subpatches
from eoflow.utils import create_dirs

I’ve tested this in my conda environment and it works (marshmallow=3.2.2, you can try with this version).

I’d suggest making a clean environment and installing the package anew. Let us know if it works.

So, thank you, I have been able to move past the marshmallow error, but I have a question about the notebook.

It seems that NaN values are being set to -2 in the build_dataset function in the notebook example. Is there a specific reason why this is done, and how does this -2 value affect the performance of the model?

Thank you for the great examples, I feel like I am learning a lot.

We are glad you find the code useful.

In general you’d want to set NaN values to some distinctive value that the network can learn not to use, so it should be different from valid input values. For example, if you know that reflectances lie in [0, 1], then a value of -1 would do (or any negative value), while if your features include some normalised index that ranges over [-1, 1], then -2 could be a good candidate.

It also depends on which activation function you use. In our case it’s mostly ReLU, so choosing negative values won’t generate neuronal activation (which is what you want).
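As a toy illustration of that last point (this is not from the notebook, just a few lines of numpy/TensorFlow):

import numpy as np
import tensorflow as tf

# A reflectance band in [0, 1] with one missing (NaN) pixel
band = np.array([0.12, np.nan, 0.87], dtype=np.float32)

# Replace NaNs with a value outside the valid range, e.g. -2 as in the notebook
filled = np.where(np.isnan(band), -2.0, band)

# A ReLU maps negative inputs to zero, so the fill value itself
# produces no activation and the network can learn to ignore it
print(tf.nn.relu(filled).numpy())   # [0.12 0.   0.87]

In the real network the ReLU of course acts on the outputs of the convolution weights rather than directly on the inputs, but the idea is the same: a fill value well outside the valid range is easy for the network to learn to disregard.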