Question about NO DATA labels in eopatches

TLDR: What is causing the pixels in the examples to be labeled as NO_DATA, and how are these pixels used? If they are not used, how did you exclude them from the data without doing some sort of resizing?

Hello, I have another question that has been troubling me. In the SI_LULC_pipeline example and in the eoflow example, one of the labels is NO_DATA. Where does it come from? What causes these pixels to be labeled as no data? Are they used for prediction, or are they simply ignored in some way? How do the NO_DATA pixels differ from NaNs in these examples? Sorry, I have just been confused about this.

Hi @ncouch

The NO_DATA label is assigned to pixels for which we don’t have any reference data. This can happen if your reference data is sparse (doesn’t cover the whole AOI), or if you erode the labelled areas to get cleaner data (e.g. removing pixels from the border of a forest to reduce mixing of the forest class with something else).
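To illustrate the erosion case, here is a minimal sketch that is not taken from the eo-learn examples. It assumes that NO_DATA is encoded as 0 (as in the LULC snippet in this thread), and the `reference` array and the `erode_labels` helper are hypothetical names made up for this illustration. It uses `scipy.ndimage.binary_erosion` to strip a one-pixel border from each class and marks the stripped pixels as NO_DATA.

```python
import numpy as np
from scipy import ndimage

NO_DATA = 0  # assumption: 0 encodes "no reference", as in the LULC snippet

# Hypothetical 6x6 reference map: left half forest (1), right half grassland (2).
reference = np.full((6, 6), 2, dtype=np.uint8)
reference[:, :3] = 1


def erode_labels(labels, no_data=NO_DATA):
    """Shrink every class by one pixel; pixels that are eroded away become NO_DATA."""
    eroded = np.full_like(labels, no_data)
    for cls in np.unique(labels):
        if cls == no_data:
            continue
        # border_value=1 treats the area outside the patch as "same class",
        # so only borders between classes are eroded, not the patch edge.
        keep = ndimage.binary_erosion(labels == cls, border_value=1)
        eroded[keep] = cls
    return eroded


eroded = erode_labels(reference)
print(eroded)
```

The array keeps its 6x6 shape. Only the two columns on either side of the forest/grassland border change their value to NO_DATA.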

Later, before training, pixels with this label are removed, as we do not want the model to learn to classify pixels into a NO_DATA class. See for instance line 23 in the eo-learn LULC example, model construction and training:

# Remove points with no reference from training (so we don't train to recognize "no data")
mask_train = labels_train == 0
features_train = features_train[~mask_train]
labels_train = labels_train[~mask_train]

# Remove points with no reference from test (so we don't validate on "no data", which doesn't make sense)
mask_test = labels_test == 0
features_test = features_test[~mask_test]
labels_test = labels_test[~mask_test]

As you can see in the confusion matrix plots, the NO_DATA class does not exist.

Hope this answers your questions. All the best!

Hello! Thank you for answering my question, this does help me. I do have two further questions based on this answer.

  1. So we are removing the pixels that are labeled as no data. How is this done without changing the patch size? This may seem trivial, I am just confused about it.
  2. Would this approach of removing the pixels labeled as NO_DATA work for all eo examples? I am working on two: the LULC example with LightGBM and the EO_Flow example using TFCN.

Hi @ncouch,

I’ll try to answer, although I might be interpreting your question wrongly…

Think of the NO_DATA pixels as pixels without any label/class attached to them. The pixels are still there (they have satellite data, masks etc.), but they are not assigned any class/label, so in pixel-based approaches they are not used for training (as shown in the code above), and the model therefore never learns to predict NO_DATA.
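On the patch-size question, a small sketch may help. It is only an illustration under assumptions: NO_DATA is taken to be 0, the array shapes are made up, and a nearest-class-mean rule stands in for a real pixel-based model such as LightGBM. The point is that the NO_DATA pixels are dropped from a flattened copy used for training, while the patch itself is never resized.

```python
import numpy as np

NO_DATA = 0  # assumption: 0 encodes "no reference"
rng = np.random.default_rng(0)

height, width, n_features = 4, 5, 3
features = rng.normal(size=(height, width, n_features))  # hypothetical per-pixel features
labels = np.tile([0, 1, 2, 1, 2], (height, 1))           # first column has no reference

# Flatten the patch into a table with one row per pixel.
features_flat = features.reshape(-1, n_features)
labels_flat = labels.reshape(-1)

# Training uses only the rows that have a reference label.
has_reference = labels_flat != NO_DATA
features_train = features_flat[has_reference]
labels_train = labels_flat[has_reference]

# Stand-in model: the mean feature vector of every class seen in training.
class_ids = np.unique(labels_train)
class_means = np.stack(
    [features_train[labels_train == c].mean(axis=0) for c in class_ids]
)

# Prediction runs on every pixel of the patch, including the unlabeled ones,
# and the result is reshaped back to the original patch size.
distances = np.linalg.norm(features_flat[:, None, :] - class_means[None, :, :], axis=2)
predicted = class_ids[distances.argmin(axis=1)].reshape(height, width)

print(features_train.shape, predicted.shape)
```

Because the model only ever saw classes 1 and 2, the predicted map has the full patch shape but contains no NO_DATA values.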

In the case of convolutional networks using spatial convolutions (e.g. predicting on patchlets/images instead of pixels), one would typically either train only on patchlets that contain no NO_DATA pixels, or train with the NO_DATA class as a “background” class and then, during prediction, not use this class (e.g. take the maximum only over the remaining classes).
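Both options can be sketched in a few lines of NumPy. This is an illustration under assumptions rather than code from eo-flow: NO_DATA is taken to be class 0, the helper names `patchlets_without_no_data` and `predict_without_no_data` are hypothetical, and `probs` stands for per-pixel class probabilities that a trained network would output.

```python
import numpy as np

NO_DATA = 0  # assumption: class 0 is the NO_DATA / background class


def patchlets_without_no_data(label_patchlets, no_data=NO_DATA):
    """Option 1: flag patchlets (N, H, W) whose label map has no NO_DATA pixel."""
    return (label_patchlets != no_data).all(axis=(1, 2))


def predict_without_no_data(probs, no_data=NO_DATA):
    """Option 2: pick the most probable class per pixel, skipping the NO_DATA channel."""
    scores = probs.astype(float)   # astype returns a copy, probs stays untouched
    scores[..., no_data] = -np.inf
    return scores.argmax(axis=-1)


# Three hypothetical 2x2 label patchlets; the second one contains a NO_DATA pixel.
label_patchlets = np.array([
    [[1, 1], [2, 2]],
    [[0, 1], [1, 1]],
    [[2, 2], [2, 2]],
])
keep = patchlets_without_no_data(label_patchlets)

# Hypothetical network output for one 2x2 patchlet with 3 classes (last axis).
probs = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]],
    [[0.5, 0.1, 0.4], [0.2, 0.3, 0.5]],
])
prediction = predict_without_no_data(probs)

print(keep)
print(prediction)
```

In both cases the patchlets keep their full spatial size, so no holes appear in the input. With the first option whole patchlets are skipped, and with the second the NO_DATA channel is simply left out when the final class is chosen.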

Thanks, @batic

This was very helpful, thank you for the continued responses. I have been moving forward with my testing, but I keep returning to a question that I can't answer. I may just be the one confused here, but I am going to try to reword my question, although I think I may be close to the solution based on your responses.

So, when using a convolutional neural network (CNN), it is my understanding that it's not possible to discard/ignore some pixels, since that would leave holes behind. This may be dealt with by eo-learn, I am not sure. If we include NO_DATA labeled pixels, that could create training problems in the CNN, because it would learn to associate pixel values with a label which is not actually a proper label. I see you say that these pixels are disregarded and not used for training/testing, but I don't understand whether this creates some sort of issue. In my code we have a lot of pixels with NO_DATA labels, and we are basically trying to come up with the best way to deal with them. This is where we got confused about NaN values vs NO_DATA labels.

I hope I worded this in a way others can understand. I am still confused myself, but I am getting there!