Additional Question on No-Data Labels

ncouch · October 21, 2021, 8:47pm

I have been moving forward on my testing, but I keep returning to a question that I can't answer. I may just be the one confused here, so I am going to try to reword my question, although I think I may be close to the solution based on previous responses.

When using a convolutional neural network (CNN), it is my understanding that it's not possible to discard or ignore some pixels, since that would leave holes in the input. This may be dealt with by eo-learn; I am not sure. If we include NO_DATA labeled pixels, that could create training problems, because the CNN would learn to associate pixel values with a label that is not actually a proper label. I see you say that these pixels are disregarded and not used for training/testing, but I don't understand whether this creates some sort of issue. In my code we have a lot of pixels with no-data labels, and we are basically trying to come up with the best way to deal with them. This is where we got confused about NaN values vs. no-data labels.

I hope I worded this in a way others can understand, I am still confused myself but I am getting there!

batic · October 22, 2021, 5:53am

What you are describing seems normal to me.

Having a lot of pixels without labels can happen when:

you simply lack labels
you are trying to do binary classification (just two classes, where no label actually means “this pixel does not belong”)
you are doing object detection/segmentation (and the majority of pixels do not belong to any of the classes, e.g. the number of pixels in an image that represent a plane is small)
possibly several other reasons

Having a lot of empty (no data/no labels) pixels might be problematic, particularly when you are trying to do a full classification (every pixel in the results should be assigned to one of your known classes), but in principle only because it leaves you with a small training dataset. What I would suggest is to create training imagelets (chips, patchlets, …) only where the amount of missing labels is below some threshold (e.g. 20%).
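To make the threshold idea concrete, here is a minimal sketch in plain NumPy. It is not eo-learn code: the function names, the non-overlapping tiling, and the no-data value of 255 are all assumptions made for illustration, so adjust them to however your labels are actually encoded.

```python
import numpy as np

NO_DATA = 255  # assumed no-data label value; use whatever your label raster uses


def missing_fraction(label_chip, no_data=NO_DATA):
    """Fraction of pixels in a label chip that carry the no-data value."""
    return float(np.mean(label_chip == no_data))


def select_chips(label_map, chip_size, max_missing=0.2, no_data=NO_DATA):
    """Return (row, col) offsets of non-overlapping chips whose share of
    no-data pixels is at most max_missing."""
    height, width = label_map.shape
    keep = []
    for row in range(0, height - chip_size + 1, chip_size):
        for col in range(0, width - chip_size + 1, chip_size):
            chip = label_map[row:row + chip_size, col:col + chip_size]
            if missing_fraction(chip, no_data) <= max_missing:
                keep.append((row, col))
    return keep
```

Calling `select_chips(labels, chip_size=64)` on a 2D label array would then give the upper-left corners of the chips worth cutting out of the corresponding image data.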

What should happen in such a case is that the model will (hopefully) learn to assign a certain spectral response (and spatial context) to a given class, even though in some imagelets such training data is not available.
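One common way to keep unlabeled pixels in the input (so the CNN sees no holes) while making sure they do not influence training is to mask them out of the loss. The sketch below shows that idea with NumPy; it is an illustration only, not how eo-learn or any particular framework does it, and the function name and the no-data value of 255 are assumptions.

```python
import numpy as np


def masked_cross_entropy(probs, labels, no_data=255):
    """Mean cross-entropy over labeled pixels only.

    probs:  (H, W, C) array of predicted class probabilities
    labels: (H, W) integer array with classes 0..C-1, or no_data
    """
    valid = labels != no_data           # True where a real label exists
    if not valid.any():
        return 0.0                      # nothing to learn from in this chip
    valid_probs = probs[valid]          # (N, C) predictions at labeled pixels
    valid_labels = labels[valid]        # (N,) true classes at those pixels
    true_class_probs = valid_probs[np.arange(valid_labels.size), valid_labels]
    return float(-np.mean(np.log(true_class_probs + 1e-12)))
```

The network still receives every pixel of the chip as input; the no-data pixels just contribute nothing to the loss, so the model is never pushed to predict "no data" as if it were a class.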

On the other hand, in object segmentation the model should pick up the unlabeled pixels as “not belonging to any object”, which is precisely what it should learn. Sometimes it even makes sense to add such “negative samples” to the training data.

I hope this will help you. In any case, I suggest you try and see what happens.

ncouch · November 16, 2021, 3:46pm

I have an additional question. I believe I may have figured out the cause of the massive number of no_data labels. An additional 25 eopatches were added to our data, but the pipeline example only references labeled data for the original 25, at least that is my understanding. So every pixel from the added 25 patches is counted as no_data. My question is: is there a way to obtain labeled data for these additional eopatches? Or have I misunderstood the way it works?