How to load image masks (labels) for image segmentation in Keras
I am using Tensorflow as a backend for Keras and I am trying to figure out how to use my labels to train image segmentation.
I am using the LFW Parts Dataset (1500 training images), which has both a ground truth image and a ground truth mask.
As I understand it, during training, I load both:
- (X) Image
- (Y) Mask image
I do this in batches to suit my needs. Now my question is: is it enough to just load both of them (image and mask image) as NumPy arrays of shape (N, N, 3), or do I need to process or modify the mask image somehow? Effectively, the mask labels are represented as [R, G, B] pixels, where:
- [255, 0, 0] Hair
- [0, 255, 0] Face
- [0, 0, 255] Background
I could do something like this to normalize it to 0-1, but I don't know if I should:
import numpy as np
from PIL import Image
im = Image.open(path)
label = np.array(im, dtype=np.uint8)
label = np.multiply(label, 1.0/255)
so I get:
- [1, 0, 0] Hair
- [0, 1, 0] Face
- [0, 0, 1] Background
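As a quick check of the idea above, here is a minimal sketch showing that, if the mask really contains only the three pure colours listed, dividing by 255 already yields a one-hot vector per pixel (channel order: hair, face, background). The tiny `mask` array is made up for illustration, and the assumption of pure colours may not hold if the mask file was resized or saved with lossy compression.

```python
import numpy as np

# Hypothetical 2x2 mask using the three pure colours from the question.
mask = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [0, 0, 255]]],
    dtype=np.uint8,
)

# Scale to 0-1; float32 is a common dtype for training targets.
label = mask.astype(np.float32) / 255.0

# Check that every value is 0 or 1 and each pixel has exactly one channel set.
is_one_hot = bool(
    np.all(np.isin(label, (0.0, 1.0))) and np.all(label.sum(axis=-1) == 1.0)
)

# Class index per pixel, following channel order: 0 = hair, 1 = face, 2 = background.
class_ids = label.argmax(axis=-1)
```

If `is_one_hot` comes out false on a real mask, the file probably contains blended colours at region borders, and a palette lookup is the safer route.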
Everything I've found on the internet uses existing datasets in TensorFlow or Keras. It's not really clear how to pull this off when you have what could be considered a custom dataset.
I found this related to Caffe: https://groups.google.com/forum/#!topic/caffe-users/9qNggEa8EaQ
And they advocate converting the mask images to (H, W, 1) (HWC), where my classes would be 0, 1, 2 for background, hair and face respectively.
This may be a duplicate of these (a combination of similar questions / answers):
How to implement multi-class semantic segmentation?
Tensorflow: How to create a Pascal VOC style image
I found one example that handles PascalVOC in (N, N, 1), which I adapted:
LFW_PARTS_PALETTE = {
    (0, 0, 255): 0,  # background (blue)
    (255, 0, 0): 1,  # hair (red)
    (0, 255, 0): 2,  # face (green)
}

def convert_from_color_segmentation(arr_3d):
    # Output is a 2D map of class indices with the same height and width as the input
    arr_2d = np.zeros((arr_3d.shape[0], arr_3d.shape[1]), dtype=np.uint8)
    palette = LFW_PARTS_PALETTE
    for i in range(0, arr_3d.shape[0]):
        for j in range(0, arr_3d.shape[1]):
            key = (arr_3d[i, j, 0], arr_3d[i, j, 1], arr_3d[i, j, 2])
            arr_2d[i, j] = palette.get(key, 0)  # default value if key was not found is 0
    return arr_2d
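The per-pixel Python loop above can be slow on larger images. A vectorized variant might look like the sketch below. It assumes the face colour is green, (0, 255, 0), as in the colour list earlier in the question. The function name `convert_from_color_vectorized` and the `demo` array are made up for illustration.

```python
import numpy as np

# Palette from the question: RGB colour -> class index.
LFW_PARTS_PALETTE = {
    (0, 0, 255): 0,  # background (blue)
    (255, 0, 0): 1,  # hair (red)
    (0, 255, 0): 2,  # face (green), assumed from the colour list in the question
}

def convert_from_color_vectorized(arr_3d, palette=LFW_PARTS_PALETTE):
    """Map an (H, W, 3) RGB mask to an (H, W) array of class indices."""
    # Pixels whose colour is not in the palette keep the default class 0.
    arr_2d = np.zeros(arr_3d.shape[:2], dtype=np.uint8)
    for color, class_id in palette.items():
        # Boolean (H, W) map of pixels whose three channels all match this colour.
        matches = np.all(arr_3d[..., :3] == np.array(color), axis=-1)
        arr_2d[matches] = class_id
    return arr_2d

# Hypothetical 2x2 mask: hair, face, background, and one unknown colour.
demo = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [7, 7, 7]]],
    dtype=np.uint8,
)
demo_ids = convert_from_color_vectorized(demo)
```

The loop now runs once per class instead of once per pixel, so the cost is a handful of whole-array comparisons.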
I think this might be close to what I want, but not spot on. I guess I need it to be (N, N, 3) since I have 3 classes? The above version, and another one, originated from these two locations:
https://github.com/martinkersner/train-CRF-RNN/blob/master/utils.py#L50
https://github.com/DrSleep/tensorflow-deeplab-resnet/blob/ce75c97fc1337a676e32214ba74865e55adc362c/deeplab_resnet/utils.py#L41 (this link one-hot encodes the values)
Since this is semantic segmentation, you are classifying every pixel in the image, so you will most likely be using a cross-entropy loss. For categorical cross-entropy, Keras as well as TensorFlow expect your mask to be one-hot encoded, and the output of your model should have a shape like [batch, height, width, num_classes]. Before computing the cross-entropy, you will have to reshape that output the same way as your mask, which essentially means reshaping both your logits and your mask to the tensor shape [-1, num_classes], where -1 denotes "as many as required".
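To make the shapes concrete, here is a minimal NumPy sketch of the one-hot encoding and the reshape to [-1, num_classes]. The batch of class-index masks is random, made-up data, and the variable names are only illustrative.

```python
import numpy as np

num_classes = 3  # background, hair, face

# Hypothetical batch of 2 class-index masks, each 4x4, with values in {0, 1, 2}.
rng = np.random.default_rng(0)
class_masks = rng.integers(0, num_classes, size=(2, 4, 4))

# One-hot encode: indexing an identity matrix with the class ids
# appends a trailing class axis, giving [batch, height, width, num_classes].
one_hot = np.eye(num_classes, dtype=np.float32)[class_masks]

# Flatten so that each row is one pixel; -1 lets NumPy infer the row count.
flat = one_hot.reshape(-1, num_classes)
```

Here `one_hot` has shape (2, 4, 4, 3) and `flat` has shape (32, 3). Keras also provides `keras.utils.to_categorical`, which should give the same encoding for integer masks.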
Since your question is about loading custom images, I just finished building an input pipeline for segmentation myself. It is in TensorFlow, though, so I don't know if it helps you. Take a look if you are interested: Tensorflow input pipeline for segmentation
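If you would rather stay in plain Python, a batch generator along these lines may be enough for Keras. This is only a sketch and is not the pipeline linked above. It assumes the images and class-index masks are already loaded into NumPy arrays, and `batch_generator` and its parameters are hypothetical names.

```python
import numpy as np

def batch_generator(images, masks, batch_size, num_classes=3):
    """Yield (X, Y) batches forever.

    images: (N, H, W, 3) uint8 array; masks: (N, H, W) integer class ids.
    """
    n = len(images)
    eye = np.eye(num_classes, dtype=np.float32)
    while True:
        order = np.random.permutation(n)  # reshuffle at the start of every pass
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            x = images[idx].astype(np.float32) / 255.0  # scale pixels to 0-1
            y = eye[masks[idx]]                         # one-hot: (B, H, W, num_classes)
            yield x, y

# Hypothetical in-memory data: 5 blank 8x8 images with all-background masks.
images = np.zeros((5, 8, 8, 3), dtype=np.uint8)
masks = np.zeros((5, 8, 8), dtype=np.int64)
x_batch, y_batch = next(batch_generator(images, masks, batch_size=2))
```

The last batch of each pass can be smaller than `batch_size` when N is not a multiple of it.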