How can I convert an RGB image to a color-based one-hot encoded 3D array using numpy?
Simply put, what I am trying to do is similar to this question: Convert an RGB image to an index image, but instead of a single-channel index image, I want to get an n-channel image where img[h, w] is a one-hot encoded vector. For example, if the input image is [[[0, 0, 0], [255, 255, 255]]] and index 0 is assigned to black and 1 is assigned to white, then the desired output is [[[1, 0], [0, 1]]].
Like the asker of that question, I implemented this naively, but the code is quite slow, and I believe a proper numpy solution would be significantly faster.
Also, as suggested in the previous post, I could preprocess each image into grayscale and one-hot encode that, but I want a more general solution.
Example
Let's say I want to assign white to 0, red to 1, blue to 2, and yellow to 3:
(255, 255, 255): 0 (255, 0, 0): 1 (0, 0, 255): 2 (255, 255, 0): 3
and I have an image consisting of four colors, where the image is a 3D array containing the R, G, B values for each pixel:
[
[[255, 255, 255], [255, 255, 255], [255, 0, 0], [255, 0, 0]],
[[ 0, 0, 255], [255, 255, 255], [255, 0, 0], [255, 0, 0]],
[[ 0, 0, 255], [ 0, 0, 255], [255, 255, 255], [255, 255, 255]],
[[255, 255, 255], [255, 255, 255], [255, 255, 0], [255, 255, 0]]
]
and this is what I want to get, where every pixel is replaced by its one-hot encoded index. (Since changing a 2D array of index values into a 3D array of one-hot encoded values is easy, getting a 2D array of index values is fine too.)
[
[[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]],
[[0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]],
[[0, 0, 1, 0], [0, 0, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0]],
[[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]]
]
In this example, I've used colors whose RGB components are either 255 or 0, but I don't want the solution to rely on that fact.
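The parenthetical remark above, that a 2D array of index values is easily turned into a one-hot 3D array, can be sketched as follows. This is a minimal illustration only: `idx` and `num_classes` are hypothetical names, and indexing the rows of an identity matrix is one common way to do it, not the only one.

```python
import numpy as np

# Hypothetical 2D array of class indices (one index per pixel).
idx = np.array([[0, 0, 1, 1],
                [2, 0, 1, 1]])
num_classes = 4  # assumed number of colors

# Row i of the identity matrix is the one-hot vector for class i,
# so fancy-indexing with idx yields an array of shape (H, W, num_classes).
onehot = np.eye(num_classes, dtype=int)[idx]
```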
We could generate a decimal equivalent for each pixel color. With each channel having either 0 or 255 as its value, there are 8 possible colors in total, but it seems we are only interested in four of them.
Then we would have two ways to solve it:
- One could map these decimal equivalents to unique indices, starting from 0 up to the final color, all in sequence, and then initialize the output array and assign into it.
- Another way would be to use a broadcasted comparison of these decimal equivalents against the colors.
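As a small illustration of the "decimal equivalent" idea, the sketch below treats each channel as one bit (1 where it equals 255) and weights R, G, B by 4, 2, 1. The 1x4 image `img` is a made-up example holding the four colors from the question, and the approach only applies when every channel is exactly 0 or 255.

```python
import numpy as np

# Hypothetical 1x4 image: white, red, blue, yellow (the question's colors).
img = np.array([[[255, 255, 255], [255, 0, 0], [0, 0, 255], [255, 255, 0]]])

# Each channel becomes a bit; the dot product with [4, 2, 1] reads the
# three bits as a binary number, giving one integer code per pixel.
codes = (img == 255).dot([4, 2, 1])
print(codes)  # [[7 4 1 6]]
```

These codes (7, 4, 1, 6) are the values that appear in the `colors` array of the two functions.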
Following are the two methods -
import numpy as np

def indexing_based(a):
    b = (a == 255).dot([4,2,1]) # Decimal equivalents
    colors = np.array([7,4,1,6]) # Define colors decimal equivalents here
    # Lookup table: decimal equivalent -> class index.
    # Assumes every pixel is one of the listed colors.
    idx = np.empty(colors.max()+1,dtype=int)
    idx[colors] = np.arange(len(colors))
    m,n,r = a.shape
    out = np.zeros((m,n,len(colors)), dtype=int)
    out[np.arange(m)[:,None], np.arange(n), idx[b]] = 1
    return out

def broadcasting_based(a):
    b = (a == 255).dot([4,2,1]) # Decimal equivalents
    colors = np.array([7,4,1,6]) # Define colors decimal equivalents here
    return (b[...,None] == colors).astype(int)
Example run -
>>> a = np.array([
... [[255, 255, 255], [255, 255, 255], [255, 0, 0], [255, 0, 0]],
... [[ 0, 0, 255], [255, 255, 255], [255, 0, 0], [255, 0, 0]],
... [[ 0, 0, 255], [ 0, 0, 255], [255, 255, 255], [255, 255, 255]],
... [[255, 255, 255], [255, 255, 255], [255, 255, 0], [255, 255, 0]],
... [[255, 255, 255], [255, 0, 0], [255, 255, 0], [255, 0 , 0]]])
>>> indexing_based(a)
array([[[1, 0, 0, 0],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0]],

       [[0, 0, 1, 0],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0]],

       [[0, 0, 1, 0],
        [0, 0, 1, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]],

       [[1, 0, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 1]],

       [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 1, 0, 0]]])
>>> np.allclose(broadcasting_based(a), indexing_based(a))
True
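Both functions rely on every channel being exactly 0 or 255, which the question wanted to avoid. One possible generalization, sketched here under the assumption of 8-bit channel values, packs each pixel into a single integer (R, G, B as three bytes) and then applies the same broadcasted comparison. The names `pack_rgb`, `rgb_to_onehot_packed` and `palette` are hypothetical and not part of the answer above.

```python
import numpy as np

def pack_rgb(arr):
    """Pack the last axis (R, G, B) into one integer per pixel.

    Assumes each channel value lies in 0..255.
    """
    arr = np.asarray(arr, dtype=np.int64)
    return (arr[..., 0] << 16) | (arr[..., 1] << 8) | arr[..., 2]

def rgb_to_onehot_packed(img, palette):
    """One-hot encode img of shape (H, W, 3) against palette of shape (K, 3)."""
    codes = pack_rgb(img)              # shape (H, W)
    palette_codes = pack_rgb(palette)  # shape (K,)
    # Broadcasted comparison: (H, W, 1) against (K,) gives (H, W, K).
    return (codes[..., None] == palette_codes).astype(int)

# Palette order defines the class index: white=0, red=1, blue=2, yellow=3.
palette = [(255, 255, 255), (255, 0, 0), (0, 0, 255), (255, 255, 0)]
img = np.array([[[255, 255, 255], [255, 0, 0]],
                [[0, 0, 255], [255, 255, 0]]])
onehot = rgb_to_onehot_packed(img, palette)
```

With this sketch, a pixel whose color is not in the palette ends up with an all-zero vector rather than raising an error.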
My solution looks like this and should work for any colors:
import numpy as np

color_dict = {0: (0, 255, 255),
              1: (255, 255, 0),
              ....}

def rgb_to_onehot(rgb_arr, color_dict):
    # Keys of color_dict are assumed to be the class indices 0..num_classes-1.
    num_classes = len(color_dict)
    shape = rgb_arr.shape[:2] + (num_classes,)
    arr = np.zeros(shape, dtype=np.int8)
    for i in range(num_classes):
        # A pixel belongs to class i when all three channels match its color.
        arr[:, :, i] = np.all(rgb_arr.reshape((-1, 3)) == color_dict[i], axis=1).reshape(shape[:2])
    return arr

def onehot_to_rgb(onehot, color_dict):
    # Recover the class index per pixel, then paint each class with its color.
    single_layer = np.argmax(onehot, axis=-1)
    output = np.zeros(onehot.shape[:2] + (3,))
    for k in color_dict.keys():
        output[single_layer == k] = color_dict[k]
    return np.uint8(output)
I haven't tested it for speed yet, but at least it works :)
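For comparison, here is a loop-free sketch of the same idea that uses broadcasting instead of iterating over classes. It is an untimed variant, not a measured improvement, and it assumes the dictionary keys are exactly 0..K-1. The function names and the two-color `color_dict` (the two entries visible above) are illustrative only.

```python
import numpy as np

def rgb_to_onehot_vectorized(rgb_arr, color_dict):
    # Stack the colors in key order; assumes keys are 0..K-1.
    colors = np.array([color_dict[k] for k in range(len(color_dict))])  # (K, 3)
    # (H, W, 1, 3) == (K, 3) broadcasts to (H, W, K, 3); a class matches
    # when all three channels agree.
    matches = (rgb_arr[:, :, None, :] == colors).all(axis=-1)           # (H, W, K)
    return matches.astype(np.int8)

def onehot_to_rgb_vectorized(onehot, color_dict):
    colors = np.array([color_dict[k] for k in range(len(color_dict))], dtype=np.uint8)
    # Use the per-pixel class index to look up its color.
    return colors[np.argmax(onehot, axis=-1)]

# Illustrative two-color mapping and a 1x2 image.
color_dict = {0: (0, 255, 255), 1: (255, 255, 0)}
img = np.array([[[0, 255, 255], [255, 255, 0]]], dtype=np.uint8)

onehot = rgb_to_onehot_vectorized(img, color_dict)
restored = onehot_to_rgb_vectorized(onehot, color_dict)
```

The trade-off is memory: the intermediate comparison holds H x W x K x 3 booleans, so for large images with many classes the per-class loop may be the safer choice.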