Finding RGB bounding boxes in an image

I am working with a page segmentation algorithm. Its output is an image in which the pixels of each zone are assigned a unique color. I would like to process that image to find the bounding boxes of the zones: find all the colors, then all the pixels of each color, and then their bounding box.

Below is a sample image.

Example output image showing colored zones

Currently I start with the histograms of the R, G, B channels. The histograms tell me where the data is.

import numpy as np
from PIL import Image

img = Image.open(imgfilename)
img.load()
r, g, b = img.split()

ra, ga, ba = [np.asarray(p, dtype="uint8") for p in (r, g, b)]

# Nonzero histogram bins mark the channel values that occur in the image.
rhist, edges = np.histogram(ra, bins=256)
ghist, edges = np.histogram(ga, bins=256)
bhist, edges = np.histogram(ba, bins=256)
print(np.nonzero(rhist))
print(np.nonzero(ghist))
print(np.nonzero(bhist))

      

Output:

(array([0, 1, 128, 205, 255]),)
(array([0, 20, 128, 186, 255]),)
(array([0, 128, 147, 150, 255]),)

At this point I am a little stuck. On visual inspection, I have the colors (0,0,0), (1,0,0), (0,20,0), (128,128,128), etc. How do I turn the nonzero outputs into pixel values for np.where()?

I am considering flattening the (3, rows, columns) arrays into a 2D plane of 24-bit packed RGB values (r << 16 | g << 8 | b) and searching that array. It seems brute-force and inelegant. Is there a better way in NumPy to find the bounding limits of a color value?



3 answers


There is no reason to treat this as an RGB color image; it is just a rendering of a segmentation that someone else produced. You can just as well treat it as a grayscale image, and for these specific colors you don't need to do anything else.

import sys
import numpy
from PIL import Image

img = Image.open(sys.argv[1]).convert('L')

im = numpy.array(img) 
colors = set(numpy.unique(im))
colors.remove(255)

for color in colors:
    py, px = numpy.where(im == color)
    print(px.min(), py.min(), px.max(), py.max())
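
Whether the grayscale shortcut is safe for a given palette can be checked before relying on it. The sketch below approximates Pillow's documented "L" conversion (L = R * 299/1000 + G * 587/1000 + B * 114/1000) with plain rounding, which is an assumption about the exact rounding Pillow applies. The helper name `luma` is made up for illustration, and the color list is the set of zone colors that occur in the sample image.

```python
def luma(rgb):
    """Approximate gray level of an (r, g, b) triple (ITU-R 601-2 weights)."""
    r, g, b = rgb
    return int(round((r * 299 + g * 587 + b * 114) / 1000))

# Zone colors that occur in the sample image.
colors = [(0, 0, 255), (0, 128, 0), (0, 128, 128), (0, 255, 0),
          (0, 255, 255), (1, 0, 0), (128, 0, 128), (205, 186, 150),
          (255, 0, 0), (255, 20, 147), (255, 255, 0), (255, 255, 255)]

# Group the colors by the gray level they would be converted to.
grays = {}
for c in colors:
    grays.setdefault(luma(c), []).append(c)

# Any gray level shared by more than one color would merge two zones.
collisions = {g: cs for g, cs in grays.items() if len(cs) > 1}
print(collisions or "no two colors share a gray level")
```

Near-black colors are the typical failure case: under this approximation (1, 0, 0) and (0, 0, 0) both map to 0, so a palette containing both would need a different approach.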

      

If you cannot rely on convert('L') providing unique colors (i.e. you are using other colors beyond the ones in this image), you can pack the image and get the unique colors:



...
im = numpy.array(img, dtype=int)

packed = im[:,:,0]<<16 | im[:,:,1]<<8 | im[:,:,2]
colors = set(numpy.unique(packed.ravel()))
colors.remove(255<<16 | 255<<8 | 255)

for color in colors:
    py, px = numpy.where(packed == color)
    print(px.min(), py.min(), px.max(), py.max())
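
The packed integers are not very readable when reporting which zone a box belongs to. A small sketch of the inverse operation, with hypothetical helper names `pack_rgb` and `unpack_rgb`, recovers the channel values with shifts and masks:

```python
def pack_rgb(r, g, b):
    """Pack three 8-bit channel values into one 24-bit integer."""
    return (r << 16) | (g << 8) | b

def unpack_rgb(packed):
    """Split a 24-bit packed value back into an (r, g, b) tuple."""
    return (packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF

color = pack_rgb(205, 186, 150)
print(hex(color), unpack_rgb(color))
```

Calling `unpack_rgb(color)` inside the loop would let each bounding box be printed together with its original color.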

      

I would also recommend removing the small connected components before finding the bounding boxes.
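
One way to do that filtering is sketched below, assuming 4-connectivity and a hypothetical `min_size` threshold, neither of which is specified above. It takes a 2D list of integer labels (for a NumPy array such as `packed`, `packed.tolist()` would produce one), uses a plain breadth-first flood fill, and repaints components smaller than `min_size` with the background value. A library routine such as `scipy.ndimage.label` would likely be faster on full-size pages.

```python
from collections import deque

def remove_small_components(labels, min_size, background):
    """Return a copy of `labels` (2D list of ints) in which every 4-connected
    component of equal value with fewer than `min_size` pixels is replaced
    by `background`."""
    height, width = len(labels), len(labels[0])
    out = [list(row) for row in labels]
    seen = [[False] * width for _ in range(height)]

    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or labels[y0][x0] == background:
                continue
            # Flood-fill the component that contains (y0, x0).
            value = labels[y0][x0]
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and labels[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Components below the threshold are treated as noise.
            if len(component) < min_size:
                for y, x in component:
                    out[y][x] = background
    return out

# Tiny example: 9 is the background, the lone 5 is noise.
page = [
    [9, 9, 9, 9, 9],
    [9, 1, 1, 9, 5],
    [9, 1, 1, 9, 9],
]
cleaned = remove_small_components(page, min_size=2, background=9)
print(cleaned)
```

Without this step, a single stray pixel of a zone's color far from the zone would stretch that zone's bounding box across the page.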



EDIT: Putting everything together into a working program, using the image you posted:

from __future__ import division
import numpy as np
import itertools
from PIL import Image

img = np.array(Image.open('test_img.png'))

def bounding_boxes(img) :
    r, g, b = [np.unique(img[..., j]) for j in (0, 1, 2)]
    bounding_boxes = {}
    for r0, g0, b0 in itertools.product(r, g, b) :
        rows, cols = np.where((img[..., 0] == r0) &
                              (img[..., 1] == g0) &
                              (img[..., 2] == b0))
        if len(rows) :
            bounding_boxes[(r0, g0, b0)] = (np.min(rows), np.max(rows),
                                            np.min(cols), np.max(cols))
    return bounding_boxes

In [2]: %timeit bounding_boxes(img)
1 loops, best of 3: 30.3 s per loop

In [3]: bounding_boxes(img)
Out[3]: 
{(0, 0, 255): (3011, 3176, 755, 2546),
 (0, 128, 0): (10, 2612, 0, 561),
 (0, 128, 128): (1929, 1972, 985, 1438),
 (0, 255, 0): (10, 166, 562, 868),
 (0, 255, 255): (2938, 2938, 680, 682),
 (1, 0, 0): (10, 357, 987, 2591),
 (128, 0, 128): (417, 1873, 984, 2496),
 (205, 186, 150): (11, 56, 869, 1752),
 (255, 0, 0): (3214, 3223, 570, 583),
 (255, 20, 147): (2020, 2615, 956, 2371),
 (255, 255, 0): (3007, 3013, 600, 752),
 (255, 255, 255): (0, 3299, 0, 2591)}

      

Not very fast, even though only a small number of colors are actually tested ...


You can find the bounding box for the color (r0, g0, b0) with something along the lines of:



rows, cols = np.where((ra == r0) & (ga == g0) & (ba == b0))
top, bottom = np.min(rows), np.max(rows)
left, right = np.min(cols), np.max(cols)

      

Instead of iterating over all 2**24 RGB color combinations, you can drastically reduce your search space by using only the Cartesian product of the nonzero histogram entries:

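# np.nonzero returns a tuple of arrays; [0] selects the array of indices.
for r0, g0, b0 in itertools.product(np.nonzero(rhist)[0],
                                    np.nonzero(ghist)[0],
                                    np.nonzero(bhist)[0]):
    pass  # compute the box for (r0, g0, b0) with np.where as above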

      

You will have some nonexistent combinations, which you can filter out by checking that rows and cols are not empty. But in your example, you would reduce the search space from 2**24 combinations to 125.
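
The empty combinations can also be avoided entirely by asking NumPy for the distinct pixel colors directly. The sketch below is an alternative to the histogram product, not the method described above. It assumes an (H, W, 3) uint8 array with no alpha channel and a NumPy version whose `np.unique` accepts `axis` (1.13 or later). The function name `boxes_by_color` and the tiny test image are made up for illustration.

```python
import numpy as np

def boxes_by_color(img):
    """Map each (r, g, b) present in `img` to (top, bottom, left, right)."""
    pixels = img.reshape(-1, 3)
    # Each row of `colors` is one distinct RGB triple that occurs in the image.
    colors = np.unique(pixels, axis=0)
    boxes = {}
    for r0, g0, b0 in colors:
        rows, cols = np.where((img[..., 0] == r0) &
                              (img[..., 1] == g0) &
                              (img[..., 2] == b0))
        boxes[(int(r0), int(g0), int(b0))] = (int(rows.min()), int(rows.max()),
                                              int(cols.min()), int(cols.max()))
    return boxes

# 4x5 white image with one red zone.
img = np.full((4, 5, 3), 255, dtype=np.uint8)
img[1:3, 2:4] = (255, 0, 0)
boxes = boxes_by_color(img)
print(boxes)
```

The loop body is the same `np.where` test as before, but it now runs once per color that actually exists rather than once per candidate combination.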



This is just a solution off the top of my head. You can iterate over the pixels of the image, from the top left to the bottom right, and keep track of the top, bottom, left and right values for each color. For a given color, top is the first row in which you see that color and bottom is the last such row; left is the minimum column value for pixels of that color, and right is the maximum column value you find.

Then, for each color, you can draw a rectangle from top-left to bottom-right in the color you want.

I don't know if it qualifies as a good bounding box algorithm, but I think it's okay.
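
A rough sketch of that scan, assuming the image is available as rows of hashable pixel values (for example (r, g, b) tuples); the function name `scan_boxes` is hypothetical. In pure Python this is slow on large pages, but it visits each pixel only once regardless of how many colors there are.

```python
def scan_boxes(rows):
    """Single pass over the pixels; returns {color: [top, bottom, left, right]}."""
    boxes = {}
    for y, row in enumerate(rows):
        for x, color in enumerate(row):
            box = boxes.get(color)
            if box is None:
                # First sighting: rows are visited top to bottom, so y is the top.
                boxes[color] = [y, y, x, x]
            else:
                box[1] = y                 # last row seen so far
                box[2] = min(box[2], x)    # leftmost column
                box[3] = max(box[3], x)    # rightmost column
    return boxes

W, R = (255, 255, 255), (255, 0, 0)
image = [
    [W, W, W, W],
    [W, R, R, W],
    [W, W, R, W],
]
result = scan_boxes(image)
print(result)
```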
