Find the edges (border of a rectangle) inside an image

I have an image of a sticky note on some background (say a wall, or a laptop) and I want to detect the edges of the sticky note (rough detection also works fine) so that I can crop to it.

I'm planning on using ImageMagick for the actual crop, but I'm stuck on the edge detection.

Ideally, the output should give me the 4 coordinates of the 4 corner points, so I can run a crop on it.

How can I proceed with this?

stickynote

2 answers


You can do it with ImageMagick.

There are different ways to approach this in ImageMagick. Here is the first algorithm that came to my mind. It assumes that the sticky note is not tilted or rotated within the larger image:

  • Step one: use Canny edge detection to reveal the edges of the note.
  • Step two: determine the coordinates of those edges.

Canny Edge Detection

This command will create a black and white image depicting all edges in the original image:

convert                              \
  http://i.stack.imgur.com/SxrwG.png \
 -canny 0x1+10%+30%                  \
  canny-edges.png


canny-edges.png

Determine the coordinates of the edges

Assume the size of the image is XxY pixels. You can then resize the image to a single 1xY column and a single Xx1 row of pixels, where each pixel's color value is the average of all the pixels that were in the same row or the same column of the original image.

As an example of what this looks like, I will first resize canny-edges.png to 4xY and Xx4 images:

identify -format "%Wx%H\n" canny-edges.png
400x300

convert canny-edges.png -resize 400x4\!   canny-4cols.png
convert canny-edges.png -resize   4x300\! canny-4rows.png

canny-4cols.png

canny-4rows.png

Now that the previous images illustrate what resizing the image to a few columns or rows of pixels achieves, let's do it with a single column and a single row. At the same time, we'll switch the output format to text instead of PNG, so that we get the coordinates of the white pixels:

convert canny-edges.png -resize 400x1\!   canny-1col.txt
convert canny-edges.png -resize   1x300\! canny-1row.txt


Here's part of the output from canny-1col.txt:

# ImageMagick pixel enumeration: 400,1,255,gray
0,0: (0,0,0)  #000000  gray(0)
1,0: (0,0,0)  #000000  gray(0)
2,0: (0,0,0)  #000000  gray(0)
[....]
73,0: (0,0,0)  #000000  gray(0)
74,0: (0,0,0)  #000000  gray(0)
75,0: (10,10,10)  #0A0A0A  gray(10)
76,0: (159,159,159)  #9F9F9F  gray(159)
77,0: (21,21,21)  #151515  gray(21)
78,0: (156,156,156)  #9C9C9C  gray(156)
79,0: (14,14,14)  #0E0E0E  gray(14)
80,0: (3,3,3)  #030303  gray(3)
81,0: (3,3,3)  #030303  gray(3)
[....]
162,0: (3,3,3)  #030303  gray(3)
163,0: (4,4,4)  #040404  gray(4)
164,0: (10,10,10)  #0A0A0A  gray(10)
165,0: (7,7,7)  #070707  gray(7)
166,0: (8,8,8)  #080808  gray(8)
167,0: (8,8,8)  #080808  gray(8)
168,0: (8,8,8)  #080808  gray(8)
169,0: (9,9,9)  #090909  gray(9)
170,0: (7,7,7)  #070707  gray(7)
171,0: (10,10,10)  #0A0A0A  gray(10)
172,0: (5,5,5)  #050505  gray(5)
173,0: (13,13,13)  #0D0D0D  gray(13)
174,0: (6,6,6)  #060606  gray(6)
175,0: (10,10,10)  #0A0A0A  gray(10)
176,0: (10,10,10)  #0A0A0A  gray(10)
177,0: (7,7,7)  #070707  gray(7)
178,0: (8,8,8)  #080808  gray(8)
[....]
319,0: (3,3,3)  #030303  gray(3)
320,0: (3,3,3)  #030303  gray(3)
321,0: (14,14,14)  #0E0E0E  gray(14)
322,0: (156,156,156)  #9C9C9C  gray(156)
323,0: (21,21,21)  #151515  gray(21)
324,0: (159,159,159)  #9F9F9F  gray(159)
325,0: (10,10,10)  #0A0A0A  gray(10)
326,0: (0,0,0)  #000000  gray(0)
327,0: (0,0,0)  #000000  gray(0)
[....]
397,0: (0,0,0)  #000000  gray(0)
398,0: (0,0,0)  #000000  gray(0)
399,0: (0,0,0)  #000000  gray(0)


As you can see, the text on the note also left its traces in the grayscale values of some pixels. Therefore we can add -threshold 50% to our commands to get clean black-and-white output:

convert canny-edges.png -resize 400x1\!   -threshold 50% canny-1col.txt
convert canny-edges.png -resize   1x300\! -threshold 50% canny-1row.txt


I will not quote the contents of the new text files here; you can try it and see for yourself if you are interested. Instead, I'll take a shortcut: I'll print the textual representation of the pixel color values to <stdout> and grep it directly for all non-black pixels:

convert canny-edges.png -resize 400x1\!   -threshold 50% txt:- \
| grep -v black

  # ImageMagick pixel enumeration: 400,1,255,srgb
  76,0: (255,255,255)  #FFFFFF  white
  78,0: (255,255,255)  #FFFFFF  white
  322,0: (255,255,255)  #FFFFFF  white
  324,0: (255,255,255)  #FFFFFF  white

convert canny-edges.png -resize   1x300\! -threshold 50% txt:- \
| grep -v black

  # ImageMagick pixel enumeration: 1,300,255,srgb
  0,39: (255,255,255)  #FFFFFF  white
  0,41: (255,255,255)  #FFFFFF  white
  0,229: (255,255,255)  #FFFFFF  white
  0,231: (255,255,255)  #FFFFFF  white


From the above results, it can be inferred that the sticky note occupies these pixel coordinates inside the larger image:

  • top left corner: (77|40)
  • bottom right corner: (323|230)

The area is 246 pixels wide and 190 pixels high.

(ImageMagick puts the origin of its coordinate system at the top left corner of the image.)

Now crop the sticky note from the original image, which you can do like this:

convert http://i.stack.imgur.com/SxrwG.png[246x190+77+40] sticky-note.png


sticky-note.png
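
If you would rather script the coordinate extraction than read the numbers off by hand, the same column/row averaging idea fits in a few lines of Python. The following is only a minimal sketch (it assumes Pillow and numpy are installed; canny-edges.png is the edge image generated above):

import numpy as np
from PIL import Image

# Load the Canny edge image produced above as grayscale.
edges = np.array(Image.open('canny-edges.png').convert('L'))

# Average every column and every row; this mirrors the 400x1 / 1x300
# resizes above, and comparing against 127 mirrors -threshold 50%.
col_is_edge = edges.mean(axis=0) > 127
row_is_edge = edges.mean(axis=1) > 127

xs = np.flatnonzero(col_is_edge)   # x positions of the vertical edges
ys = np.flatnonzero(row_is_edge)   # y positions of the horizontal edges

# Print an ImageMagick crop geometry, WxH+X+Y.
print(f'{xs.max() - xs.min()}x{ys.max() - ys.min()}+{xs.min()}+{ys.min()}')

The printed geometry can then be passed straight to the -crop operator.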

Additional options to explore

autotrace

You could streamline the above procedure (and even turn it into an automated script, if you like) by converting the intermediate canny-edges.png into an SVG vector graphic, for example by running it through autotrace...

This can be useful if your sticky note is tilted or rotated.

Hough line detection

Once you have the "canny" edges, you can also apply the Hough line detection algorithm to them:

convert              \
  canny-edges.png    \
 -background black   \
 -stroke red         \
 -hough-lines 5x5+20 \
  lines.png


lines.png

Note that the -hough-lines operator extends the detected lines and draws them from one edge of the original image to the other, with floating-point endpoints.

While the previous command ultimately converts the lines to a PNG, the -hough-lines operator actually generates an MVG (Magick Vector Graphics) file internally. This means that you can read the source code of the MVG file and determine the mathematical parameters of each line shown in the "red lines" image:

convert              \
  canny-edges.png    \
 -hough-lines 5x5+20 \
  lines.mvg


This is a more general approach, and it also works for edges that are not strictly horizontal and/or vertical.

But your example image has horizontal and vertical edges, so you can even use simple shell commands to detect them.

In total, the generated MVG file contains 80 line descriptions. You can identify all horizontal lines in this file:

cat lines.mvg                              \
 | while read a b c d e ; do               \
     if [ x${b/0,/} == x${c/400,/} ]; then \
       echo "$a    $b    $c   $d    $e" ;  \
     fi;                                   \
   done

    line     0,39.5    400,39.5    # 249
    line     0,62.5    400,62.5    # 48
    line     0,71.5    400,71.5    # 52
    line     0,231.5   400,231.5   # 249


Now select all the vertical lines:

cat lines.mvg                              \
 | while read a b c d e; do                \
     if [ x${b/,0/} == x${c/,300/} ]; then  \
        echo "$a    $b    $c   $d    $e" ; \
     fi;                                   \
   done

   line     76.5,0   76.5,300     # 193
   line    324.5,0  324.5,300     # 193
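
If you want to turn those detected border lines back into a crop geometry programmatically, you can parse lines.mvg directly. Here is a minimal Python sketch of that idea (it assumes the axis-aligned MVG output shown above, and that the outermost horizontal and vertical lines bound the note):

import re

# Collect the positions of the axis-aligned lines from the MVG file.
horizontal, vertical = [], []
with open('lines.mvg') as f:
    for m in re.finditer(r'line (\S+),(\S+) (\S+),(\S+)', f.read()):
        x1, y1, x2, y2 = map(float, m.groups())
        if y1 == y2:
            horizontal.append(y1)   # y position of a horizontal line
        elif x1 == x2:
            vertical.append(x1)     # x position of a vertical line

# The outermost lines bound the sticky note; print WxH+X+Y for -crop.
left, right = min(vertical), max(vertical)
top, bottom = min(horizontal), max(horizontal)
print(f'{right - left:.0f}x{bottom - top:.0f}+{left:.0f}+{top:.0f}')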




Last week I faced a similar problem of detecting image borders and spent many hours trying different approaches and tools; in the end I solved it with an entropy difference approach, so, JFYI, here is the algorithm.

Let's say you want to determine whether a 200x100px image has a top border:

  1. Take the top slice of the image, 25% of its height (25px): rows 0:25, columns 0:200.
  2. Take the adjacent slice of the same height, starting where the first slice ends and extending towards the center of the image: rows 25:50, columns 0:200.
  3. Calculate the entropy of both slices.
  4. Calculate the difference between the two entropies and store it together with the current slice height.
  5. Make the slices 1px smaller (24px) and repeat from step 1 with this new height, sliding the scan area towards the edge of the image, until the height reaches 0.
  6. Find the maximum stored entropy difference and its slice height: this is the center of our border, provided it lies closer to the edge than to the center of the image and the maximum entropy difference is above a given threshold (0.5, for example).

(image: the upper and lower slices)

Then apply this algorithm to each side of your image.

Here is a code snippet that determines whether an image has a top border and finds its approximate coordinate (the top offset). Pass a grayscale ("L" mode) Pillow image to the scan function:

import numpy as np

MEDIAN = 0.5


def entropy(signal):
    # Shannon entropy of a flattened pixel array. This helper is assumed
    # here, since the original snippet used entropy() without defining it.
    _, counts = np.unique(signal, return_counts=True)
    probs = counts / signal.size
    return -np.sum(probs * np.log2(probs))


def scan(im):
    w, h = im.size
    array = np.array(im)

    center_ = None
    diff_ = None
    # Slide a pair of equal-height slices from 25% of the image height
    # towards the top edge.
    for center in reversed(range(1, h // 4 + 1)):
        upper = entropy(array[0:center, 0:w].flatten())
        lower = entropy(array[center:2 * center, 0:w].flatten())
        # Entropy ratio: a low value means the upper slice is much
        # "emptier" than the slice below it, i.e. it looks like a border.
        diff = upper / lower if lower != 0.0 else MEDIAN
        if diff_ is None or diff < diff_:
            center_ = center
            diff_ = diff
    # (has_top_border, border_center_offset, best_ratio)
    return diff_ < MEDIAN and center_ < h // 4, center_, diff_
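
A usage sketch (photo.png is a placeholder file name, and the rotation trick for the remaining sides is an assumption of mine, not part of the original snippet):

from PIL import Image

im = Image.open('photo.png').convert('L')  # scan() expects an 'L'-mode image
has_top, offset, ratio = scan(im)
print(has_top, offset, ratio)

# The other sides can be checked by rotating the image first, e.g. after
# im.rotate(90, expand=True) the original right border ends up on top.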


The full source code, with examples for both bordered and borderless images, is here: https://github.com/embali/enimda/
