Find edges (border of rectangle) inside image
I have an image of a sticky note on some background (say a wall or a laptop), and I want to detect the edges of the sticky note (a rough detection also works fine) so that I can crop to it.
I'm planning on using ImageMagick to actually crop, but I'm stuck with edge detection.
Ideally, my output should give me 4 coordinates for 4 border points, so I can run a crop on it.
How can I proceed with this?
You can do it with ImageMagick.
There are different approaches you can take with ImageMagick. Here is the first algorithm that came to my mind. It assumes the sticky note is not tilted or rotated within the larger image:
- Step one: use Canny edge detection to reveal the edges of the note.
- Step two: determine the coordinates of the edges.
Canny Edge Detection
This command will create a black and white image depicting all edges in the original image:
convert \
http://i.stack.imgur.com/SxrwG.png \
-canny 0x1+10%+30% \
canny-edges.png
Determine the coordinates of the edges
Assuming the image size is XxY pixels, you can resize the image down to a single column of 1xY pixels and to a single row of Xx1 pixels, where each resulting pixel's value is the average of all the pixels that were in the same column or row of the original, respectively.
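In numpy terms, the averaging this resize performs is just a projection along one axis. Here is a minimal sketch on a synthetic edge image (the rectangle coordinates are invented for illustration):

```python
import numpy as np

# synthetic "canny" output: black canvas with a white rectangle outline
edges = np.zeros((300, 400), dtype=np.uint8)
edges[40, 77:324] = 255    # top edge
edges[230, 77:324] = 255   # bottom edge
edges[40:231, 77] = 255    # left edge
edges[40:231, 323] = 255   # right edge

# averaging every column collapses the image to one row (compare Xx1);
# averaging every row collapses it to one column (compare 1xY)
col_profile = edges.mean(axis=0)
row_profile = edges.mean(axis=1)

print(col_profile.argmax())  # → 77, x position of a vertical edge
print(row_profile.argmax())  # → 40, y position of a horizontal edge
```

The peaks of the two profiles land exactly on the rectangle's edges, which is what the 1-pixel resizes below exploit.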
As an illustration of what this looks like, I will first resize canny-edges.png to Xx4 and 4xY images:
identify -format "%Wx%H\n" canny-edges.png
400x300
convert canny-edges.png -resize 400x4\! canny-4cols.png
convert canny-edges.png -resize 4x300\! canny-4rows.png
canny-4cols.png
canny-4rows.png
Now that the previous images illustrate what resizing the image down to a few columns or rows of pixels achieves, let's do it with a single column and a single row. At the same time, we'll switch the output format to text rather than PNG, to get the coordinates of the white pixels:
convert canny-edges.png -resize 400x1\! canny-1col.txt
convert canny-edges.png -resize 1x300\! canny-1row.txt
Here's part of the output from canny-1col.txt:
# ImageMagick pixel enumeration: 400,1,255,gray
0,0: (0,0,0) #000000 gray(0)
1,0: (0,0,0) #000000 gray(0)
2,0: (0,0,0) #000000 gray(0)
[....]
73,0: (0,0,0) #000000 gray(0)
74,0: (0,0,0) #000000 gray(0)
75,0: (10,10,10) #0A0A0A gray(10)
76,0: (159,159,159) #9F9F9F gray(159)
77,0: (21,21,21) #151515 gray(21)
78,0: (156,156,156) #9C9C9C gray(156)
79,0: (14,14,14) #0E0E0E gray(14)
80,0: (3,3,3) #030303 gray(3)
81,0: (3,3,3) #030303 gray(3)
[....]
162,0: (3,3,3) #030303 gray(3)
163,0: (4,4,4) #040404 gray(4)
164,0: (10,10,10) #0A0A0A gray(10)
165,0: (7,7,7) #070707 gray(7)
166,0: (8,8,8) #080808 gray(8)
167,0: (8,8,8) #080808 gray(8)
168,0: (8,8,8) #080808 gray(8)
169,0: (9,9,9) #090909 gray(9)
170,0: (7,7,7) #070707 gray(7)
171,0: (10,10,10) #0A0A0A gray(10)
172,0: (5,5,5) #050505 gray(5)
173,0: (13,13,13) #0D0D0D gray(13)
174,0: (6,6,6) #060606 gray(6)
175,0: (10,10,10) #0A0A0A gray(10)
176,0: (10,10,10) #0A0A0A gray(10)
177,0: (7,7,7) #070707 gray(7)
178,0: (8,8,8) #080808 gray(8)
[....]
319,0: (3,3,3) #030303 gray(3)
320,0: (3,3,3) #030303 gray(3)
321,0: (14,14,14) #0E0E0E gray(14)
322,0: (156,156,156) #9C9C9C gray(156)
323,0: (21,21,21) #151515 gray(21)
324,0: (159,159,159) #9F9F9F gray(159)
325,0: (10,10,10) #0A0A0A gray(10)
326,0: (0,0,0) #000000 gray(0)
327,0: (0,0,0) #000000 gray(0)
[....]
397,0: (0,0,0) #000000 gray(0)
398,0: (0,0,0) #000000 gray(0)
399,0: (0,0,0) #000000 gray(0)
As you can see, the detected edges of the text written on the note also left their traces in the grayscale values. Therefore we can add -threshold 50% to our commands to get clean black-and-white output:
convert canny-edges.png -resize 400x1\! -threshold 50% canny-1col.txt
convert canny-edges.png -resize 1x300\! -threshold 50% canny-1row.txt
I will not give the contents of the new text files here; you can try it and see for yourself if you are interested. Instead, I'll take a shortcut: I'll write the textual representation of the pixel color values to <stdout> and grep it directly for all non-black pixels:
convert canny-edges.png -resize 400x1\! -threshold 50% txt:- \
| grep -v black
# ImageMagick pixel enumeration: 400,1,255,srgb
76,0: (255,255,255) #FFFFFF white
78,0: (255,255,255) #FFFFFF white
322,0: (255,255,255) #FFFFFF white
324,0: (255,255,255) #FFFFFF white
convert canny-edges.png -resize 1x300\! -threshold 50% txt:- \
| grep -v black
# ImageMagick pixel enumeration: 1,300,255,srgb
0,39: (255,255,255) #FFFFFF white
0,41: (255,255,255) #FFFFFF white
0,229: (255,255,255) #FFFFFF white
0,231: (255,255,255) #FFFFFF white
From the above results we can infer the four corner coordinates of the sticky note inside the larger image:
- top left corner:
(77|40)
- bottom right corner:
(323|230)
The area is 246 pixels wide and 190 pixels high.
(ImageMagick puts the coordinate origin at the top left corner of the image.)
Now crop the note from the original image:
convert http://i.stack.imgur.com/SxrwG.png[246x190+77+40] sticky-note.png
Options for further refinement
autotrace
You can refine the above procedure even more (and even turn it into an automated script, if you like), for example by converting the intermediate canny-edges.png into an SVG vector graphic, e.g. by running autotrace on it
...
This can be useful if your sticky note is tilted or rotated.
Hough line detection
Once you have the "canny" lines, you can also apply the Hough Line Detection algorithm:
convert \
canny-edges.png \
-background black \
-stroke red \
-hough-lines 5x5+20 \
lines.png
Note that the -hough-lines operator extends the detected lines and draws them, with floating-point coordinates, from one edge of the original image to the other.
While the previous command finally renders the lines into a PNG, the -hough-lines operator actually generates an MVG (Magick Vector Graphics) file internally. This means you can read the source code of the MVG file and determine the mathematical parameters of each line shown in the "red lines" image:
convert \
canny-edges.png \
-hough-lines 5x5+20 \
lines.mvg
This approach is more complex, and it also works for edges that are not strictly horizontal and/or vertical.
But your example image uses horizontal and vertical edges, so you can even use simple shell commands to detect them.
In total, the generated MVG file contains 80 line descriptions. You can identify all horizontal lines in this file:
cat lines.mvg \
| while read a b c d e ; do \
if [ x${b/0,/} == x${c/400,/} ]; then \
echo "$a $b $c $d $e" ; \
fi; \
done
line 0,39.5 400,39.5 # 249
line 0,62.5 400,62.5 # 48
line 0,71.5 400,71.5 # 52
line 0,231.5 400,231.5 # 249
Now select all the vertical lines:
cat lines.mvg \
| while read a b c d e; do \
if [ x${b/,0/} == x${c/,300/} ]; then \
echo "$a $b $c $d $e" ; \
fi; \
done
line 76.5,0 76.5,300 # 193
line 324.5,0 324.5,300 # 193
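The same selection can be done in a script. Here is a hedged Python sketch that parses the MVG line statements, keeps only strong axis-aligned lines (the count threshold of 100 is an arbitrary choice of mine), and intersects them into the note's corners:

```python
def note_corners(mvg_text, min_count=100):
    """Return ((left, top), (right, bottom)) from -hough-lines MVG output.
    Assumes axis-aligned edges, as in the example above."""
    horiz, vert = [], []
    for ln in mvg_text.splitlines():
        parts = ln.split()
        # expected shape: line x1,y1 x2,y2 # count
        if len(parts) < 5 or parts[0] != 'line':
            continue
        if int(parts[-1]) < min_count:
            continue  # skip weak lines (e.g. edges of the text on the note)
        x1, y1 = (float(v) for v in parts[1].split(','))
        x2, y2 = (float(v) for v in parts[2].split(','))
        if y1 == y2:
            horiz.append(y1)
        elif x1 == x2:
            vert.append(x1)
    return (min(vert), min(horiz)), (max(vert), max(horiz))
```

Fed with the six lines listed above, this discards the two weak horizontals (counts 48 and 52) and returns the corners (76.5, 39.5) and (324.5, 231.5), matching the earlier result up to the half-pixel line centers.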
Last week I faced a similar problem of detecting image borders (margins), and after spending many hours trying different approaches and tools I finally solved it with an entropy-difference approach, so here is the algorithm, JFYI.
Let's say you want to determine whether a 200x100 px image has a top border:
- Take the top slice, 25% of the image height (25 px): rows 0:25, columns 0:200.
- Take the slice of the same height immediately below it, toward the center of the image: rows 25:50, columns 0:200.
(An illustration here showed the upper and lower slices.)
- Calculate the entropy of both slices.
- Take the difference of the entropies and store it together with the current slice height.
- Shrink the slice height by 1 px (to 24 px) and repeat from step 2 until you reach the image edge (height 0), sliding the scan area toward the edge on each iteration.
- Find the maximum of the stored entropy differences and its slice height: this is the center of the border, provided it lies closer to the edge than to the center of the image, and the entropy difference exceeds a given threshold (0.5, for example).
Then apply this algorithm to each side of your image.
Here is a code snippet that determines whether an image has a top border and finds its approximate coordinate (top offset); pass a grayscale ("L" mode) Pillow image to the scan function:
import numpy as np

MEDIAN = 0.5

def entropy(signal):
    # Shannon entropy of the pixel-value distribution
    # (referenced but not defined in the original snippet)
    _, counts = np.unique(signal, return_counts=True)
    probs = counts / signal.size
    return -np.sum(probs * np.log2(probs))

def scan(im):
    w, h = im.size
    array = np.array(im)
    center_ = None
    diff_ = None
    # slide the slice pair from one quarter of the height toward the top edge
    for center in reversed(range(1, h // 4 + 1)):
        upper = entropy(array[0:center, 0:w].flatten())
        lower = entropy(array[center:2 * center, 0:w].flatten())
        # a ratio close to 1 means similar slices; a low ratio hints at a border
        diff = upper / lower if lower != 0.0 else MEDIAN
        if center_ is None or diff_ is None:
            center_ = center
            diff_ = diff
        if diff < diff_:
            center_ = center
            diff_ = diff
    # (has_border, border_center_offset, entropy_ratio)
    return diff_ < MEDIAN and center_ < h // 4, center_, diff_
Full source code, with examples of detected borders and of borderless images, is here: https://github.com/embali/enimda/