Human inspector versus programmer (was: Algorithm for determining if any circles overlap)

Problem

This question did come up at work today. We are planning an experiment where we will show a series of cards for users. Each card has 31 symbols on it. Each symbol also has a label. We noticed that in some cases the labels overlap, which makes one of the labels unreadable.

We ended up defining symbolic problems in the old-fashioned way - by going through each map every time and recording the ID of any problem symbols we found - but I thought this problem could be solved quite easily with an algorithm. It took about one hour to visually check all the maps (using an admittedly clunky experiment data collection tool).

I am wondering how quickly people on this site can solve this problem, and I am also interested in what algorithms you have come up with. (Note: This is not a homework question, although I think it will make an interesting homework or interview question.)

Specifications

  • Cards: 24
  • Map size (pixels): 1024 X 768
  • Symbols (circles) per map: 31
  • Symbol Diameter (pixels): 60

Symbolic coordinates are stored in a spreadsheet table (assuming a tab-delimited text file) with the following columns:

  • MapId

    (range: 1 - 24)
  • SymbolId

    (range: 1 - 744 (24 cards x 31 characters / card = 744 common characters)
  • XCoordinate

    (range: 0 - 1024)
  • YCoordinate

    (range: 0 - 768)

Suppose all four columns contain integers

.

purpose

How quickly can you find an algorithm (language of your choice) that:

  • Reads a tab-delimited text file containing the input.
  • Determines, for each card, if any symbols overlap.
  • If any characters overlap, reports that SymbolId

    violate

Your answer should contain

  • Your algorithm that should satisfy all three goals (above).
  • How long did it take you to think about it and write it (you are in your honor). Note: you don't need to count the time it took you to read and understand the question, but as soon as you start thinking about ideas for a solution, start the clock.

I'm not interested in how fast your algorithm is or how efficient it is in using memory. I am looking for a quick and dirty, but accurate and reliable solution to a problem.

+2


source to share


5 answers


Taking the authority of SQL in your spreadsheet software, I ran it through a query:

select 
   s1.symbolId,s2.symbolId
from 
   symbols s1 
   join 
   symbols s2 
where 
   s1.mapid=s2.mapid 
   and 
   s1.symbolid<s2.symbolid 
   and 
   ((s1.xcoordinate-s2.xcoordinate)*
   (s1.xcoordinate-s2.xcoordinate)+
   (s1.ycoordinate-s2.ycoordinate)*
   (s1.ycoordinate-s2.ycoordinate))
   <(r+r)*(r+r)

      



(about 5 minutes, probably with a few errors)

+2


source


For each display:

If distance(A.center, B.center) < (A.radius+B.radius)
   the circles overlap.

      

In your case, it seems like each circle has the same radius, but I allowed the possibility of each circle having a different radius, just in case.

As for how long it takes to come up with this, it's a little tricky to say for sure - less time than it took to load the full description page. Then I had to read some things to confirm that the main problem was that it was from the title ...



Edit: If you had a lot of circles, it might be worth doing some sorting to eliminate unnecessary overlap tests, but assuming only 30 circles per map, which probably isn't worth it - you'll need a truly ancient computer to do this it will take a whole second.

Sorry, had to leave a little to take the kids to the hockey game. Anyway, although I haven't tested it, here's what I'll be doing for a C ++ program in about 10-12 minutes (I needed to call there, so I don't have the exact time):

#include <vector>
#include <math.h>
#include <iostream>

struct point { 
    int id, x, y;
};

std::istream &operator>>(std::istream &is, point &p) { 
    return is >> p.id >> p.x >> p.y;
}

typedef std::vector<point> map;

double dist(point const &a, point const &b) { 
    double x=a.x-b.x;
    double y=a.y-b.y;
    return sqrt(x*x+y*y);
}

int main() { 
    std::vector<map> maps(24);
    int mapID;
    while (std::cin>> mapID) {
        point temp;
        std::cin >> temp;
        maps[mapID].push_back(temp);
    }

    for (int i=0; i<maps.size(); i++)
        for (int j=0; j<maps[j].size(); j++) 
            for (int k=j; k<maps[j].size(); k++) 
                if (dist(maps[i][j], maps[i][k]) < 60)
                    std::cout 
                        << "map " << i << "\t" 
                        << maps[i][j].id 
                        << " overlaps with " 
                        << maps[i][k].id << "\n";
    return 0;
}

      

I haven't actually tested this, so it might take 5 to 10 minutes (or something in that order) to make sure it works correctly, but I wouldn't expect anything much longer. Anyway, I would expect one person to finish within an hour. If you're used to something like AWK or Perl, I'd rather finish a little more quickly - although to be honest it's trivial enough that it basically boils down to shortening typing ...

+8


source


My first thought is a simple O (N 2 ) algorithm that Jerry has already posted, with one small exception: distance 2 is easier to compute since it doesn't require square root, so compare that to (sum of radii) 2 .

My second thought is to improve this solution with a quadtree containing all of your points, which helps reduce the number of comparisons to be made.

It took longer to write this answer than to come up with any of these solutions - perhaps 10 seconds. This kind of problem is very common in computer graphics, I just rip off what I've seen before.


Quickly written implementation of solution 1 in Haskell (took a minute):

import Control.Monad
import Data.List
radius = 30
main = do
    points <- fmap (map (map read . words) . lines) getContents
    mapM_ print $ overlapping (points :: [[Int]])
overlapping list = do
    [m1, s1, x1, y1]:rest <- tails list
    [m2, s2, x2, y2] <- rest
    guard $ m1 == m2 && (x2-x1)^2 + (y2-y1)^2 < 2*radius^2
    return ((m1, s1), (m2, s2))

      

With a given task that is less than 3e5 comparisons; it's probably not worth recording quadrant insertion / search. Even if the comparison took 3000 clock cycles (which obviously wasn't), it would still end in a second.

+4


source


Here's something faster than O (n ^ 2) where n = number of characters. Use a matrix of dynamic STL arrays vector

to represent each pixel in the image. Calculate the pixels occupied by each circle and push

its SymbolID

into an array of each pixel in the circle. Once you're done with all the circles, just look at each pixel, and if any pixel contains more than one character in its array, you know which ones SymbolID

are of violation.

Time: ~ 5 minutes

The UPDATE: . With the new requirements, my original algorithm could be slightly modified to read on each line, update the pixel- map SymbolID

I defined above for each character, and move on to the next map.

+1


source


You can also try Sweep and prune if you're looking for ways to speed up your response.

0


source







All Articles