How do people prove the correctness of computer vision methods?

I would like to present some abstract questions about computer vision. I was unable to answer these questions by searching the internet and reading the docs.

  • How does anyone know if a computer vision algorithm is correct?
  • How do you define "correct" in the context of computer vision?
  • Does formal proof play a role in understanding the correctness of computer vision algorithms?

A bit of background: I am about to start a PhD in Computer Science. I enjoy developing fast parallel algorithms and proving the correctness of those algorithms. I've also used OpenCV for some cool projects, although I haven't had much formal computer vision training.

I have been in contact with a potential thesis advisor who works on developing faster and more scalable algorithms for computer vision (e.g. fast image segmentation). I am trying to understand the common approaches to solving computer vision problems.

+3




4 answers


In practice, computer vision is more like an empirical science: you collect data, form simple hypotheses that might explain some aspects of your data, and then test those hypotheses. Usually you don't have a clear definition of "correct" for high-level CV tasks like face recognition, so you can't prove them correct.

Low-level algorithms are a different matter: you usually have a clear mathematical definition of "correct" here. For example, if you devised an algorithm that could compute an average filter or morphological operation more efficiently than known algorithms, or that could be parallelized better, you would of course need to prove it correct, just like any other algorithm.
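For instance, a faster mean (box) filter built on an integral image can at least be checked pointwise against a straightforward reference implementation. The sketch below is my own illustration (the function names and the NumPy approach are not from the answer), and such a comparison complements rather than replaces a proof:

```python
import numpy as np

def box_mean_naive(img, r):
    """Reference mean filter: O(r^2) work per pixel, zero-padded borders."""
    h, w = img.shape
    padded = np.pad(img, r, mode="constant")
    out = np.empty_like(img, dtype=np.float64)
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + 2 * r + 1, x:x + 2 * r + 1].mean()
    return out

def box_mean_integral(img, r):
    """Same filter via an integral image (summed-area table): O(1) work per pixel."""
    padded = np.pad(img, r, mode="constant").astype(np.float64)
    sat = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    k = 2 * r + 1
    h, w = img.shape
    # Each window sum is read from four corners of the summed-area table.
    a = sat[k:k + h, k:k + w]
    b = sat[:h, :w]
    c = sat[k:k + h, :w]
    d = sat[:h, k:k + w]
    return (a + b - c - d) / (k * k)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 80))
    assert np.allclose(box_mean_naive(img, 3), box_mean_integral(img, 3))
    print("integral-image filter matches the naive reference")
```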



Also, there are often certain requirements for a computer vision algorithm that can be formalized: for example, you may want your algorithm to be invariant to rotation and translation - these are properties that can be formally proven. Sometimes it is also possible to create mathematical models of signal and noise, and design the filter that has the best signal-to-noise ratio (IIRC - Wiener filter or Canny edge detector).
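As a toy illustration of such a provable property (my own example, not part of the answer): a global intensity histogram is exactly invariant to circular translations and 90° rotations, because those transforms only permute pixels. The proof is a few lines of algebra; the check below is just a numerical sanity test of the same statement:

```python
import numpy as np

def histogram_descriptor(img, bins=16):
    """Global intensity histogram: depends only on the multiset of pixel values."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    return hist

rng = np.random.default_rng(1)
img = rng.random((32, 48))

d = histogram_descriptor(img)
# Circular translation and 90-degree rotation merely permute pixels,
# so the descriptor is exactly invariant (a fact one can prove, not just test).
assert np.array_equal(d, histogram_descriptor(np.roll(img, (5, -7), axis=(0, 1))))
assert np.array_equal(d, histogram_descriptor(np.rot90(img)))
print("descriptor is invariant under translation and 90-degree rotation")
```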

Many image processing / computer vision algorithms have some sort of "repeat until convergence" loop (e.g. snakes or Navier-Stokes inpainting and other PDE-based methods). You would at least try to prove that the algorithm converges for any input.
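A minimal sketch of such a loop, my own illustration rather than any method named in the answer: explicit heat-equation (diffusion) smoothing with periodic boundaries, iterated until the update becomes negligible. The step-size bound in the comment is exactly the kind of condition a convergence proof has to establish.

```python
import numpy as np

def diffuse(img, tau=0.2, tol=1e-6, max_iter=10_000):
    """Explicit heat-equation smoothing: u <- u + tau * Laplacian(u).
    For the 4-neighbour Laplacian this iteration is stable for tau <= 0.25,
    the kind of condition a convergence proof would establish."""
    u = img.astype(np.float64).copy()
    for _ in range(max_iter):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
               + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
        step = tau * lap
        u += step
        if np.abs(step).max() < tol:   # stop once the update is negligible
            break
    return u

rng = np.random.default_rng(2)
smooth = diffuse(rng.random((64, 64)))
print("value range after smoothing:", smooth.min(), smooth.max())
```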

+2




You just don't prove them.



Instead of a formal proof, which is often impossible, you can test your algorithm against a set of test cases and compare the output with previously known algorithms or with known correct answers (for example, when you recognize text, you can generate a set of images where you know what the text says).
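A sketch of that idea for text recognition, assuming Pillow for rendering the test images; `recognize_text` is a hypothetical placeholder for whatever OCR system is under test (the answer does not name one):

```python
from PIL import Image, ImageDraw

def make_test_case(text, size=(200, 40)):
    """Render a known string onto a plain image; the ground truth is `text` itself."""
    img = Image.new("L", size, color=255)
    ImageDraw.Draw(img).text((10, 10), text, fill=0)
    return img, text

def accuracy(recognize_text, cases):
    """Fraction of generated test images whose recognized text matches the ground truth."""
    hits = sum(recognize_text(img) == truth for img, truth in cases)
    return hits / len(cases)

cases = [make_test_case(s) for s in ["HELLO", "WORLD", "42"]]
# `recognize_text` is a stand-in for the OCR system being evaluated,
# e.g. a wrapper around pytesseract.image_to_string.
# print(accuracy(recognize_text, cases))
```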

+3




This is my personal opinion, so take it for what it's worth.

You cannot prove most computer vision methods correct right now. I regard most of the existing methods as a "recipe" whose ingredients are adjusted until the "result" is good enough. Can you prove that a cake is correct?

This is a bit like how machine learning has evolved. At first, people were building neural networks, but it was just a big "soup" that worked more or less. It worked sometimes and not other times, and no one really knew why. Then statistical learning took off (via Vapnik, among others), with some real mathematical backing: you could prove that there is a unique hyperplane minimizing a certain loss function, PCA gives you the closest fixed-rank matrix to a given matrix (under the Frobenius norm, I believe), and so on.
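The PCA claim is the Eckart-Young theorem: truncating the SVD to rank k gives the closest rank-k matrix in Frobenius norm. A quick numerical sanity check of that statement (my own sketch, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.random((30, 20))
k = 5

# Best rank-k approximation under the Frobenius norm (Eckart-Young):
# keep the k largest singular values and zero the rest.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
best_err = np.linalg.norm(A - A_k)

# Random other rank-k matrices should never beat the SVD truncation.
for _ in range(100):
    B = rng.standard_normal((30, k)) @ rng.standard_normal((k, 20))
    assert np.linalg.norm(A - B) >= best_err
print("no random rank-%d matrix beat the SVD truncation" % k)
```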

Now, there are still a few things that are "right" in computer vision, but they are rather limited. What comes to mind is wavelets: they give the sparsest representation of a function in an orthogonal basis (i.e. the most concise way to represent an approximation of an image with minimal error).
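As an illustration of that sparsity claim (my own sketch, using a hand-rolled single-level 2D Haar transform rather than any particular wavelet library): keeping only the largest-magnitude coefficients of an orthonormal transform reconstructs a smooth image with small error.

```python
import numpy as np

def haar2d(img):
    """One level of the orthonormal 2D Haar transform (image sides must be even)."""
    def step(a):  # pairwise sums and differences along the last axis, scaled by 1/sqrt(2)
        return np.concatenate([a[..., ::2] + a[..., 1::2],
                               a[..., ::2] - a[..., 1::2]], axis=-1) / np.sqrt(2)
    return step(step(img).swapaxes(0, 1)).swapaxes(0, 1)

def inv_haar2d(c):
    """Inverse of haar2d."""
    def inv_step(a):
        n = a.shape[-1] // 2
        out = np.empty_like(a)
        out[..., ::2] = (a[..., :n] + a[..., n:]) / np.sqrt(2)
        out[..., 1::2] = (a[..., :n] - a[..., n:]) / np.sqrt(2)
        return out
    return inv_step(inv_step(c.swapaxes(0, 1)).swapaxes(0, 1))

# A smooth synthetic test image (smooth content compresses far better than noise).
y, x = np.mgrid[0:64, 0:64].astype(float)
img = np.sin(x / 9.0) + np.cos(y / 7.0)
assert np.allclose(inv_haar2d(haar2d(img)), img)   # the transform is invertible

c = haar2d(img)
# Keep only the 10% largest-magnitude coefficients, zero the rest, reconstruct.
thresh = np.quantile(np.abs(c), 0.90)
approx = inv_haar2d(np.where(np.abs(c) >= thresh, c, 0.0))
rel_err = np.linalg.norm(img - approx) / np.linalg.norm(img)
print("relative error keeping 10% of coefficients:", rel_err)
```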

+2




Computer vision algorithms are not like theorems that you can prove; they usually try to interpret image data in terms that are more understandable to us humans, as in face recognition, motion detection, video surveillance, etc. Exact correctness is not computed, unlike, say, image compression algorithms, where you can easily measure the result by the size of the compressed images. The most common way to report results for computer vision methods (especially for classification) is with precision-versus-recall graphs, or graphs of detection rate versus false positives, measured on standard databases available on various sites. Generally, the more you relax the detection criteria to catch more correct detections, the more false positives you also produce. Typical practice is to select a point on the graph according to your requirement of "how many false positives are acceptable for the application".
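A minimal sketch of how such a curve is produced, with synthetic scores and labels standing in for a real detector and dataset (everything here is illustrative, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic detector scores: positives tend to score higher than negatives.
labels = np.array([1] * 100 + [0] * 400)
scores = np.concatenate([rng.normal(0.7, 0.15, 100), rng.normal(0.4, 0.15, 400)])

print("threshold  precision  recall  false positives")
for t in np.linspace(0.2, 0.9, 8):
    pred = scores >= t
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    precision = tp / max(tp + fp, 1)
    recall = tp / np.sum(labels == 1)
    print(f"{t:9.2f}  {precision:9.2f}  {recall:6.2f}  {fp:15d}")
```

Picking the operating point then just means choosing the threshold whose false-positive count you can live with.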

+1


source






