Type of graph cut algorithm for 3D reconstruction

I have read several papers on using graph cuts for 3D reconstruction, and I noticed that there seem to be two alternative ways of posing the problem.

The first approach is volumetric: it defines a 3D voxel volume and uses a graph cut to assign a binary label (whether it contains the object of interest or not) to each voxel. Papers that use this approach include "Multi-View Stereo via Volumetric Graph-Cuts and Occlusion Robust Photo-Consistency" and "A Surface Reconstruction Method Using Global Graph Cut Optimization".

The second approach is 2D and aims to label every pixel of a reference image with the depth of the 3D point that projects there. Papers that use this approach include "Computing Visual Correspondence with Occlusions using Graph Cuts".

I want to understand the advantages and disadvantages of each method, and which of them matter most when choosing between the two. So far I understand that some of the benefits of the first approach are:

  • The problem is binary, so it can be solved exactly with max-flow algorithms.
  • It offers simple ways to model occlusion.
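To make the first point concrete, here is a minimal sketch of exact binary labeling by s-t min cut. It is not taken from either paper: the 1D row of voxels, the per-voxel costs, and the names `min_cut_labels`, `unary` and `pairwise_weight` are assumptions for illustration, and the max-flow routine is a plain Edmonds-Karp rather than the specialized solvers used in practice.

```python
from collections import deque


def min_cut_labels(unary, pairwise_weight):
    """Label a 1D row of voxels as object (1) or empty (0) via s-t min cut.

    unary[i] = (cost_if_empty, cost_if_object) for voxel i (hypothetical costs).
    pairwise_weight = penalty paid when two neighbouring voxels disagree.
    Returns (labels, minimum_energy).
    """
    n = len(unary)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, (cost_empty, cost_object) in enumerate(unary):
        cap[s][i] = cost_empty   # cut when voxel i ends up on the sink side (empty)
        cap[i][t] = cost_object  # cut when voxel i stays on the source side (object)
    for i in range(n - 1):
        cap[i][i + 1] = cap[i + 1][i] = pairwise_weight

    def reachable_from_source():
        # Breadth-first search over edges with remaining capacity.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if v not in parent and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0.0
    while True:
        parent = reachable_from_source()
        if t not in parent:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Voxels still reachable from the source lie on the "object" side of the cut.
    labels = [1 if i in parent else 0 for i in range(n)]
    return labels, flow


# Voxel 2 weakly prefers "empty", but its neighbours strongly prefer "object".
labels, energy = min_cut_labels([(5, 1), (5, 1), (2, 3), (5, 1), (5, 1)], 2)
print(labels, energy)
```

With these made-up costs the cut labels all five voxels as object with energy 7: flipping voxel 2 to "empty" would save 1 in its own cost but pay 4 in disagreement penalties. By the max-flow/min-cut theorem the returned flow value equals the minimum energy, which is what "solved exactly" means here.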

And some advantages of the second approach:

  • Smaller neighborhood for each node of the graph.
  • Smoothness is easier to model (but does that give better results?).

I would also like to know in which situations I would be better off choosing one formulation over the other, and why.

1 answer


The most significant differences are the type of scene the algorithms are typically used with and the way the 3D shape of the object is represented.

Volumetric approaches work best ...

  • with a lot of images ...
  • taken from different viewpoints that are well distributed around the object, ...
  • of a more or less compact "object" (for example an artifact, as opposed to, say, an outdoor scene observed by a vehicle-mounted camera).

Volumetric approaches are popular for reconstructing "objects" (especially artifacts). Given enough views (i.e. images), the algorithms give you a complete volumetric (i.e. voxel) representation of the object's shape. This can be converted to a surface representation using marching cubes or a similar method.

The second type of algorithm you identified is called stereo, and graph cuts are just one of many ways to solve such problems. Stereo is better ...

  • if you have only two images ...
  • with a rather narrow baseline (i.e. the distance between the cameras).
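As a rough illustration of what the baseline does, the sketch below uses the standard relation for a rectified stereo pair, Z = f · B / d (depth from focal length, baseline and disparity). The function names and the numbers are assumptions chosen for the example, not values from any particular system.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair (pinhole model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px=1.0):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


# Hypothetical camera: 800 px focal length, 10 cm baseline, 20 px disparity.
z = depth_from_disparity(20, focal_px=800, baseline_m=0.1)
print(z, depth_error(z, 800, 0.1), depth_error(z, 800, 0.5))
```

The second function shows the usual trade-off: for the same point and the same one-pixel matching error, a wider baseline gives a smaller depth error, while a narrow baseline keeps the two images similar and so makes the per-pixel matching itself easier.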


There are generalizations to more than two images (with narrow baselines), but most of the literature deals with the binocular case (i.e. two images). Some algorithms generalize more easily to more views than others.

Stereo algorithms give you only a depth map, that is, an image with a depth value for each pixel. This does not let you "walk around" the object. There are, however, 3D reconstruction systems that start with stereo on pairs of images and combine the depth maps to get a notion of the complete object, which is a non-trivial problem in its own right. Interestingly, this is often approached using a 3D (volumetric) representation as an intermediate step.
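To see why a single depth map is not a full 3D model, here is a minimal sketch that back-projects one into 3D points under a pinhole camera model. The intrinsics `fx`, `fy`, `cx`, `cy` and the tiny depth map are hypothetical values for illustration.

```python
def backproject_depth_map(depth, fx, fy, cx, cy):
    """Turn a depth map (rows of depths, None or <= 0 meaning "no depth")
    into 3D points in the camera frame, assuming a pinhole camera."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0:
                continue  # pixel without a valid depth estimate
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Hypothetical 2x2 depth map with one missing pixel.
depth = [[2.0, 2.0],
         [None, 4.0]]
points = backproject_depth_map(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
print(points)
```

Each pixel yields at most one point, namely the nearest surface along its viewing ray, so anything behind that surface or outside the image is simply absent. That is the gap the depth-map fusion systems mentioned above have to fill by merging many such point sets.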

Stereo algorithms can be, and often are, used for "scenes", for example a road observed by a pair of cameras mounted in a car, or people in a 3D video conferencing room.

Some concluding remarks

  • For both stereo and volumetric reconstruction, graph cuts are just one of several ways to solve the problem. Stereo, for example, can also be formulated as a continuous optimization problem rather than a discrete one, which calls for other optimization methods.
  • My answer contains a bunch of generalizations and simplifications. It is not meant to be a definitive treatment of the subject.

I don't necessarily agree that smoothness is easier in stereo. Why do you think so?
