Creating a 3D representation of an object using a webcam

Is it possible to make a 3D representation of an object by capturing many different angles with a webcam? If so, how is this possible and how is image processing done?

My plan is to make a 3D view of a person using a webcam, and then, from that 3D view, obtain the person's vital statistics (body measurements).



4 answers

As Bart said (but did not post as a real answer) this is quite possible.

The research topic you are interested in is usually called multi-view stereo (or something similar).

The basic idea revolves around using point matches between two (or more) images and then trying to find the camera positions that best explain those matches. Once the positions are found, you can use stereo algorithms to back-project the image points into a 3D coordinate system and generate a point cloud.
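As a toy illustration of the back-projection step (not any specific published algorithm, just the geometric idea), the sketch below triangulates a 3D point from two camera rays. Each camera contributes a ray from its center toward the matched image point; the reconstructed point is taken as the midpoint of the segment where the two rays pass closest to each other. All coordinates are invented for the example.

```python
# Triangulation sketch: recover a 3D point as the midpoint of the
# closest approach between two viewing rays (one per camera).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate(p1, d1, p2, d2):
    """Midpoint of closest approach between rays p1 + t*d1 and p2 + s*d2."""
    w0 = [x - y for x, y in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b          # zero only if the rays are parallel
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    q1 = [p + t * v for p, v in zip(p1, d1)]   # closest point on ray 1
    q2 = [p + s * v for p, v in zip(p2, d2)]   # closest point on ray 2
    return [(x + y) / 2 for x, y in zip(q1, q2)]

# Two cameras 1 unit apart, both looking at the point (0, 0, 5):
point = triangulate([0, 0, 0], [0, 0, 1], [1, 0, 0], [-1, 0, 5])
```

With noisy matches the two rays will not intersect exactly, which is why the midpoint (or a least-squares variant) is used.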

You can then process this point cloud to extract the measurements you want.

If you are completely new to the subject, you have some fascinating reading to look forward to!

Bart suggested Multiple View Geometry in Computer Vision by Hartley and Zisserman, which is a very good book indeed.



As noted by Bart and Kigurai, this process is studied under the name "stereo" or "multi-view stereo". To be able to get a 3D model from a set of images, you need the following:

a) You need to know the "intrinsic" parameters of the camera. These include the focal length, the principal point of the image, and the radial distortion of the lens.

b) You also need to know the position and orientation of each camera relative to the others, or to a "world" coordinate system. This is called the camera's "pose".
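To make (a) concrete, here is a minimal sketch (with made-up numbers) of how the intrinsic parameters map a point already expressed in the camera's own coordinate frame to a pixel, including a single radial distortion coefficient:

```python
# Pinhole projection using intrinsic parameters: focal lengths (fx, fy),
# principal point (cx, cy), and one radial distortion coefficient k1.

def project_intrinsic(point, fx, fy, cx, cy, k1=0.0):
    X, Y, Z = point
    x, y = X / Z, Y / Z              # normalized image coordinates
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2            # simplest radial distortion model
    u = fx * x * scale + cx
    v = fy * y * scale + cy
    return u, v

# Focal length 800 px, principal point (320, 240), no distortion:
u, v = project_intrinsic((0.5, 0.25, 2.0), fx=800, fy=800, cx=320, cy=240)
```

Real calibration models (e.g. the one used by OpenCV) add more distortion coefficients, but the structure is the same.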

There are algorithms for computing both (a) and (b), described in Hartley and Zisserman's book Multiple View Geometry in Computer Vision. Alternatively, you can use Noah Snavely's "Bundler" software, which does the same very reliably.

Once you have the camera parameters, you know how a 3D point (X, Y, Z) in the world maps to an image coordinate (u, v) in a photo, and how to map image coordinates back into the world. You can create a dense point cloud by finding, for every pixel in one photo, its match in a photo taken from a different viewpoint. In general this requires a two-dimensional search, but you can reduce it to a one-dimensional search through "rectification": you transform the two photographs so that corresponding rows in both images look at the same line in the world (an epipolar line, roughly speaking). Then you only need to search along image rows.
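Combining (a) and (b), the full world-to-image mapping described above can be sketched as follows (rotation R, translation t, and the point are all invented for illustration):

```python
# Full projection pipeline: world point -> camera frame via pose (R, t),
# then camera frame -> pixel via a simple pinhole model (no distortion).

def matvec(R, p):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return [sum(R[i][j] * p[j] for j in range(3)) for i in range(3)]

def world_to_pixel(Pw, R, t, f, cx, cy):
    Pc = [a + b for a, b in zip(matvec(R, Pw), t)]   # X_cam = R*X_world + t
    X, Y, Z = Pc
    return f * X / Z + cx, f * Y / Z + cy

# Identity rotation, camera offset 1 unit along the optical axis:
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
u, v = world_to_pixel([1.0, -0.5, 3.0], I, [0, 0, 1], f=500, cx=320, cy=240)
```

Inverting this mapping for a single image only gives a ray, not a point; that is why a second viewpoint (or a disparity value) is needed to pin down depth.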

An algorithm for rectification can also be found in Hartley and Zisserman.

Finally, you need to do the matching based on some similarity measure. There is a lot of literature on "stereo matching"; another term is "disparity estimation". It is basically a search for the pixel (u', v') in one photo that matches the pixel (u, v) in the other. Once you have a match, the disparity between the two can be used to map back to a 3D point.
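A minimal sketch of this matching on a rectified pair (synthetic 1-D rows, invented camera numbers): for a pixel in the left row, search the right row for the shift with the lowest sum of absolute differences over a small window, then convert the winning disparity d to depth with Z = f * B / d, where f is the focal length in pixels and B the baseline between the cameras.

```python
# Disparity estimation along one rectified scanline, then depth recovery.

def best_disparity(left, right, u, window=2, max_disp=10):
    """Return the shift d minimizing window SAD between left[u] and right[u-d]."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if u - d - window < 0:       # would run off the row
            break
        cost = sum(abs(left[u + k] - right[u - d + k])
                   for k in range(-window, window + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Right row is the left row shifted by 3 pixels (a constant-depth scene):
left = [0, 0, 10, 40, 90, 40, 10, 0, 0, 0, 0, 0]
right = left[3:] + [0, 0, 0]
d = best_disparity(left, right, u=6)
Z = 700 * 0.1 / d                    # depth with f = 700 px, B = 0.1 m
```

Production systems add sub-pixel refinement, smoothness constraints, and occlusion handling on top of this basic search.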

You can use Yasutaka Furukawa's "CMVS" or "PMVS2" software for this. Or, if you want to experiment on your own, OpenCV is an open source computer vision toolkit for performing many of the sub-tasks involved.



This can be done with two webcams, in the same way your eyes work. This is called stereoscopic vision.

An affordable alternative for acquiring 3D data is the Kinect camera.



Maybe not the answer you were hoping for, but the Microsoft Kinect does exactly this, and there are open source drivers that allow it to be used under Windows/Linux.


