OpenCV: projection matrices from the fundamental matrix and from SolvePnPRansac are completely different

As part of my master's thesis, I am studying Structure From Motion. After reading parts of the H&Z book, following online tutorials and reading through many SO posts, I have some useful results, but I also have some problems. I am using the OpenCvSharp wrapper. All images are taken with the same camera.

What I have now:


I first calculate the initial 3D point coordinates. I do it with the following steps:

  • Calculate the Farneback dense optical flow.
  • Find the fundamental matrix using Cv2.FindFundamentalMat with RANSAC.
  • Get the essential matrix using the camera intrinsics (I am using predefined values at this point) and decompose it:

    // Essential matrix from the intrinsics and the fundamental matrix: E = K^T * F * K
    Mat essential = camera_matrix.T() * fundamentalMatrix * camera_matrix;
    
    SVD decomp = new SVD(essential, OpenCvSharp.SVDFlag.ModifyA);
    
    // Enforce the two equal non-zero singular values an essential matrix must have
    Mat diag = new Mat(3, 3, MatType.CV_64FC1, new double[] {
        1.0D, 0.0D, 0.0D,
        0.0D, 1.0D, 0.0D,
        0.0D, 0.0D, 0.0D
    });
    
    Mat Er = decomp.U * diag * decomp.Vt;
    
    SVD svd = new SVD(Er, OpenCvSharp.SVDFlag.ModifyA);
    
    // W and its inverse, as in the H&Z decomposition E = [t]x * R
    Mat W = new Mat(3, 3, MatType.CV_64FC1, new double[] {
        0.0D, -1.0D, 0.0D,
        1.0D, 0.0D, 0.0D,
        0.0D, 0.0D, 1.0D
    });
    
    Mat Winv = new Mat(3, 3, MatType.CV_64FC1, new double[] {
        0.0D, 1.0D, 0.0D,
        -1.0D, 0.0D, 0.0D,
        0.0D, 0.0D, 1.0D
    });
    
    // Two possible rotations and the translation (third column of U, up to sign)
    Mat R1 = svd.U * W * svd.Vt;
    Mat T1 = svd.U.Col[2];
    Mat R2 = svd.U * Winv * svd.Vt;
    Mat T2 = -svd.U.Col[2];
    
    // The four candidate projection matrices [R|t]
    Mat[] Ps = new Mat[4];
    
    for (int i = 0; i < 4; i++)
        Ps[i] = new Mat(3, 4, MatType.CV_64FC1);
    
    Cv2.HConcat(R1, T1, Ps[0]);
    Cv2.HConcat(R1, T2, Ps[1]);
    Cv2.HConcat(R2, T1, Ps[2]);
    Cv2.HConcat(R2, T2, Ps[3]);
    
          

  • Then I check which projection matrix has the most points in front of both cameras. I triangulate the points (I've tried both Cv2.TriangulatePoints and the H&Z method, with similar results), multiply them by each candidate projection matrix and check for a positive Z value after converting back from homogeneous coordinates (see the sketch after this list):

    P * point3D
    
          

  • At this point, I should have more or less correct 3D points. 3D rendering looks quite correct.
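
For clarity, this is roughly how that positive-depth (cheirality) check could look in OpenCvSharp; it is a simplified sketch, and the names pts1 and pts2 (2xN CV_64FC1 matrices of matched image points) are placeholders rather than my actual variables:

    // Sketch of the positive-depth test: triangulate with the reference camera
    // K[I|0] and each candidate K[R|t], then count how many points end up in
    // front of both cameras. pts1/pts2 are placeholder names.
    Mat P0 = new Mat(3, 4, MatType.CV_64FC1, new double[] {
        1.0D, 0.0D, 0.0D, 0.0D,
        0.0D, 1.0D, 0.0D, 0.0D,
        0.0D, 0.0D, 1.0D, 0.0D
    });
    
    int bestIndex = -1;
    int bestCount = -1;
    
    for (int c = 0; c < 4; c++)
    {
        Mat KP0 = camera_matrix * P0;
        Mat KP1 = camera_matrix * Ps[c];
    
        Mat homog = new Mat();
        Cv2.TriangulatePoints(KP0, KP1, pts1, pts2, homog);
        homog.ConvertTo(homog, MatType.CV_64FC1);
    
        int inFront = 0;
        for (int i = 0; i < homog.Cols; i++)
        {
            double w = homog.At<double>(3, i);
            if (Math.Abs(w) < 1e-12)
                continue;
    
            // De-homogenise the point and test its depth in both cameras.
            Mat X = new Mat(4, 1, MatType.CV_64FC1, new double[] {
                homog.At<double>(0, i) / w,
                homog.At<double>(1, i) / w,
                homog.At<double>(2, i) / w,
                1.0D
            });
    
            Mat x0 = P0 * X;      // depth in the reference camera
            Mat x1 = Ps[c] * X;   // depth in the candidate camera
    
            if (x0.At<double>(2, 0) > 0 && x1.At<double>(2, 0) > 0)
                inFront++;
        }
    
        if (inFront > bestCount)
        {
            bestCount = inFront;
            bestIndex = c;
        }
    }
    
    // Ps[bestIndex] is the candidate with the most points in front of both cameras.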

Then, for each new frame, I calculate SolvePnP using dense optical flow again and the already known projection matrix, triangulate the next 3D points and add them to the model. Again, the 3D rendering looks more or less correct (no bundle adjustment is done at this point).
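
For reference, the per-frame pose estimation and its conversion into a 3x4 [R|t] matrix look roughly like this (a simplified sketch; objectPoints, imagePoints and distCoeffs are placeholder names for the tracked correspondences and the distortion coefficients):

    // Simplified sketch: pose of a new frame from known 3D points and their 2D
    // projections, assembled into a 3x4 [R|t] matrix (placeholder names).
    // objectPoints: Nx3 CV_64FC1, imagePoints: Nx2 CV_64FC1.
    Mat distCoeffs = new Mat();      // assuming no lens distortion here
    Mat rvec = new Mat();
    Mat tvec = new Mat();
    Cv2.SolvePnPRansac(objectPoints, imagePoints, camera_matrix, distCoeffs, rvec, tvec);
    
    Mat R = new Mat();
    Cv2.Rodrigues(rvec, R);          // 3x1 rotation vector -> 3x3 rotation matrix
    
    Mat pose = new Mat(3, 4, MatType.CV_64FC1);
    Cv2.HConcat(R, tvec, pose);      // [R | t], same layout as the matrices printed below

The resulting pose can then be compared directly with the matrix obtained from the fundamental matrix decomposition.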

Since I need to use SolvePnP for every new frame, I started by comparing its result with the projection matrix calculated for the first two images via the fundamental matrix. In theory, the two should be the same or nearly the same. I use the initial 3D points and the corresponding 2D points of the second image. But they are not the same.

Here is the projection matrix calculated from the fundamental matrix decomposition:

0,955678480016302 -0,0278536127242155 0,293091827064387 -0,148461857222772 
-0,0710609269521247 0,944258717203142 0,321443338158658 -0,166586733489084 
0,285707870900394 0,328023857736121 -0,900428432059693 0,974786098164824 

      

And here's the one I got from SolvePnPRansac:

0,998124823499476 -0,0269266503551759 -0,0549708305812315 -0,0483615883381834 
0,0522887223187244 0,8419572918112 0,537004476968512 -2,0699592377647 
0,0318233598542908 -0,538871853288516 0,841786433426546 28,7686946357429

      

They both look like correct projection matrices, but they are different.

For those patient enough to read the entire post, I have 3 questions:

1. Why are these matrices different? I know the reconstruction is only defined up to scale, but since the initial 3D points already carry an arbitrarily assigned scale, SolvePnP should preserve that scale.
2. I noticed one strange thing: the translation in the first matrix seems to be exactly the same no matter which images I use.
3. Is the overall algorithm correct, or am I doing something wrong? Am I missing some important step?
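
In case it is useful, here is how I could compare the two matrices numerically instead of just looking at them: reproject the initial 3D points with each pose and compare the mean reprojection error against the 2D points of the second image (a rough sketch; poseFromF, poseFromPnP, points3D and points2D are placeholder names, not my actual variables):

    // Rough sketch (placeholder names): mean reprojection error of a 3x4 pose
    // [R|t] against known 2D observations, using the camera intrinsics.
    static double MeanReprojectionError(Mat cameraMatrix, Mat pose, Mat points3D, Mat points2D)
    {
        Mat R = pose.ColRange(0, 3).Clone();
        Mat t = pose.ColRange(3, 4).Clone();
    
        Mat rvec = new Mat();
        Cv2.Rodrigues(R, rvec);                       // rotation matrix -> rotation vector
    
        Mat projected = new Mat();
        Cv2.ProjectPoints(points3D, rvec, t, cameraMatrix, new Mat(), projected);
    
        double sum = 0;
        for (int i = 0; i < points3D.Rows; i++)
        {
            Point2d p = projected.At<Point2d>(i, 0);
            double dx = p.X - points2D.At<double>(i, 0);
            double dy = p.Y - points2D.At<double>(i, 1);
            sum += Math.Sqrt(dx * dx + dy * dy);
        }
        return sum / points3D.Rows;
    }
    
    // Usage: both values should be small and similar if the poses are consistent.
    // double errF   = MeanReprojectionError(camera_matrix, poseFromF,   points3D, points2D);
    // double errPnP = MeanReprojectionError(camera_matrix, poseFromPnP, points3D, points2D);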

      

If more code is required, let me know and I will edit the question.

Thanks!

3 answers


To start with, there is one obvious reason why the two approaches you describe are unlikely to produce exactly the same projection matrices: they both estimate their results using RANSAC, which is a randomized algorithm. Since both approaches randomly choose some of the matches to estimate a model that fits most of them, the result depends on which matches were chosen.

Hence, you cannot expect to get exactly the same projection matrices from both approaches. However, if all goes well, they should be quite close, which does not seem to be the case here. The two matrices you provided have very different translations, which indicates that there is probably a more serious problem.

First, the fact that "the translation in the first matrix seems to be exactly the same no matter what images I use" seems to me a strong clue that there might be a bug in your implementation. I would suggest looking into that in detail first.



Second, I do not think that using optical flow is appropriate in a Structure-from-Motion workflow. Indeed, optical flow requires the two images to be very close to each other (for example, two consecutive frames of a video), whereas 3D triangulation of corresponding points between two images requires a large baseline to be accurate. These two requirements are contradictory, which can lead to problems and inaccuracies in the results, and it explains why the two approaches give different results.

For example, if the two images you use are consecutive video frames, you will not be able to triangulate the points accurately, which can lead to selecting the wrong projection matrix in step 4 and can also cause SolvePnP to estimate a wrong projection matrix. On the other hand, if the two images have a large baseline, the triangulation will be accurate, but the optical flow will likely contain many mismatches, introducing errors throughout the workflow.
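
One possible workaround (just a suggestion, not something from your post) is to keep tracking with optical flow from frame to frame, but only triangulate against a keyframe once the tracked points have drifted far enough, so that the flow still sees small displacements while triangulation gets a reasonable baseline. A minimal sketch, assuming keyframePts and currentPts are the tracked Point2f[] positions:

    // Suggestion only: delay triangulation until the points have moved far
    // enough from the keyframe, so the baseline is large enough.
    double meanDisplacement = 0;
    for (int i = 0; i < keyframePts.Length; i++)
    {
        double dx = currentPts[i].X - keyframePts[i].X;
        double dy = currentPts[i].Y - keyframePts[i].Y;
        meanDisplacement += Math.Sqrt(dx * dx + dy * dy);
    }
    meanDisplacement /= keyframePts.Length;
    
    // The pixel threshold is arbitrary and should be tuned for the resolution.
    bool enoughBaseline = meanDisplacement > 30.0;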

One thing you could do to understand where your problems are coming from is to use synthetic data with known projection matrices and 3D points. Then you can analyze the accuracy of each step and check if they generate the expected results.
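
A minimal sketch of such a synthetic test, assuming camera_matrix is your 3x3 intrinsics matrix (the other names are illustrative): generate 3D points, project them with a known pose, and check that SolvePnPRansac recovers roughly the same rvec and tvec.

    // Synthetic sanity check (illustrative, not a drop-in solution).
    var rng = new Random(42);
    int n = 100;
    
    Mat objectPoints = new Mat(n, 3, MatType.CV_64FC1);
    for (int i = 0; i < n; i++)
    {
        objectPoints.Set(i, 0, rng.NextDouble() * 4 - 2);
        objectPoints.Set(i, 1, rng.NextDouble() * 4 - 2);
        objectPoints.Set(i, 2, rng.NextDouble() * 4 + 4);   // keep points in front of the camera
    }
    
    // Known ground-truth pose (arbitrary values for the test).
    Mat rvecTrue = new Mat(3, 1, MatType.CV_64FC1, new double[] { 0.10, -0.05, 0.02 });
    Mat tvecTrue = new Mat(3, 1, MatType.CV_64FC1, new double[] { 0.50, 0.00, 0.00 });
    Mat distCoeffs = new Mat();   // assume no distortion in the synthetic setup
    
    Mat imagePoints = new Mat();
    Cv2.ProjectPoints(objectPoints, rvecTrue, tvecTrue, camera_matrix, distCoeffs, imagePoints);
    
    Mat rvecEst = new Mat();
    Mat tvecEst = new Mat();
    Cv2.SolvePnPRansac(objectPoints, imagePoints, camera_matrix, distCoeffs, rvecEst, tvecEst);
    
    // With perfect synthetic data both errors should be close to zero.
    Mat rErr = rvecEst - rvecTrue;
    Mat tErr = tvecEst - tvecTrue;
    Console.WriteLine("rvec error: " + Cv2.Norm(rErr));
    Console.WriteLine("tvec error: " + Cv2.Norm(tErr));

If this test passes but the real pipeline still disagrees, the problem is more likely in the correspondences or in the fundamental matrix branch than in the SolvePnP step.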

Just to let everyone know: I never found a real solution to this problem. I ended up using the initial triangulation from the fundamental matrix rather than SolvePnP, even though the results were sometimes incorrect. It is not a perfect solution, but it works often enough. It was enough for my whole project to be accepted and finished :)



I know I am a little late to this party, but I would like to point out a fundamental difference between your two approaches. The camera pose you get from the fundamental (essential) matrix decomposition is only defined up to scale, while the camera pose you get from solvePnP is in world units. In other words, the translation vector you get from the fundamental matrix decomposition is a unit vector, while the magnitude of the translation vector you get from solvePnP should be close to the actual distance between the camera and the world origin.
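
A quick way to see this on your own matrices (a sketch with placeholder names poseFromF and poseFromPnP for the two 3x4 matrices you printed):

    // Illustration of the scale difference (placeholder names): compare the
    // translation magnitudes and directions of the two 3x4 [R|t] matrices.
    Mat tF   = poseFromF.ColRange(3, 4).Clone();     // from the fundamental matrix route
    Mat tPnP = poseFromPnP.ColRange(3, 4).Clone();   // from solvePnP
    
    Console.WriteLine("|t| from fundamental matrix: " + Cv2.Norm(tF));   // ~1, unit length
    Console.WriteLine("|t| from solvePnP:           " + Cv2.Norm(tPnP)); // in world units
    
    // If both poses are otherwise consistent, the normalised directions should
    // roughly agree (dot product close to +/-1).
    Mat tFn   = tF / Cv2.Norm(tF);
    Mat tPnPn = tPnP / Cv2.Norm(tPnP);
    Console.WriteLine("direction agreement: " + tFn.Dot(tPnPn));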
