Performing phase correlation with fft in R

I am trying to implement a 2d phase correlation algorithm in R using a recipe from Wikipedia ( http://en.wikipedia.org/wiki/Phase_correlation ) to track movement between two images. These images (frames) were captured by a camera shaking in the wind, and the ultimate goal is to remove the jitter in these and subsequent frames. Below are two example images and R code:

frame 1frame 2

## we will need the tiff library 
library(tiff)

## read in the tiff files 
f1=as.matrix(readTIFF('f1.tiff',native=TRUE))
f2=as.matrix(readTIFF('f2.tiff',native=TRUE))

## take the fft of the first  frame
F1 <- fft(f1)
## take the Conjugate fft of the second frame
F2.c <- Conj(fft(f2))

## calculate the cross power spectrum according to the wiki article
R <- (F1*F2.c)/abs(F1*F2.c)
## take the inverse fft of R
r <- fft(R,inv=TRUE)/length(R)
## because the zero valued imaginary numbers are not needed
r <- Re(r)

## show the normalized cross-correlation
image(r)

## find the max in the cross correlation matrix, or the phase shift -
## between the two images
shift <- which(r==max(r),arr.ind=TRUE)

      

The vector shift, as far as I know, should contain information about the transient shift (dx and dy) that best corrects these two images. However, the shift variable gives dx = 1 and dy = 1, which I believe does not mean a shift in either the x-direction or the y-direction. This occurs for subsequent frames where there are visible offsets or multiple pixels in both x and y directions.

Will any of y'all make a mistake in my code / formulas? Or do I need to try something weirder like filtering images before I do phase correlation?

Cheers and guys!

+3


source to share


2 answers


The code looks correct from what I know about phase correlation. If I understand correctly what you want, you are trying to use phase correlation to determine the displacement between two images, given that their homographies are nothing more than horizontal and vertical displacements. The fact that you only get a shift at the origin is most likely due to the fact that your images do not have enough high frequency information to correctly identify a good shift.

Try these two images (these were from the Wikipedia article you linked to, but I extracted them and saved them as separate images):

Image # 1Image # 2

When I run these two images with your R code, I get this for my phase correlation map. Keep in mind that your images were actually saved as .png

, so I had to change the library to library(png)

and I used readPNG

instead readTIFF

. Keep this in mind when trying to run your code with the above images:

enter image description here

In addition, the location where the maximum peak occurred was:



> shift
     row col
[1,] 132 153

      

This indicates that the image is shifted 132 lines and 153 columns. Note that this is relative to the center of the image. If you want to determine the actual offset, you need to subtract that by half the rows for the vertical coordinate and half the columns for the horizontal coordinate.

So the code works perfectly fine ... it's just that your images don't have a high enough frequency of information for the phase correlation to work. In this case, some kind of correlation lies in the fact that we are trying to find "similar" options between each image. If there are many variations between each image and are very similar, then phase correlation will work well. However, if we don't have that many variations, then phase correlation won't work.

Why is this so? The basis of phase correlation is that we assume that the image is corrupted by Gaussian white noise, and so if we match the white noise to itself (from one image to the next) it will give a very nice high peak where the offset or shift is almost everywhere zero. Due to the lack of high frequency information in your images and the fact that the images are clear, phase correlation won't actually work. So what some people actually suggest is to pre-whiten your image so that the image contains white noise so you can get a nice peak where the offset we're talking about should be.

However, to make sure that you eliminate any false highs, it's also a good idea to flatten the cross-correlation matrix in the frequency domain ( r

in your R-code) so that there is a high probability that there will be only one true maximum. Using a Gaussian filter in the frequency domain / FFT should work fine.

Anyway, I don't see much difference in your images, and so something to take away from you is that you have to make sure your image has a lot of high frequency information for this to work!

+4


source


The following is a qualitative description of the procedure, followed by the R code used to efficiently and reliably find the phase correlation between the two images posted in the original question. Thanks to @rayreng for advice and pointing me in the right direction!

  • Read both images
  • Add noise to the second image
  • Convert both images to frequency spectrum using fft
  • folding transformed images using multiplication
  • return to spatial domain with reverse fft. This is your normalized cross-correlation.
  • Change the normalized cross-correlation matrix so that the zero frequency is in the middle of the matrix (similar to fftshift in matlab)
  • Use a 2d Gaussian distribution to smooth the normalized cross-correlated matrix
  • Determine the shift by determining the maximum indexed value of the smoothed normalized correlated matrix
  • Be aware that this function also uses its own 2d Gaussian generator (see below) and a custom function similar to matlabs fftshift function.

     ### R CODE ###
     rm(list=ls())
     library(tiff)
     ## read in the tiff images 
     f1 <- readTIFF('f1.tiff',native=TRUE)
     f1 <- matrix(f1,ncol=ncol(f1),nrow=nrow(f1)) 
    
    
     ## take the fft of f1 
     F1 <- fft(f1)
    
     ## what is the subsequent frame?
     f2 <-readTIFF('f2.tiff',native=TRUE)
     f2 <- matrix(f2,ncol=ncol(f2),nrow=nrow(f2))
    
     ## create a vector of random noise the same length as f2
     noise.b <- runif(length(f2),min(range(f2)),max(range(f2)))
     ## add the noise to the f2
     f2 <- noise.b+f2   
    
    ## take the fft and conjugate of the f2
    F2.c <- Conj(fft(f2))
    
    ## calculate the cross-power spectrum
    R <- (F1*F2.c)/abs(F1*F2.c)
    ## calculate the normalized cross-correlation with fft inverse
    r <- fft(R,inv=TRUE)/length(R)
    ## rearrange the r matrix so that zero frequency component is in the -
    ## middle of the matrix.  This is similar to the function - 
    ## fftshift in matlab 
    
    fftshift <- function(x) {
    if(class(x)=='matrix') {
        rd2 <- floor(nrow(x)/2)
        cd2 <- floor(ncol(x)/2)
    
        ## Identify the first, second, third, and fourth quadrants 
        q1 <- x[1:rd2,1:cd2]
        q2 <- x[1:rd2,(cd2+1):ncol(x)]
        q3 <- x[(rd2+1):nrow(x),(cd2+1):ncol(x)]
        q4 <- x[(rd2+1):nrow(x),1:cd2]
    
        ## rearrange the quadrants 
        centered.t <- rbind(q4,q1)
        centered.b <- rbind(q3,q2)
        centered <- cbind(centered.b,centered.t)
    
        return(Re(centered))             
    }
    if(class(x)!='matrix') {
        print('sorry, this class of input x is not supported yet')
        }
    }
    
    ## now use the defined function fftshift on the matrix r
    r <- fftshift(r)
    r <- Re(r)
    
    ## try and smooth the matrix r to find the peak!
    ## first build a 2d gaussian matrix  
    sig = 5 ## define a sigma 
    
    ## determine the rounded half dimensions of the r matrix 
    x.half.dim <- floor(ncol(r)/2)
    y.half.dim <- floor(nrow(r)/2)
    
    ## create x and y vectors that correspond to the indexed row
    ## and column values of the r matrix.  
    xs <- rep(-x.half.dim:x.half.dim,ncol(r))
    ys <- rep(-y.half.dim:y.half.dim,each=nrow(r))
    
    ## apply the gaussian blur formula 
    ## (http://en.wikipedia.org/wiki/Gaussian_blur)
    ## to every x and y indexed value
    gaus <- 1/(2*pi*sig^2) * exp(-(xs^2 + ys^2)/(2*sig^2))
    gaus <- matrix(gaus,ncol=ncol(r),nrow=nrow(r),byrow=FALSE)
    
    ## now convolve the gaus with r in the frequency domain
    r.filt <- fft((fft(r)*fft(gaus)),inv=TRUE)/length(r)
    r.filt <- fftshift(Re(r.filt))
    
    ## find dx and dy with the peak in r    
    min.err <- which(r.filt==max(r.filt),arr.ind=TRUE)
    
    ## how did the image move from the previous? 
    shift <- (dim(f1)+3)/2-min.err
    
          



A vector shift indicates that the image is shifted in the positive x direction and in the negative y direction. In other words, the second image (f2) has been moved roughly up to the right. The vector shift values ​​will change slightly with each attempt due to noise injected into the second image, together with the smoothing operator from the Gaussian filter on the r-matrix. I've noticed that phase correlation similar to the one described above works better on large images / matrices. To see the results of the above algorithm, visit the stabilized video at https://www.youtube.com/watch?v=irDFk2kbKaE .

+2


source







All Articles