How to compute PHOW features for an image in C++ using VLFeat and OpenCV?

I have implemented a PHOW feature extractor in Matlab like this:

    [frames, descrs] = vl_phow(im);

which is a wrapper around code like this:

    ...
    for i = 1:4
        ims = vl_imsmooth(im, scales(i) / 3) ;
        [frames{i}, descrs{i}] = vl_dsift(ims, 'Fast', 'Step', step, 'Size', scales(i)) ;
    end
    ...


I am writing a C++ implementation with OpenCV and VLFeat. This is the part of my code that computes PHOW features for an image (a cv::Mat):

   ...
   //convert into float array
   float* img_vec = im2single(image);

   //create DSIFT filter (note: this keeps the default geometry for every
   //iteration; the bin size should be updated per scale, e.g. with
   //vl_dsift_set_geometry, to match what vl_phow does)
   VlDsiftFilter* vlf = vl_dsift_new(image.cols, image.rows);

   double bin_sizes[] = { 3, 4, 5, 6 };
   double magnif = 3;
   double* scales = (double*)malloc(4*sizeof(double));
   for (size_t i = 0; i < 4; i++)
   {
       scales[i] = bin_sizes[i] / magnif;
   }
   for (size_t i = 0; i < 4; i++)
   {
       double sigma = sqrt(pow(scales[i], 2) - 0.25);

       //smooth float array image 
       float* img_vec_smooth = (float*)malloc(image.rows*image.cols*sizeof(float));
       vl_imsmooth_f(img_vec_smooth, image.cols, img_vec, image.cols, image.rows, image.cols, sigma, sigma);

       //run DSIFT
       vl_dsift_process(vlf, img_vec_smooth);

       //number of keypoints found
       int keypoints_num = vl_dsift_get_keypoint_num(vlf);

       //extract keypoints
       const VlDsiftKeypoint* vlkeypoints = vl_dsift_get_keypoints(vlf);

       //descriptor dimension
       int dim = vl_dsift_get_descriptor_size(vlf);

       //extract descriptors (pointer into the filter's internal buffer,
       //overwritten on the next call to vl_dsift_process)
       const float* descriptors = vl_dsift_get_descriptors(vlf);

       //copy out whatever is needed, then release the per-scale smoothed
       //image to avoid leaking it on every iteration
       free(img_vec_smooth);
   ...

   //return all descriptors of the different scales


I'm not sure whether the function should return the collection of all descriptors from all scales, which requires a lot of storage when processing many images, or the result of some operation combining the descriptors of the different scales. Can you help me resolve this doubt? Thanks.



1 answer


You can do either. The simplest approach is to just concatenate the descriptors from the different scales; I believe this is what VLFeat itself does (at least the documentation does not say it does anything more). Discarding low-contrast descriptors with a threshold will help, but you will still end up with several thousand descriptors (depending on the size of your image). You can also compare descriptors found near the same location and prune some of them, which is a space/time trade-off. Typically I have seen bin sizes spaced at intervals of 2 (but they may be larger), which reduces the need to check for overlapping descriptors.


