Differences between CV2 image processing and tf.image processing

I recently switched from cv2 to TensorFlow's tf.image module for image processing. However, my validation accuracy dropped by about 10%.

I believe the problem is with

  • cv2.imread() vs. tf.image.decode_jpeg()
  • cv2.resize() vs. tf.image.resize_images()

While these differences result in poorer accuracy, the images appear indistinguishable to the human eye when displayed with plt.imshow(). For example, take image #1 from ImageNet's validation dataset:

[CV2 image]

First problem:

  • cv2.imread() takes a string (a file path) and outputs a 3-channel uint8 BGR matrix.
  • tf.image.decode_jpeg() takes a string tensor and yields a 3-channel uint8 RGB tensor.

However, even after converting the TF tensor to BGR format, there are small differences in many of the pixels in the image.
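A minimal sketch of this comparison (the file name is just a placeholder, and this assumes the TF 1.x session API):

```python
import cv2
import numpy as np
import tensorflow as tf

IMAGE_PATH = 'ILSVRC2012_val_00000001.JPEG'  # placeholder path

# cv2: uint8 BGR ndarray
img_cv2 = cv2.imread(IMAGE_PATH)

# tf.image: uint8 RGB tensor, evaluated to an ndarray
with tf.Session() as sess:
    raw = tf.read_file(IMAGE_PATH)
    img_tf = sess.run(tf.image.decode_jpeg(raw, channels=3))

# Convert the TF result from RGB to BGR so both arrays use the same channel order
img_tf_bgr = cv2.cvtColor(img_tf, cv2.COLOR_RGB2BGR)

# Count how many values differ between the two decoders
print(np.sum(img_cv2 != img_tf_bgr), 'differing values out of', img_cv2.size)
```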

Using tf.image.decode_jpeg and then converting to BGR

```
[[ 26  41  24 ...,  57  48  46]
 [ 36  39  36 ...,  24  24  29]
 [ 41  26  34 ...,  11  17  27]
 ..., 
 [ 71  67  61 ..., 106 105 100]
 [ 66  63  59 ..., 106 105 101]
 [ 64  66  58 ..., 106 105 101]]
```

Using cv2.imread

```
[[ 26  42  24 ...,  57  48  48]
 [ 38  40  38 ...,  26  27  31]
 [ 41  28  36 ...,  14  20  31]
 ..., 
 [ 72  67  60 ..., 108 105 102]
 [ 65  63  58 ..., 107 107 103]
 [ 65  67  60 ..., 108 106 102]]
```

Second problem:

  • tf.image.resize_images() automatically converts the uint8 tensor to a float32 tensor, which seems to exacerbate the differences in pixel values.
  • I believe tf.image.resize_images() and cv2.resize() produce different results even with the same interpolation method (see the sketch after this list).
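A sketch of how the two resize calls compare, assuming the img_cv2 and img_tf arrays from the sketch above (the target size is only an example):

```python
import cv2
import tensorflow as tf

TARGET = (299, 299)  # example target size

# cv2.resize keeps uint8 and takes the size as (width, height)
resized_cv2 = cv2.resize(img_cv2, TARGET, interpolation=cv2.INTER_LINEAR)

# tf.image.resize_images casts to float32 and takes the size as (height, width)
with tf.Session() as sess:
    resized_tf = sess.run(tf.image.resize_images(
        img_tf, TARGET, method=tf.image.ResizeMethod.BILINEAR))

print(resized_cv2.dtype, resized_tf.dtype)  # uint8 vs. float32
```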

tf.image.resize_images

```
[[  26.           25.41850281   35.73127747 ...,   81.85855103
    59.45834351   49.82373047]
 [  38.33480072   32.90485001   50.90826797 ...,   86.28446198
    74.88543701   20.16353798]
 [  51.27312469   26.86172867   39.52401352 ...,   66.86851501
    81.12111664   33.37636185]
 ..., 
 [  70.59472656   75.78851318   45.48100662 ...,   70.18637085
    88.56777191   97.19295502]
 [  70.66964722   59.77249908   48.16699219 ...,   74.25527954
    97.58244324  105.20263672]
 [  64.93395996   59.72298431   55.17600632 ...,   77.28720856
    98.95108032  105.20263672]]
```

cv2.resize

```
[[ 36  30  34 ..., 102  59  43]
 [ 35  28  51 ...,  85  61  26]
 [ 28  39  50 ...,  59  62  52]
 ..., 
 [ 75  67  34 ...,  74  98 101]
 [ 67  59  43 ...,  86 102 104]
 [ 66  65  48 ...,  86 103 105]]
```

Here's a gist demonstrating the behavior just mentioned. It includes the complete code for how I process the image.

So my main questions are:

  • Why is the output of cv2.imread() and tf.image.decode_jpeg() different?
  • How are cv2.resize() and tf.image.resize_images() different if they use the same interpolation scheme?

Thanks!

1 answer


As vijay m correctly points out, changing dct_method to "INTEGER_ACCURATE" will give you the same uint8 image whether you decode with cv2 or with tf.

The real problem is the resizing method. I also tried to get TensorFlow to use the same interpolation method as cv2's default (bilinear), but the results are still different. This might be because cv2 interpolates in integer arithmetic while TensorFlow converts to float before interpolating, but that is just a guess. If you plot the pixel difference between the TF-resized and the cv2-resized image, you get the following histogram:

[Pixel difference histogram]
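A minimal sketch of how such a histogram can be produced, assuming resized_tf (float32, from tf.image.resize_images) and resized_cv2 (uint8 BGR, from cv2.resize) hold the two resized images:

```python
import cv2
import matplotlib.pyplot as plt
import numpy as np

# Bring both images into the same color space before comparing
resized_cv2_rgb = cv2.cvtColor(resized_cv2, cv2.COLOR_BGR2RGB)

# Per-pixel difference between the TF result and the cv2 result
diff = resized_tf - resized_cv2_rgb.astype(np.float32)

plt.hist(diff.ravel(), bins=100)
plt.xlabel('pixel difference (TF - cv2)')
plt.ylabel('count')
plt.show()
```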

As you can see, the differences look roughly normally distributed. (I was also surprised at the amount of pixel difference.) Your drop in accuracy could come from here. In this paper, Goodfellow et al. describe the effect of adversarial examples on classification systems. The problem here seems to be something similar: if the original weights you are using were trained with an input pipeline that produces the cv2 results, an image from the TF input pipeline is something of an adversarial example.



(See the image on page 3 of that paper for an example; I can't post more than two links.)

So in the end, I think that if you want to use the original network weights on the same kind of data they were trained on, you should stick with a similar input pipeline. If you are using the weights to fine-tune the network on your own data, this shouldn't be much of a concern, because you retrain the classification layer to handle the new input images (from the TF pipeline).

And @Ishant Mrinal: please see the code provided in the gist. It accounts for the difference between BGR (cv2) and RGB (TF) and converts the images to the same color space.
