Differences between CV2 image processing and tf.image processing
I recently switched from cv2 to TensorFlow's tf.image module for image processing. However, my validation accuracy dropped by about 10%.
I believe the problem is with
- cv2.imread() vs. tf.image.decode_jpeg()
- cv2.resize() vs. tf.image.resize_images()
While these differences result in poorer accuracy, the images appear indistinguishable to humans when viewed with plt.imshow(). For example, take Image #1 from ImageNet's validation dataset:
First problem:
- cv2.imread() takes a string and outputs a 3-channel uint8 BGR matrix
- tf.image.decode_jpeg() takes a string tensor and yields a 3-channel uint8 RGB tensor
However, after converting the tf tensor to BGR format, there are very minor differences in many pixels in the image.
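The channel-order conversion itself is lossless; a minimal numpy sketch (with a hypothetical toy image, not the actual pipeline code) shows that converting RGB to BGR is just a reversal of the last axis:

```python
# Minimal sketch: cv2.imread returns BGR, tf.image.decode_jpeg returns RGB.
# Reversing the last axis converts between the two channel orderings.
import numpy as np

rgb = np.array([[[10, 20, 30], [40, 50, 60]]], dtype=np.uint8)  # toy 1x2 RGB image
bgr = rgb[..., ::-1]  # swap channel order without touching pixel values

print(bgr[0, 0])  # → [30 20 10]
```

So any remaining per-pixel differences must come from the JPEG decoding itself, not from the color-space conversion.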
Using tf.image.decode_jpeg and then converting to BGR
```
[[ 26  41  24 ...,  57  48  46]
 [ 36  39  36 ...,  24  24  29]
 [ 41  26  34 ...,  11  17  27]
 ...,
 [ 71  67  61 ..., 106 105 100]
 [ 66  63  59 ..., 106 105 101]
 [ 64  66  58 ..., 106 105 101]]
```
Using cv2.imread
```
[[ 26  42  24 ...,  57  48  48]
 [ 38  40  38 ...,  26  27  31]
 [ 41  28  36 ...,  14  20  31]
 ...,
 [ 72  67  60 ..., 108 105 102]
 [ 65  63  58 ..., 107 107 103]
 [ 65  67  60 ..., 108 106 102]]
```
Second problem:
- tf.image.resize_images() automatically converts the uint8 tensor to a float32 tensor, which seems to exacerbate the differences in pixel values.
- I believe tf.image.resize_images() and cv2.resize() use the same default interpolation scheme (bilinear), yet their outputs still differ.
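One plausible source of divergence (a toy illustration with made-up values, not either library's actual code) is that an integer pipeline rounds or truncates at each interpolation step, while a float32 pipeline keeps the fractional part:

```python
# Toy sketch: bilinear interpolation halfway between pixel values 10 and 15.
# A float32 pipeline (as tf.image.resize_images uses) keeps the fractional
# result, while an integer pipeline truncates it, so the two diverge even
# before any final cast back to uint8.
import numpy as np

a, b, t = 10, 15, 0.5  # two neighboring pixel values, interpolation weight

float_result = (1 - t) * np.float32(a) + t * np.float32(b)  # keeps 12.5
int_result = np.uint8((1 - t) * a + t * b)                  # truncates to 12

print(float_result, int_result)  # prints 12.5 12
```

Off-by-one differences like this at every output pixel would explain why the two resized images disagree even with the same nominal interpolation scheme.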
tf.image.resize_images
```
[[ 26.          25.41850281  35.73127747 ...,  81.85855103  59.45834351  49.82373047]
 [ 38.33480072  32.90485001  50.90826797 ...,  86.28446198  74.88543701  20.16353798]
 [ 51.27312469  26.86172867  39.52401352 ...,  66.86851501  81.12111664  33.37636185]
 ...,
 [ 70.59472656  75.78851318  45.48100662 ...,  70.18637085  88.56777191  97.19295502]
 [ 70.66964722  59.77249908  48.16699219 ...,  74.25527954  97.58244324 105.20263672]
 [ 64.93395996  59.72298431  55.17600632 ...,  77.28720856  98.95108032 105.20263672]]
```
cv2.resize
```
[[ 36  30  34 ..., 102  59  43]
 [ 35  28  51 ...,  85  61  26]
 [ 28  39  50 ...,  59  62  52]
 ...,
 [ 75  67  34 ...,  74  98 101]
 [ 67  59  43 ...,  86 102 104]
 [ 66  65  48 ...,  86 103 105]]
```
Here's a gist demonstrating the behavior just mentioned. It includes the complete code for how I process the image.
So my main questions are:
- Why are the outputs of cv2.imread() and tf.image.decode_jpeg() different?
- How are cv2.resize() and tf.image.resize_images() different if they use the same interpolation scheme?
Thanks!
As vijay m correctly points out, changing dct_method to "INTEGER_ACCURATE" will give you the same uint8 image whether you use cv2 or tf. The real problem is the resizing method. I also tried to make TensorFlow use the same interpolation method as cv2's default (bilinear), but the results are still different. This might be because cv2 interpolates on integers while TensorFlow converts to float before interpolating, but that's just a guess. If you plot the per-pixel difference between the TF-resized and cv2-resized images, you get the following picture:
As you can see, the differences look fairly normally distributed (I was also surprised by the sheer amount of pixel difference). Your drop in accuracy could stem from this. In this paper, Goodfellow et al. describe the effect of adversarial examples on classification systems, and the problem here seems to be something similar: if the original weights you are using for your network were trained with an input pipeline that produces the cv2 functions' results, then an image from the TF input pipeline is something like an adversarial example.
(See the image on page 3 of the paper linked above for an example; I can't post more than two links.)
So in the end, I think that if you want to use the original network weights on the same kind of data the networks were trained on, you should stick with a similar input pipeline. If you are instead using the weights to fine-tune the network on your own data, this shouldn't be much of a concern, because you are retraining the classification layer to handle the new input images (from the TF pipeline).
And @Ishant Mrinal: please see the code provided in the gist. It accounts for the difference between BGR (cv2) and RGB (TF) and converts the images to the same color space.