TensorFlow: Sudden loss of accuracy after freezing a graph?

Question

Is it common to see a drastic loss of accuracy after freezing a graph for serving? While training and evaluating on the flowers dataset with the pre-trained inception-resnet-v2, my accuracy is 98-99%, with a 90+% probability for correct predictions. However, after freezing my graph and running prediction again, my model was not as accurate, and the correct labels were predicted with only 30-40% confidence.

After training the model, I had several items:

checkpoint file
model.ckpt.index file
model.ckpt.meta file
model.ckpt file
graph.pbtxt file

Since I was unable to run the official freeze graph script located in the TensorFlow repository on GitHub (I think because I have a pbtxt file, not a pb file, after my training), I am reusing the code from this tutorial instead.

Here's the code I modified to freeze my graph:

import os, argparse

import tensorflow as tf
from tensorflow.python.framework import graph_util

dir = os.path.dirname(os.path.realpath(__file__))

def freeze_graph(model_folder, input_checkpoint):
    # We retrieve our checkpoint fullpath
    checkpoint = tf.train.get_checkpoint_state(model_folder)
    # input_checkpoint = checkpoint.model_checkpoint_path

    # We specify the full filename of our frozen graph
    absolute_model_folder = "/".join(input_checkpoint.split('/')[:-1])
    output_graph = absolute_model_folder + "/frozen_model.pb"

    # Before exporting our graph, we need to specify our output node
    # This is how TF decides what part of the graph it has to keep and what part it can discard
    # NOTE: this variable is plural, because you can have multiple output nodes
    output_node_names = "InceptionResnetV2/Logits/Predictions"

    # We clear devices to allow TensorFlow to control on which device it will load operations
    clear_devices = True

    # We import the meta graph and retrieve a Saver
    saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

    # We retrieve the protobuf graph definition
    graph = tf.get_default_graph()
    input_graph_def = graph.as_graph_def()

    # We start a session and restore the graph weights
    with tf.Session() as sess:
        saver.restore(sess, input_checkpoint)

        # We use a built-in TF helper to export variables to constants
        output_graph_def = graph_util.convert_variables_to_constants(
            sess, # The session is used to retrieve the weights
            input_graph_def, # The graph_def is used to retrieve the nodes 
            output_node_names.split(",") # The output node names are used to select the useful nodes
        ) 

        # Finally we serialize and dump the output graph to the filesystem
        with tf.gfile.GFile(output_graph, "wb") as f:
            f.write(output_graph_def.SerializeToString())
        print("%d ops in the final graph." % len(output_graph_def.node))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_folder", type=str, help="Model folder to export")
    parser.add_argument("--input_checkpoint", type = str, help = "Input checkpoint name")
    args = parser.parse_args()

    freeze_graph(args.model_folder, args.input_checkpoint)

This is the code I use to run my prediction, where I feed in only a single image, as a user would:

import tensorflow as tf
from scipy.misc import imread, imresize
import numpy as np

img = imread("./dandelion.jpg")
img = imresize(img, (299,299,3))
img = img.astype(np.float32)
img = np.expand_dims(img, 0)

labels_dict = {0:'daisy', 1:'dandelion',2:'roses', 3:'sunflowers', 4:'tulips'}

#Define the filename of the frozen graph
graph_filename = "./frozen_model.pb"

#Create a graph def object to read the graph
with tf.gfile.GFile(graph_filename, "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

#Construct the graph and import the graph from graphdef
with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def)

    #We define the input and output node we will feed in
    input_node = graph.get_tensor_by_name('import/batch:0')
    output_node = graph.get_tensor_by_name('import/InceptionResnetV2/Logits/Predictions:0')

    with tf.Session() as sess:
        predictions = sess.run(output_node, feed_dict = {input_node: img})
        print predictions
        label_predicted = np.argmax(predictions[0])

    print 'Predicted Flower:', labels_dict[label_predicted]
    print 'Prediction probability:', predictions[0][label_predicted]

And the result I got from running my prediction:

2017-04-11 17:38:21.722217: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-04-11 17:38:21.722608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: GeForce GTX 860M
major: 5 minor: 0 memoryClockRate (GHz) 1.0195
pciBusID 0000:01:00.0
Total memory: 3.95GiB
Free memory: 3.42GiB
2017-04-11 17:38:21.722624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 
2017-04-11 17:38:21.722630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y 
2017-04-11 17:38:21.722642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 860M, pci bus id: 0000:01:00.0)
2017-04-11 17:38:22.183204: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-04-11 17:38:22.183232: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 8 visible devices
2017-04-11 17:38:22.184007: I tensorflow/compiler/xla/service/service.cc:183] XLA service 0xb85a1c0 executing computations on platform Host. Devices:
2017-04-11 17:38:22.184022: I tensorflow/compiler/xla/service/service.cc:191]   StreamExecutor device (0): <undefined>, <undefined>
2017-04-11 17:38:22.184140: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-04-11 17:38:22.184149: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 8 visible devices
2017-04-11 17:38:22.184610: I tensorflow/compiler/xla/service/service.cc:183] XLA service 0xb631ee0 executing computations on platform CUDA. Devices:
2017-04-11 17:38:22.184620: I tensorflow/compiler/xla/service/service.cc:191]   StreamExecutor device (0): GeForce GTX 860M, Compute Capability 5.0
[[ 0.1670652   0.46482906  0.12899996  0.12481128  0.11429448]]
Predicted Flower: dandelion
Prediction probability: 0.464829

Potential source of the problem: I first trained my model with TF 0.12, but I believe it is compatible with TF 1.01, the version I am currently using. As a precaution, I upgraded my files to TF 1.01 and retrained the model to get a new set of checkpoint files (with the same accuracy), and then used those checkpoint files for freezing. I compiled TensorFlow from source. Is the problem related to the fact that I am using a pbtxt file instead of a pb file? I have no idea how to get a pb file from training my model.

python deep-learning tensorflow tensorflow-serving

kwotsin 11 Apr '17 at 9:29

2 answers

Wesam na · Answer 1 · 2017-08-30T11:09:59+0000

I believe the problem is not related to freezing the model. Instead, it has to do with how you preprocess your image.

I recommend using the default preprocessing function for Inception-ResNet-v2.

Below is some code that takes an image path (JPG or PNG) and returns the preprocessed image. You can change it to accept a batch of images. This is not professional code and could use some optimization, but it works well.

First load the image:

import tensorflow as tf

def load_img(path_img):
    """
    Load an image to tensorflow
    :param path_img: image path on the disk
    :return: 3D tensorflow image
    """
    filename_queue = tf.train.string_input_producer([path_img])  # list of files to read

    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)

    my_img = tf.image.decode_image(value)  # use png or jpg decoder based on your files.

    init_op = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init_op)

        # Start populating the filename queue.

        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)

        for i in range(1):  # length of your filename list
            image = my_img.eval()  # here is your image Tensor :)

        print(image.shape)
        # Image.fromarray(np.asarray(image)).show()

        coord.request_stop()
        coord.join(threads)

        return image

Then the preprocessing code:

def preprocess(image, height, width,
               central_fraction=0.875, scope=None):
    """Prepare one image for evaluation.

    If height and width are specified it would output an image with that size by
    applying resize_bilinear.

    If central_fraction is specified it would crop the central fraction of the
    input image.

    Args:
      image: 3-D Tensor of image. If dtype is tf.float32 then the range should be
        [0, 1], otherwise it would be converted to tf.float32 assuming that the range
        is [0, MAX], where MAX is largest positive representable number for
        int(8/16/32) data type (see `tf.image.convert_image_dtype` for details)
      height: integer
      width: integer
      central_fraction: Optional Float, fraction of the image to crop.
      scope: Optional scope for name_scope.
    Returns:
      3-D float Tensor of prepared image.
    """

    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    # Crop the central region of the image with an area containing 87.5% of
    # the original image.
    if central_fraction:
        image = tf.image.central_crop(image, central_fraction=central_fraction)

    if height and width:
        # Resize the image to the specified height and width.
        image = tf.expand_dims(image, 0)
        image = tf.image.resize_bilinear(image, [height, width],
                                         align_corners=False)
        image = tf.squeeze(image, [0])
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0)
    return image

Finally, for my case, I had to convert the processed tensor to a numpy array:

image = tf.Session().run(image)

This image can then be fed into the frozen model:

persistent_sess = tf.Session(graph=graph)  # , config=sess_config)

input_node = graph.get_tensor_by_name('prefix/batch:0')
output_node = graph.get_tensor_by_name('prefix/InceptionResnetV2/Logits/Predictions:0')

predictions = persistent_sess.run(output_node, feed_dict={input_node: [image]})
print(predictions)
label_predicted = np.argmax(predictions[0])
print(label_predicted)
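
As a rough illustration of why the input range matters: the scaling steps in the preprocessing function above map pixel values from [0, 255] to [-1, 1], whereas casting a uint8 image with astype(np.float32) leaves it in [0, 255]. The NumPy sketch below mirrors only that scaling, not the central crop or the bilinear resize. The helper name scale_like_inception and the random stand-in image are made up for this example.

```python
import numpy as np

def scale_like_inception(img_uint8):
    """Map uint8 pixels in [0, 255] to float32 values in [-1, 1].

    Hypothetical helper: mirrors only the convert/subtract/multiply steps,
    not the central crop or the bilinear resize.
    """
    img = img_uint8.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
    return (img - 0.5) * 2.0                    # [0, 1]   -> [-1, 1]

# Stand-in for a decoded 299x299 RGB image (random pixels, illustration only)
raw = np.random.randint(0, 256, size=(299, 299, 3), dtype=np.uint8)

unscaled = raw.astype(np.float32)   # what a plain astype() cast produces
scaled = scale_like_inception(raw)  # the range the preprocessing function produces

print(unscaled.min(), unscaled.max())  # stays within [0, 255]
print(scaled.min(), scaled.max())      # stays within [-1, 1]
```

If the network was trained on inputs in [-1, 1], feeding it values in [0, 255] would plausibly explain low-confidence predictions even when the frozen weights are correct.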

yashar · Answer 2 · 2017-11-09T11:18:26+0000

I had a similar problem: accuracy was 1.5% lower when using the frozen model. The problem was in the saver object in the code that was freezing the model. You need to pass the moving average decay as an argument to the saver. I am using the code from the Inception model, and this is how I create the saver in the freeze script:

variable_averages = tf.train.ExponentialMovingAverage(0.9997)
variables_to_restore = variable_averages.variables_to_restore()
saver = tf.train.Saver(variables_to_restore)

It solved the problem for me.
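
For context on why the saver mapping can matter: tf.train.ExponentialMovingAverage keeps a shadow copy of each variable, updated as shadow = decay * shadow + (1 - decay) * variable, and variables_to_restore() makes the saver load those shadow values in place of the raw ones. The toy sketch below uses plain Python with made-up numbers and no TensorFlow. It only shows that the raw and averaged values can differ; it is not the TensorFlow implementation.

```python
def ema_update(shadow, value, decay):
    """One exponential-moving-average step."""
    return decay * shadow + (1.0 - decay) * value

decay = 0.9      # illustrative only; the answer above uses 0.9997
weight = 0.0     # stand-in for a raw trained weight
shadow = weight  # the shadow copy starts at the variable's initial value

for step in range(5):
    weight += 1.0  # pretend training update
    shadow = ema_update(shadow, weight, decay)

print("raw weight:", weight)
print("averaged weight:", shadow)
```

If evaluation used the averaged weights while the freeze script restored the raw ones, the frozen graph would contain different values from the model that was evaluated, which would be consistent with a small accuracy gap like the one described.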
