Debugging a shape error with sampled softmax loss
In a classification problem with many classes, the TensorFlow docs suggest using sampled_softmax_loss instead of a plain softmax to reduce training time.
As per the docs and source (line 1180), the calling pattern for sampled_softmax_loss is:
tf.nn.sampled_softmax_loss(weights,                       # Shape (num_classes, dim)     - floatXX
                           biases,                        # Shape (num_classes)          - floatXX
                           labels,                        # Shape (batch_size, num_true) - int64
                           inputs,                        # Shape (batch_size, dim)      - floatXX
                           num_sampled,                   # - int
                           num_classes,                   # - int
                           num_true=1,
                           sampled_values=None,
                           remove_accidental_hits=True,
                           partition_strategy="mod",
                           name="sampled_softmax_loss")
It is not clear (at least to me) how to map a real-world problem onto the shapes this loss function requires. I think the "inputs" argument is the problem.
Below is a minimal copy-paste example that raises a matrix-multiplication shape error when calling the loss function.
import tensorflow as tf

# Network Parameters
n_hidden_1 = 256  # 1st layer number of features
n_input = 784     # MNIST data input (img shape: 28*28)
n_classes = 10    # MNIST total classes (0-9 digits)

# Dependent & Independent Variable Placeholders
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

# Weights and Biases
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
# Super simple model builder
def tiny_perceptron(x, weights, biases):
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    out_layer = tf.matmul(layer_1, weights['out']) + biases['out']
    return out_layer

# Create the model
pred = tiny_perceptron(x, weights, biases)

# Set up loss function inputs and inspect their shapes
w = tf.transpose(weights['out'])
b = biases['out']
labels = tf.reshape(tf.argmax(y, 1), [-1, 1])
inputs = pred
num_sampled = 3
num_true = 1
num_classes = n_classes

print('Shapes\n------\nw:\t%s\nb:\t%s\nlabels:\t%s\ninputs:\t%s'
      % (w.shape, b.shape, labels.shape, inputs.shape))
# Shapes
# ------
# w: (10, 256) # Requires (num_classes, dim) - CORRECT
# b: (10,) # Requires (num_classes) - CORRECT
# labels: (?, 1) # Requires (batch_size, num_true) - CORRECT
# inputs: (?, 10) # Requires (batch_size, dim) - Not sure
loss_function = tf.reduce_mean(tf.nn.sampled_softmax_loss(
    weights=w,
    biases=b,
    labels=labels,
    inputs=inputs,
    num_sampled=num_sampled,
    num_true=num_true,
    num_classes=num_classes))
The last line triggers a ValueError stating that you cannot multiply tensors with shapes (?, 10) and (?, 256). Generally speaking, I agree with that statement. The complete error is shown below:
ValueError: Dimensions must be equal, but are 10 and 256 for 'sampled_softmax_loss_2/MatMul_1' (op: 'MatMul') with input shapes: [?,10], [?,256].
If the "dim" value in the TensorFlow docs is meant to be the same across arguments, then either the "weights" or the "inputs" I am passing to the loss function must be wrong.
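For intuition, here is a rough sketch (my own simplification, not the actual TensorFlow source) of the multiplication sampled_softmax_loss performs internally, which is where the mismatch surfaces; sampled_ids below is a hypothetical stand-in for the class indices the op samples internally:

# Simplified sketch of the internal computation (illustrative only):
# the op gathers the weight rows for the true + sampled classes and
# multiplies them against `inputs`, so inputs.shape[1] must equal
# weights.shape[1] (the shared "dim").
sampled_w = tf.gather(w, sampled_ids)  # (num_sampled, dim) = (3, 256)
sampled_logits = tf.matmul(inputs, sampled_w, transpose_b=True)
# Here inputs is (?, 10) while the gathered weights have dim=256,
# so the MatMul fails: 10 != 256.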
Any thoughts would be great; I am completely stuck on how to use this loss function correctly, and it would have a huge impact on training time for the model we are working on (500k classes). Thanks!
--- EDIT ---
You can get the example shown above to run without error by playing around with the parameters and ignoring the shapes sampled_softmax_loss expects. If you do that, the result is a model whose training has zero impact on prediction accuracy (as you would expect).
--- ANSWER 1 ---
In your softmax layer, you are multiplying your network predictions, which have dimension (num_classes,), by your matrix w, which has dimension (num_classes, num_hidden_1). So you end up trying to compare your target labels of size (num_classes,) with something that is now of size (num_hidden_1,). Change your tiny perceptron to output layer_1 instead, then change the definition of your loss. The code below should do the trick.
def tiny_perceptron(x, weights, biases):
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    return layer_1

layer_1 = tiny_perceptron(x, weights, biases)

loss_function = tf.reduce_mean(tf.nn.sampled_softmax_loss(
    weights=tf.transpose(weights['out']),  # (num_classes, dim) = (10, 256)
    biases=biases['out'],                  # (num_classes,)     = (10,)
    labels=labels,
    inputs=layer_1,                        # (batch_size, dim)  = (?, 256)
    num_sampled=num_sampled,
    num_true=num_true,
    num_classes=num_classes))
When you train your network with some optimizer, you tell it to minimize loss_function, which means it will update both sets of weights and biases.
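For completeness, here is a minimal training-and-evaluation sketch under that fix; the optimizer choice and learning rate are illustrative assumptions, and the evaluation path follows the usual guidance that sampled softmax is a training-time approximation, so inference should use the full softmax:

# Training: minimize the sampled softmax loss
# (optimizer and learning rate are assumptions, not from the original post).
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(loss_function)  # updates h1/b1 and out weights/biases

# Evaluation/inference: compute the full softmax over all classes.
logits = tf.matmul(layer_1, weights['out']) + biases['out']
pred = tf.nn.softmax(logits)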
--- ANSWER 2 ---
The key is passing the correct shapes for the weights, biases, inputs, and labels. The weights passed to sampled_softmax_loss are not shaped the way they are in the usual case. For example, if your output layer is logits = xw + b, call sampled_softmax_loss like this:
sampled_softmax_loss(weights=tf.transpose(w), biases=b, inputs=x, ...)
NOT sampled_softmax_loss(weights=w, biases=b, inputs=logits, ...)
!! Also, the labels are not one-hot vectors. If your labels are one-hot encoded, use labels=tf.reshape(tf.argmax(labels_one_hot, 1), [-1,1])
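To make this concrete, here is a minimal self-contained sketch of the correct call for a plain logits = xw + b output layer (the sizes and variable names are illustrative assumptions):

import tensorflow as tf

dim, num_classes, num_sampled = 128, 10000, 64

x = tf.placeholder(tf.float32, [None, dim])              # activations BEFORE the output layer
labels_one_hot = tf.placeholder(tf.float32, [None, num_classes])

w = tf.Variable(tf.random_normal([dim, num_classes]))    # used elsewhere as logits = x @ w + b
b = tf.Variable(tf.zeros([num_classes]))

labels = tf.reshape(tf.argmax(labels_one_hot, 1), [-1, 1])   # class indices, NOT one-hot

loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
    weights=tf.transpose(w),   # (num_classes, dim) - transposed relative to the logits layer
    biases=b,                  # (num_classes,)
    labels=labels,             # (batch_size, 1)
    inputs=x,                  # (batch_size, dim) - pre-logit activations, NOT the logits
    num_sampled=num_sampled,
    num_classes=num_classes))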