How to get both the weights and the biases of the neurons in the optimizer?

In a TensorFlow optimizer (Python), the method _apply_dense is called once for the neuron weights (the layer connections) and once for the bias weights, but I would like to have access to both within the same call.

def _apply_dense(self, grad, weight):
    ...


For example, take a fully connected neural network with two hidden layers, each with two neurons and a bias.

[Image: example neural network with two hidden layers]

If we look at layer 2, then _apply_dense gets one call for the weights of the neurons:

[Image: weight matrix of the neuron connections]

and a separate call for the bias weights:

[Image: weight vector of the biases]

But what I would need is either a single _apply_dense call for both matrices of one layer, or one combined weight matrix:

[Image: all weights of one layer in a single matrix]

X_2X_4, B_1X_4, ... is just notation for the weight of the connection between two neurons; B_1X_4, for example, is a placeholder for the weight between B_1 and X_4.
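To make this concrete, here is a small sketch of what the optimizer actually sees (the variable names are only illustrative): TensorFlow stores the weight matrix and the bias vector of a layer as two separate variables, and _apply_dense is invoked once per variable, so the two never arrive together.

import tensorflow as tf

# Layer 2 of the example network: 2 inputs -> 2 neurons, plus a bias
W2 = tf.Variable(tf.random_normal([2, 2]), name="layer2_weights")
b2 = tf.Variable(tf.zeros([2]), name="layer2_biases")

# During minimize(), the optimizer calls _apply_dense separately for each:
#   _apply_dense(grad_of_W2, W2)  # ndims == 2: the neuron weights
#   _apply_dense(grad_of_b2, b2)  # ndims == 1: the bias weights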

How can I do this?

MWE

As a minimal working example, here is an implementation of a stochastic gradient descent optimizer with momentum. For each neuron, the momentum values of all incoming connections from other neurons are reduced to their mean (see ndims == 2). What I need is the mean not only over the momentum values of the incoming neuron connections, but also over those of the incoming bias connections (as described above).

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
from tensorflow.python.training import optimizer


class SGDmomentum(optimizer.Optimizer):
    def __init__(self, learning_rate=0.001, mu=0.9, use_locking=False, name="SGDmomentum"):
        super(SGDmomentum, self).__init__(use_locking, name)
        self._lr = learning_rate
        self._mu = mu

        self._lr_t = None
        self._mu_t = None

    def _create_slots(self, var_list):
        for v in var_list:
            self._zeros_slot(v, "a", self._name)

    def _apply_dense(self, grad, weight):
        learning_rate_t = tf.cast(self._lr_t, weight.dtype.base_dtype)
        mu_t = tf.cast(self._mu_t, weight.dtype.base_dtype)
        momentum = self.get_slot(weight, "a")

        if momentum.get_shape().ndims == 2:  # neuron weights: average over incoming connections
            momentum_mean = tf.reduce_mean(momentum, axis=1, keep_dims=True)
        elif momentum.get_shape().ndims == 1:  # bias weights: nothing to average over
            momentum_mean = momentum
        else:
            momentum_mean = momentum

        momentum_update = grad + (mu_t * momentum_mean)
        momentum_t = tf.assign(momentum, momentum_update, use_locking=self._use_locking)

        weight_update = learning_rate_t * momentum_t
        weight_t = tf.assign_sub(weight, weight_update, use_locking=self._use_locking)

        return tf.group(weight_t, momentum_t)

    def _prepare(self):
        self._lr_t = tf.convert_to_tensor(self._lr, name="learning_rate")
        self._mu_t = tf.convert_to_tensor(self._mu, name="momentum_term")


To test it with a simple neural network: https://raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/examples/3_NeuralNetworks/multilayer_perceptron.py (the only change needed is swapping the optimizer for the custom SGDmomentum).
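As a hedged sketch, that swap would look something like this (cost is the loss tensor defined in the linked script; the hyperparameter values are only examples):

# In multilayer_perceptron.py, replace the existing optimizer line with:
optimizer = SGDmomentum(learning_rate=0.001, mu=0.9).minimize(cost)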



1 answer


Update: Now that I have a better understanding of your goal, I will try to give a better answer (or at least some ideas), although, as you suggest in the comments, there is probably no foolproof way to do this in TensorFlow.

Since TF is a framework for general computation, there is no good way to determine which pairs of weights and biases belong together in a model (or in a neural network in general). Here are some possible approaches to the problem that I can think of:

  • Annotating the tensors. This is probably impractical, since you already said that you have no control over the model, but the simplest option is to add extra attributes to the tensors to mark the weight/bias relationship. For example, you could do something like W.bias = B and B.weight = W, and then in _apply_dense check for hasattr(weight, "bias") and hasattr(weight, "weight") (there may be better designs along these lines); see the first sketch after this list.
  • You could look at some of the frameworks built on top of TensorFlow that give you more information about the structure of the model. For example, Keras is a layer-based framework that implements its own optimizer classes (based on TensorFlow or Theano). I am not very familiar with the code or its extensibility, but you would probably have more tools to work with there.
  • Detecting the structure of the network yourself from within the optimizer. This is quite complicated, but theoretically possible: starting from the loss tensor passed to the optimizer, it should be possible to "climb up" the model graph and reach all of its nodes (by taking the .op of tensors and the .inputs of ops). You could detect tensor multiplications and additions with variables and skip everything else (activations, loss computation, etc.) to determine the structure of the network; if the model does not match your expectations (e.g. there are no multiplications, or there is a multiplication without a subsequent addition), you could raise an exception indicating that your optimizer cannot be used with that model. A rough sketch of this graph walk is given after this list.
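A minimal sketch of the annotation idea, assuming you can touch the code that creates the variables (the attribute names bias and weight are just a convention invented here):

import tensorflow as tf

W = tf.Variable(tf.random_normal([2, 2]), name="W")
B = tf.Variable(tf.zeros([2]), name="B")
# Python objects accept arbitrary attributes, so the pair can be cross-linked
W.bias = B
B.weight = W

# Inside _apply_dense you could then branch on the annotation, e.g.:
#   if hasattr(weight, "bias"):
#       bias_momentum = self.get_slot(weight.bias, "a")

And a rough sketch of the graph walk for the last idea, using only the .op and .inputs attributes of the graph (the filtering by op.type is an assumption about how the model was built):

def walk_up(tensor, visited=None):
    # Collect every op that (transitively) produced the given tensor
    if visited is None:
        visited = set()
    op = tensor.op
    if op not in visited:
        visited.add(op)
        for inp in op.inputs:
            walk_up(inp, visited)
    return visited

# ops = walk_up(loss)
# matmuls = [op for op in ops if op.type == "MatMul"]        # candidate W * X nodes
# adds = [op for op in ops if op.type in ("Add", "BiasAdd")] # candidate + B nodes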

Old answer, kept for reference.

I'm not 100% clear on what you are trying to do, so I'm not sure whether this really answers your question.



Let's say you have a dense layer transforming an input of size M into an output of size N. Following the convention you show, you would have an N × M weight matrix W and a bias vector B of size N. An input vector X of size M (or a batch of inputs of size M × K) would then be processed by the layer as W · X + B, after which the activation function is applied (in the batch case, the addition would be a "broadcast" operation). In TensorFlow:

X = ...  # Input batch of size M x K
W = ...  # Weights of size N x M
B = ...  # Biases of size N

Y = tf.matmul(W, X) + B[:, tf.newaxis]  # Output of size N x K
# Activation...


If you want, you could always combine W and B into a single extended weight matrix W*, essentially appending B to W as an extra column, so that W* is of size N × (M + 1). Then you only need to append an extra element holding the constant 1 to the input vector X (an extra row of ones in the batch case), so you get X* of size M + 1 (or (M + 1) × K for a batch). The product W* · X* then gives you the same result as before. In TensorFlow:

X = ...  # Input batch of size M x K
W_star = ...  # Extended weights of size N x (M + 1)
# You can still have a "view" of the original W and B if you need it
W = W_star[:, :-1]  # The original N x M weight matrix
B = W_star[:, -1]   # The original biases of size N

# Append a row of ones to the input so the last column of W_star acts as the bias
X_star = tf.concat([X, tf.ones_like(X[:1])], axis=0)
Y = tf.matmul(W_star, X_star)  # Output of size N x K
# Activation...


You could then compute gradients and updates for the weights and the biases in one go. The downside of this approach is that if you want to apply regularization, you have to be careful to apply it only to the weight part of the matrix and not to the biases.
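For example, an L2 penalty restricted to the weight part might look like this (a sketch under the shapes assumed above; data_loss and the 0.01 factor are placeholders):

# Regularize only the first M columns (the weights), not the bias column
l2_penalty = tf.reduce_sum(tf.square(W_star[:, :-1]))
loss = data_loss + 0.01 * l2_penalty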
