What is the "gate_gradients" argument of the TensorFlow minimize() function in the Optimizer class?

2 answers


On the same page you linked to, if you scroll down a bit, it says:



gate_gradients argument, which controls the degree of parallelism when applying gradients



GATE_NONE: Take the simple case of a matmul op on two vectors x and y, and let the output be L. The gradient of L with respect to x is y, and the gradient of L with respect to y is xT (x transposed). With GATE_NONE it can happen that the gradient with respect to x is applied to change x before the gradient with respect to y has even been computed. The gradient for y would then be computed from the already-modified x, which is an error. Of course, this will not happen in such a simple case, but you can imagine it happening in more complex/extreme cases.

GATE_OP: For each op, make sure all of its gradients are computed before any of them is used. This prevents race conditions for ops that produce gradients for multiple inputs, where the gradients depend on those inputs. (You can see how this prevents the GATE_NONE problem, albeit at the cost of some parallelism.)
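To make the difference concrete, here is a toy pure-Python sketch (not TensorFlow itself) of the race described above, using the scalar case L = x * y so that dL/dx = y and dL/dy = x. The function names and learning rate are illustrative assumptions, not TensorFlow API:

```python
# Toy illustration of the GATE_NONE race for L = x * y,
# where dL/dx = y and dL/dy = x.

def ungated_step(x, y, lr=0.1):
    # GATE_NONE-style: the gradient w.r.t. x is applied immediately...
    grad_x = y
    x = x - lr * grad_x
    # ...so the gradient w.r.t. y is computed from the MODIFIED x (wrong)
    grad_y = x
    y = y - lr * grad_y
    return x, y

def gated_step(x, y, lr=0.1):
    # GATE_OP-style: both gradients are computed up front,
    # before either variable is touched
    grad_x = y
    grad_y = x
    x = x - lr * grad_x
    y = y - lr * grad_y
    return x, y

# With x=2, y=3: the ungated step updates y using the stale-free but
# already-modified x (1.7), while the gated step uses the original x (2.0).
print(ungated_step(2.0, 3.0))
print(gated_step(2.0, 3.0))
```

In real TensorFlow code you would select the behavior via something like `optimizer.minimize(loss, gate_gradients=tf.train.Optimizer.GATE_OP)` (GATE_OP is the default in the 1.x Optimizer API).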



GATE_GRAPH: Make sure the gradients for all variables are computed before any of them is used. This provides the least parallelism, but it is useful if you want to process all gradients before applying any of them (an example use: clipping the gradients by their global norm before applying them).
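The global-norm clipping mentioned above is exactly why you need every gradient in hand before applying any of them: the scaling factor depends on all of the gradients at once. A minimal pure-Python sketch of the idea (TensorFlow provides this as tf.clip_by_global_norm; the function below is a simplified stand-in for scalar gradients):

```python
import math

# Clip a list of (scalar) gradients so their global L2 norm
# does not exceed clip_norm -- the GATE_GRAPH use case: the
# scale factor needs ALL gradients before ANY is applied.
def clip_by_global_norm(grads, clip_norm):
    global_norm = math.sqrt(sum(g * g for g in grads))
    if global_norm <= clip_norm:
        return grads
    scale = clip_norm / global_norm
    return [g * scale for g in grads]

grads = [3.0, 4.0]                      # global norm = 5.0
print(clip_by_global_norm(grads, 2.5))  # -> [1.5, 2.0]
```

In TensorFlow 1.x you would combine this with `optimizer.compute_gradients(loss, gate_gradients=tf.train.Optimizer.GATE_GRAPH)` followed by `optimizer.apply_gradients(...)` on the clipped values.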







