
Backpropagation

Backpropagation is different from feedforward: it is a dynamic programming process that drives optimisation, while feedforward only computes each layer from the previous one, with no optimisation involved.

The Sample Network

The same network as before is used to describe this backpropagation process.

Diagram

[Image: diagram of the sample network]
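For the formulas below, the following notation is assumed (only w5 and w6 are named explicitly later in this document; the rest of the numbering is an illustrative assumption): inputs x1, x2 feed hidden neurons N1, N2 with outputs h1, h2; these feed output neurons N3, N4 with outputs u1, u2; each neuron computes a dot-product d plus its bias b and applies the activation f.
d1 = w1*x1 + w2*x2 + b1,  h1 = f(d1)
d2 = w3*x1 + w4*x2 + b2,  h2 = f(d2)
d3 = w5*h1 + w6*h2 + b3,  u1 = f(d3)
d4 = w7*h1 + w8*h2 + b4,  u2 = f(d4)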

Backprop Start

The final loss value is e = fe(u1, u2, y1, y2), where fe is the loss function and te is the derivative of the loss function.
[Image: loss node attached to the output neurons]
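As a concrete example, assuming a squared-error loss (an assumption that matches the u - y deltas used later in this document):
e = fe(u1, u2, y1, y2) = (1/2)*(u1 - y1)^2 + (1/2)*(u2 - y2)^2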

Gradient of Loss Function

The loss gradient function is:
[Image: gradient of the loss function]
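Under the same squared-error assumption, te gives one partial derivative per output:
te1 = de/du1 = u1 - y1
te2 = de/du2 = u2 - y2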

Gradient of Loss Function wrt Each Weight

The activation function of each neuron is f and its derivative is t. When taking the gradient with respect to a weight, variables belonging to other neurons drop out of the calculation; they are unrelated variables or constants whose derivative is zero. The 4 neurons are called N1, N2, N3, N4.
The weights we1 and we2 at the loss node, connected from the output nodes, are always 1.

Gradient for Weight w5

[Image: derivation of the gradient for w5]
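As a sketch of this derivation, assuming w5 carries h1 into N3 (whose pre-activation is d3 = w5*h1 + w6*h2 + b3 and output is u1 = f(d3)) and the squared-error loss above, the chain rule gives:
gw5 = de/dw5 = (u1 - y1) * t(d3) * h1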

Gradient for Weight w6

[Image: derivation of the gradient for w6]
(Fix: this w6 gradient pairs with u1 - y1, not u2 - y2.)
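With the fix applied (w6 also feeds N3, so it pairs with u1 - y1), the same chain rule gives, under the same assumptions:
gw6 = de/dw6 = (u1 - y1) * t(d3) * h2
Only the input factor changes from h1 to h2, because w6 carries h2 into N3.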

Backprop Formula

The backprop algorithm works through a dynamic programming process governed by a dynamic programming formula.
Start from the gradient calculation above and add the weight at the loss node for the connection from N3. Following the auto-differentiation view, every node in the graph should behave the same way and always have weights on its incoming edges. This kind of weight is always the constant 1, because multiplying through by 1 changes nothing.
[Image: gradient formula with the loss-node weight included]
The above formula for the gradient of that weight can be re-written as:
[Image: re-written gradient formula]
After carrying the manual derivations further and comparing the gradient formulas for different weights, the left part of the right-hand side turns out to be the dynamic programming intermediate value v:
[Image: the intermediate value v extracted from the gradient formula]
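A sketch of that rewrite for w5, with the loss-node weight we1 = 1 made explicit (same assumptions as above):
de/dw5 = (u1 - y1) * we1 * t(d3) * h1 = [ (u1 - y1) * we1 * t(d3) ] * h1 = v3 * h1
The bracketed part depends only on N3, not on which incoming weight is being differentiated, so it can be cached once per neuron as v3 and reused.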
The v value at the loss node differs per output neuron. Only the loss node holds multiple v values; every neuron inside the network holds exactly one v value:
[Image: v values at the loss node]
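Concretely, under the squared-error assumption, the loss node holds one v per output neuron while each inner neuron Ni holds a single value vi:
ve1 = u1 - y1 (sent back to N3)
ve2 = u2 - y2 (sent back to N4)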
Further manual derivations for the other weights lead to the dynamic programming formula for backprop below. The feedforward direction runs from left to right; the backpropagation process runs back from right to left.
[Image: dynamic programming formula for backprop]
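In this document's notation, the formula for an inner neuron reads (a sketch consistent with the summary at the end): v = sum(v_next * w) * t(d), where the sum runs over the connections leaving the neuron, v_next is the v already computed for the neuron at the other end, and w is the weight on that connection. For example, assuming w7 is the weight carrying h1 into N4:
v1 = (v3*w5 + v4*w7) * t(d1)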
To get the gradient for a weight, multiply the v of the neuron containing that weight by the input carried on that weight (x or h depending on the layer, all called x in the gw formula below):
[Image: gradient formula for a weight]
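That is, for a weight with input x entering a neuron whose intermediate value is v:
gw = v * x
For example, gw5 = v3 * h1 and gw6 = v3 * h2, matching the manual derivations above.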
The bias always has a constant input of 1, so the gradient for a bias is just:
[Image: gradient formula for a bias]
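That is, with the bias's constant input of 1:
gb = v * 1 = v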

Update Weights and Biases

Use the gradients calculated above to update the weights and biases, where r is the learning rate and g is the corresponding gradient:
w = w - r*g
b = b - r*g
Because the update formula uses a minus sign, the delta (the subtraction between u and y) must be u - y and NOT y - u: the gradient of the loss with respect to the output comes out as u - y, and gradient descent subtracts the gradient, so writing y - u would flip the sign and push the weights in the direction that increases the loss.

Feedforward & Backprop Similarities

Dynamic programming
Feedforward is the preparation part (no optimisation) of the backprop dynamic programming (with optimisation).
Dynamic programming formulas
Very similar: both take a sum of values times weights. The bias plays no role during backprop, as it drops out for being unrelated to the weight whose gradient is being calculated.
Dot-product: d = sum(xw) + b
Feedforward: h = f(d)
Backprop: v = sum(vw) * t(d)
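To tie the two formulas together, here is a minimal runnable sketch in Python of one feedforward pass and one backprop pass on the assumed 2-2-2 network; the sigmoid activation, squared-error loss, and all numeric values are illustrative assumptions, not taken from this document:

import math

# Minimal sketch of feedforward + backprop on the assumed 2-2-2 network
# (x1, x2 -> N1, N2 -> N3, N4 -> loss node). Weight numbering, sigmoid
# activation, squared-error loss and all values are illustrative assumptions.

def f(d):                                   # activation of a neuron
    return 1.0 / (1.0 + math.exp(-d))

def t(d):                                   # derivative of the activation, t = f'
    s = f(d)
    return s * (1.0 - s)

x1, x2 = 0.05, 0.10                         # inputs
y1, y2 = 0.01, 0.99                         # true values
r = 0.5                                     # learning rate

w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30     # layer 1 weights: x -> h
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55     # layer 2 weights: h -> u
b1, b2, b3, b4 = 0.35, 0.35, 0.60, 0.60     # biases of N1..N4

# Feedforward: d = sum(xw) + b, h = f(d)
d1 = w1*x1 + w2*x2 + b1; h1 = f(d1)         # N1
d2 = w3*x1 + w4*x2 + b2; h2 = f(d2)         # N2
d3 = w5*h1 + w6*h2 + b3; u1 = f(d3)         # N3
d4 = w7*h1 + w8*h2 + b4; u2 = f(d4)         # N4
e = 0.5*(u1 - y1)**2 + 0.5*(u2 - y2)**2     # squared-error loss

# Backprop: v = sum(vw) * t(d), computed from right to left.
# The loss-node weights we1, we2 are the constant 1.
v3 = (u1 - y1) * 1 * t(d3)                  # v of N3
v4 = (u2 - y2) * 1 * t(d4)                  # v of N4
v1 = (v3*w5 + v4*w7) * t(d1)                # v of N1 (h1 feeds N3 via w5, N4 via w7)
v2 = (v3*w6 + v4*w8) * t(d2)                # v of N2 (h2 feeds N3 via w6, N4 via w8)

# Gradients: gw = v * x (input of that weight), gb = v
gw5, gw6, gw7, gw8 = v3*h1, v3*h2, v4*h1, v4*h2
gw1, gw2, gw3, gw4 = v1*x1, v1*x2, v2*x1, v2*x2

# Update: w = w - r*g, b = b - r*g
w5, w6 = w5 - r*gw5, w6 - r*gw6
b3 = b3 - r*v3                              # bias gradient is just v3
print(f"loss={e:.6f}  gw5={gw5:.6f}  gw6={gw6:.6f}")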
