# Explaining Softmax

The intuition behind Softmax in the context of Deep Learning
One of the first concepts you meet in deep learning is softmax. Softmax is a "soft approximation" of argmax. Argmax tells you which item in your data is the largest, and its one-hot notation produces a vector with a "1" for the largest item and a "0" for every other item. The softmax function maps real numbers into the range (0, 1) and ensures that they add up to 1, which allows a probabilistic interpretation. A lot depends on the scaling factor β you pick: each input Zi is multiplied by β before it is exponentiated.
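Written out as a formula, and going by the e^(β*Zi) and Softmax(β,Zi) columns used in this article's demo tables, softmax with scaling factor β over inputs z_1, ..., z_n can be stated as follows (the subscript notation and the symbol n are introduced here for illustration):

$$
\mathrm{softmax}_\beta(z)_i = \frac{e^{\beta z_i}}{\sum_{j=1}^{n} e^{\beta z_j}}, \qquad i = 1, \dots, n
$$

Every term is positive and the denominator is the sum of all the numerators, so the outputs lie in (0, 1) and add up to 1. As β grows, the output approaches the one-hot argmax vector (provided the largest input is unique); as β approaches 0, every output approaches 1/n.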
For instance, you might select β = 500.

SoftmaxDemo when β = 500

| # | Zi | e^(β*Zi) | Softmax(β,Zi) | argmax(Zi) in one-hot notation |
|---|---|---|---|---|
| 1 | 0.95 | 1.95 × 10^206 | 0.00% | 0 |
| 2 | 0.99 | 9.46 × 10^214 | 99.33% | 1 |
| 3 | 0.98 | 6.37 × 10^212 | 0.67% | 0 |
| | 0.99 (Max) | 9.52 × 10^214 (Sum) | 100.00% (Sum) | |

The sum of the exponentials is 9.52 × 10^214 and the maximum Zi is 0.99.
For instance, you might select β = 1.

SoftmaxDemo when β = 1

| # | Zi | e^(β*Zi) | Softmax(β,Zi) | argmax(Zi) in one-hot notation |
|---|---|---|---|---|
| 1 | 0.95 | 2.59 | 32.56% | 0 |
| 2 | 0.99 | 2.69 | 33.89% | 1 |
| 3 | 0.98 | 2.66 | 33.55% | 0 |
| | 0.99 (Max) | 7.94 (Sum) | 100.00% (Sum) | |

The sum of the exponentials is 7.94 and the maximum Zi is 0.99.
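Go ahead and change the Zi values in either of the tables above and see how the spread changes. As you can see, a larger β accentuates small differences: the inputs 0.99 and 0.98 receive 99.33% and 0.67% when β = 500, but 33.89% and 33.55% when β = 1. A large β therefore makes you overconfident.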
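If you would rather experiment in code, here is a minimal sketch in plain Python that recomputes the softmax column of both tables. The function names `softmax` and `one_hot_argmax` are illustrative, not from any library. One deliberate difference from the tables: the sketch subtracts the largest input before exponentiating, a common stability trick that is an addition here and not part of the article's demo. It leaves the result unchanged because the common factor cancels in the ratio, and it avoids the enormous intermediate values seen in the β = 500 table.

```python
import math

def softmax(z, beta=1.0):
    """Softmax of the values in z, with each input scaled by beta."""
    # Shift by the max so the largest exponent is 0; the shift cancels
    # in the ratio, so the probabilities are unchanged.
    m = max(z)
    exps = [math.exp(beta * (zi - m)) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot_argmax(z):
    """Vector with 1 at the largest entry (first one on ties), 0 elsewhere."""
    best = z.index(max(z))
    return [1 if i == best else 0 for i in range(len(z))]

z = [0.95, 0.99, 0.98]  # the Zi values from the tables
for beta in (1, 500):
    probs = softmax(z, beta)
    print(f"beta={beta}:", [f"{p:.2%}" for p in probs])
print("argmax one-hot:", one_hot_argmax(z))
```

Changing the entries of `z` or the values of `beta` in the loop shows the same effect as editing the tables: the closer β gets to 0, the more even the split, and the larger β gets, the closer the output is to the one-hot vector.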
The softmax output feeds into the loss calculation, which is defined as the negative logarithm of the softmax value assigned to the correct answer. To understand the spread of loss, see the chart below:
[Chart of Loss Table]
As the chart shows, the loss tends to infinity as the softmax value for the correct answer tends to zero. This is basically a big ding for being absolutely sure about the wrong answer. Conversely, if you are absolutely sure about the right answer (a softmax value of 1), the loss is zero. Libraries like PyTorch give you easy access to softmax, but it is just as important to understand the math and the intuition behind this approach.
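To put numbers on that, here is a small sketch of the negative-log loss in plain Python, reusing the Zi values from the demo tables. The function name `nll_loss` and its `target` parameter (the index of the correct answer) are illustrative assumptions, not a library API. The sketch computes the loss through the log-sum-exp form, which equals the negative logarithm of softmax but avoids taking the log of a value that has underflowed to zero.

```python
import math

def nll_loss(z, target, beta=1.0):
    """Negative log of the softmax probability assigned to index `target`."""
    scaled = [beta * zi for zi in z]
    m = max(scaled)
    # log(sum_j exp(scaled_j)), computed stably by factoring out exp(m)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
    # -log(softmax[target]) = log_sum - scaled[target]
    return log_sum - scaled[target]

z = [0.95, 0.99, 0.98]  # the Zi values from the tables

# Confident and right: item 2 (index 1) gets 99.33% when beta = 500.
print(nll_loss(z, target=1, beta=500))

# Confident and wrong: item 1 (index 0) gets almost 0% when beta = 500.
print(nll_loss(z, target=0, beta=500))

# Hedged: with beta = 1 every answer gets roughly a third.
print(nll_loss(z, target=0, beta=1))
```

The first loss is close to zero, the second is about 20, and the third is close to log(3), about 1.1, which is the cost of spreading the probability evenly over three items.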