One of the concepts you run into early in deep learning is softmax. Softmax is a "soft approximation" of argmax. Argmax tells you which item in your data is the largest, and its one-hot notation spits out a vector with a "1" for the largest item and "0" everywhere else. The softmax function maps a vector of real numbers onto the range (0,1) and ensures that the outputs add up to 1, thus allowing for probabilistic interpretations. With a scaling factor β, softmax(Zi) = exp(β·Zi) / Σj exp(β·Zj), and a lot depends on the β you pick.
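Here is a minimal NumPy sketch of that idea; the function name and the sample scores are just for illustration:

```python
import numpy as np

def softmax(z, beta=1.0):
    """Scaled softmax: exp(beta*z_i) / sum_j exp(beta*z_j)."""
    scaled = beta * np.asarray(z, dtype=float)
    exps = np.exp(scaled - scaled.max())   # subtract the max for numerical stability
    return exps / exps.sum()

z = [2.0, 1.0, 0.5]
print(np.argmax(z))   # 0 -- the hard answer: the index of the largest score
print(softmax(z))     # ~[0.63, 0.23, 0.14] -- the soft answer, summing to 1
```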
[Interactive tables: pick a β and a set of Zi values; each table shows argmax(Zi) in one-hot notation, the exponents exp(β·Zi), their sum and max, and the resulting softmax.]

Go ahead and change the Zi values in either of the tables above and see how the spread changes. As you can see, a larger β accentuates small differences and makes you overconfident.
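A quick offline version of the same experiment shows what that overconfidence looks like (the Zi values and β choices here are purely illustrative):

```python
import numpy as np

def softmax(z, beta=1.0):
    # Same scaled softmax as above: exp(beta*z_i) / sum_j exp(beta*z_j).
    scaled = beta * np.asarray(z, dtype=float)
    exps = np.exp(scaled - scaled.max())   # subtract the max for numerical stability
    return exps / exps.sum()

z = [2.0, 1.9, 0.1]          # the top two scores differ by only 0.1
print(softmax(z, beta=1))    # ~[0.49, 0.44, 0.07]: a close call, modest confidence
print(softmax(z, beta=100))  # ~[1.00, 0.00, 0.00]: the tiny gap becomes near-certainty
```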
The softmax output then feeds into the loss calculation, which is defined as the negative logarithm of the softmax probability assigned to the correct class. To understand how this loss behaves, see the chart below:
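A few sample values make the shape concrete (the probabilities here are purely illustrative):

```python
import math

for p in [0.99, 0.5, 0.1, 0.01, 0.0001]:
    # loss = -log(p), where p is the softmax probability of the correct class
    print(f"p = {p:<6}  loss = {-math.log(p):.2f}")
```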
As you can see above, the loss tends to infinity as the softmax probability of the correct class tends to zero. This is basically a big ding for being absolutely sure about the wrong answer. Conversely, if you are absolutely sure about the right answer, the loss is zero. Libraries like PyTorch give you easy access to softmax and this loss, but it is just as important to understand the math and the intuition behind them.
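For completeness, here is a minimal PyTorch sketch tying the pieces together; the logits and target are made up, and F.cross_entropy fuses the softmax and the negative log into a single call:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.5]])   # raw scores for one example, three classes
target = torch.tensor([0])                 # the correct class is index 0

probs = F.softmax(logits, dim=1)                 # softmax over the class dimension
manual_loss = -torch.log(probs[0, target])       # negative log of the correct class's probability
library_loss = F.cross_entropy(logits, target)   # softmax + negative log in one call

print(probs)                                     # ~[[0.63, 0.23, 0.14]]
print(manual_loss.item(), library_loss.item())   # both ~0.46
```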