Formula
Q(S, A) += Rate x TD

where Q(S, A) is the expected future return of taking action A in state S, Rate is the learning rate, and TD is the temporal difference: TD = R + Gamma x max(Q(S', a)) - Q(S, A), with R the immediate reward, Gamma the discount factor, and S' the next state.
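A minimal sketch of this update in Python, assuming a small tabular problem; the state/action counts and the hyperparameter values below are illustrative, not from the original text:

    import numpy as np

    n_states, n_actions = 16, 4    # assumed sizes for a small grid world
    rate, gamma = 0.1, 0.99        # learning rate and discount factor (assumed values)
    Q = np.zeros((n_states, n_actions))

    def update(s, a, r, s_next):
        # TD = reward + discounted best future value - current estimate
        td = r + gamma * np.max(Q[s_next]) - Q[s, a]
        Q[s, a] += rate * td       # Q += Rate x TD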
Important Things vs Supervised Learning
In supervised learning the core concepts are:
• Delta = (U - Y), the difference between target U and prediction Y
• Loss, calculated from delta, which training drives toward zero

In reinforcement learning the corresponding concepts are:
• Diff, the temporal difference
• Goal = sum(abs(diff)) over an episode, which should reach the maximum sum of rewards of the episode

abs(diff) or diff^2 should not be called "error" or "loss", because it is meant to increase during training, not to be reduced. A small sketch of the contrast follows below.
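In the sketch, all the target, prediction, and diff values are made up purely for illustration:

    import numpy as np

    # Supervised learning: delta and a loss, driven toward zero
    U = np.array([1.0, 0.0, 1.0])        # targets (assumed values)
    Y = np.array([0.8, 0.3, 0.6])        # predictions (assumed values)
    delta = U - Y
    loss = np.mean(delta ** 2)           # reduced by training

    # Reinforcement learning: diff (temporal difference) and an episode goal
    diffs = np.array([0.5, 0.7, 1.1])    # TDs from one episode (assumed values)
    goal = np.sum(np.abs(diffs))         # grows toward the max sum of rewards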
Training

Logging
In reinforcement learning the trainer program should not log the loss of the Q-network, since that number is of no use; it should log the sum of rewards at the end of each episode run. Logging Q(S, A) at square 1 (the start point of the episode) may be what people think of first, but that value is always increasing, so it is of no use either.
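A self-contained sketch of this logging pattern; the toy 1-D chain environment, the epsilon-greedy policy, and the hyperparameter values are all illustrative assumptions:

    import numpy as np

    # Toy 1-D chain, states 0..4; reaching state 4 ends the episode with reward 1
    n_states, n_actions = 5, 2
    rate, gamma, eps = 0.1, 0.99, 0.1
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)

    def step(s, a):
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        done = s_next == n_states - 1
        return s_next, float(done), done

    for episode in range(200):
        s, total_reward, done = 0, 0.0, False
        while not done:
            # epsilon-greedy; act randomly while q-values for s are still equal
            if rng.random() < eps or np.all(Q[s] == Q[s][0]):
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = step(s, a)
            Q[s, a] += rate * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
            total_reward += r
            s = s_next
        # log the sum of rewards at episode end, not a q-network "loss"
        print(f"episode {episode}: sum of rewards = {total_reward}")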
Q-* Related Terms
There are several related terms with the Q- prefix in reinforcement learning, all based on the Bellman equation:
q-learning, q-function, q-table, q-network, q-value.
Definitions:
• Q-learning is the learning method utilising the Bellman equation.
• Q-function is the function used by Q-learning; it gives the q-value.
• Q-table and Q-network are the 2 common kinds of Q-function (see the sketch after this list).
• Q-network (Deep Q-Network) is the practical replacement for the q-table.
• Q-value is the result returned by a q-function, whether q-table or q-network.
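A sketch of the two kinds of Q-function: a plain array stands for the q-table, and a single linear layer stands in for the q-network (a real Deep Q-Network uses a deep neural net; all sizes and weights here are assumptions):

    import numpy as np

    n_states, n_actions = 16, 4             # assumed sizes

    # Q-table: one stored q-value per (state, action) pair
    q_table = np.zeros((n_states, n_actions))
    q_value = q_table[3, 2]                 # q-value by direct lookup

    # Q-network: a function approximator from state to q-values;
    # a single linear layer stands in here for a real deep network
    rng = np.random.default_rng(0)
    W = rng.normal(size=(n_actions, n_states))

    def q_network(state):
        x = np.zeros(n_states)
        x[state] = 1.0                      # one-hot state features
        return W @ x                        # one q-value per action

    q_values = q_network(3)                 # q-values by computation, not lookup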