[New] Concise and Practical AI/ML

Formula

Q += Rate × T
Where Q is the expected future return (the Q-value), Rate is the learning rate, and T is the temporal difference, i.e. the gap between the new target for Q and its current value.
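
The update above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: it assumes the usual Q-learning form of the temporal difference (reward plus discounted best next Q-value, minus the current Q-value). The discount factor `gamma`, the function name `q_update`, and the default values are assumptions for illustration and do not come from the text.

```python
def q_update(q, reward, next_max_q, rate=0.1, gamma=0.9):
    """Return the updated Q-value after one step: Q += Rate x T.

    gamma (discount factor) is an assumed hyperparameter, not from the text.
    """
    # T: temporal difference = target minus the current estimate
    t = reward + gamma * next_max_q - q
    # Move Q a fraction (the learning rate) of the way towards the target
    return q + rate * t


# Example: current Q is 0, the step gave reward 1, the next state is worth 0
new_q = q_update(0.0, reward=1.0, next_max_q=0.0)
print(new_q)
```

A small rate makes Q move slowly towards the target, which smooths out noisy rewards at the cost of slower learning.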

Training

Logging

In reinforcement learning, the training program should not log the loss of the Q-network, as it is of no use; it should log the sum of rewards at the end of each episode instead. Logging Q(S,A) at square 1 (the start point of the episode) may be what people think of first, but it just keeps increasing during training, so it is of no use either.
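
As a rough illustration of logging the sum of rewards per episode, here is a minimal sketch. The names `run_episode` and `toy_step` are hypothetical, and `toy_step` is only a stand-in for a real environment step: it returns a fixed reward and ends the episode at random.

```python
import random


def run_episode(step_fn, max_steps=100):
    """Run one episode and return the sum of its rewards."""
    total = 0.0
    for _ in range(max_steps):
        reward, done = step_fn()
        total += reward
        if done:
            break
    return total


def toy_step():
    # Hypothetical stand-in for a real environment step:
    # reward 1.0 per step, episode ends with probability 0.1.
    return 1.0, random.random() < 0.1


episode_returns = []
for episode in range(5):
    episode_return = run_episode(toy_step)
    episode_returns.append(episode_return)
    # Log the sum of rewards at episode end, not the network loss
    print(f"episode={episode} return={episode_return:.1f}")
```

In a real trainer the same number would typically be averaged over a window of recent episodes, since a single episode return can be noisy.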

Q-* Related Terms

Several terms with the q prefix appear in Reinforcement Learning based on the Bellman equation:
q-learning, q-function, q-table, q-network, q-value.
Definitions:
• Q-learning is the learning method that uses the Bellman equation.
• Q-function is the function used by Q-learning; it gives the q-value for a state and an action.
• Q-table and Q-network are the 2 common kinds of Q-function.
• Q-network (Deep Q-Network) is the practical replacement for the q-table.
• Q-value is the result returned by a q-function, whether it is a q-table or a q-network.
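
To make the relationship between these terms concrete, here is a minimal sketch of a Q-table acting as a Q-function. The class `QTable` and its method names are hypothetical, chosen only for illustration; a Q-network would expose the same kind of lookup but compute the q-values with a neural network instead of storing them.

```python
from collections import defaultdict


class QTable:
    """A Q-function backed by a table: (state, action) -> q-value."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        # Unseen states start with a q-value of 0.0 for every action
        self.table = defaultdict(lambda: [0.0] * n_actions)

    def q_value(self, state, action):
        # The q-value is simply what the Q-function returns
        return self.table[state][action]

    def best_action(self, state):
        # Greedy choice: the action with the highest q-value
        row = self.table[state]
        return max(range(self.n_actions), key=lambda a: row[a])


q = QTable(n_actions=2)
q.table["start"][1] = 0.5
print(q.q_value("start", 1), q.best_action("start"))
```

A table like this only works when states can be enumerated, which is why a Q-network is the practical choice for large or continuous state spaces.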
