Bellman Equation

The Bellman equation is the core of reinforcement learning: it is what the model is optimised against. It is as fundamental to reinforcement learning as the weight update formula is to supervised learning.

Bellman Equation

The equation is a chain equation: the value of Snext contains the value of Snextnext, and so on:
V(S) = max over all A of Q(S,A)
Q(S,A) = R(S,A) + D*V(Snext)
where V is the value of a state, S is a state, Q is the q-function, A is an action, R is the reward, and D is the discount factor.
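Because the equation chains into itself, it can be evaluated by literally chaining the calls. Below is a minimal sketch on a tiny hand-made deterministic environment (the MDP dictionary, the state names, and the depth limit are illustrative assumptions, not from the book):
```python
# Chained Bellman evaluation on a tiny, made-up deterministic MDP.
D = 0.9  # discount factor

# state -> {action: (reward, next state)}
MDP = {
    "s0": {"left": (0.0, "s1"), "right": (1.0, "s2")},
    "s1": {"left": (0.0, "s0"), "right": (5.0, "end")},
    "s2": {"left": (2.0, "end"), "right": (0.0, "s0")},
    "end": {},  # terminal state, value 0
}

def V(state, depth=10):
    """V(S) = max over all A of Q(S,A); the chain stops at terminals or depth 0."""
    if depth <= 0 or not MDP[state]:
        return 0.0
    return max(Q(state, action, depth) for action in MDP[state])

def Q(state, action, depth=10):
    """Q(S,A) = R(S,A) + D*V(Snext): Snext's value expands into Snextnext, etc."""
    reward, next_state = MDP[state][action]
    return reward + D * V(next_state, depth - 1)

print(V("s0"))  # value of s0 under the (truncated) Bellman chain
```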

Q-learning

Q-learning is the reinforcement learning process built around the q-function. It can run on a Q-table or on a Q-network (a deep Q-network is the same thing). The process is driven by the optimisation assignment below, which is as central to Q-learning as the weight update assignment is to supervised learning.

Important Notes ⚠️

Q appears in this book in two different notations: with parentheses, Q(...), it is a q-function or q-network call; with square brackets, Q[...], it is a q-table cell. R(S,A) is the reward obtained as the result of taking action A in state S.

The Assignment in Wikipedia

The assignment formula is:
Q[S,A] = (1 - B)*Q[S,A] + B*( Rnow + D*Qmaxnext )
where B is the learning rate, D is the discount, Rnow is the reward of the current action, and Qmaxnext = max over Anext of Q[Snext, Anext].
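As a sketch, the assignment can be written directly as one line of code; the q_table dict keyed by (state, action) tuples and the default parameter values here are illustrative assumptions, not the book's notation:
```python
# One Q-learning step in the Wikipedia (weighted-average) form.
def q_update(q_table, S, A, R_now, S_next, actions, B=0.1, D=0.9):
    Q_max_next = max(q_table[(S_next, A_next)] for A_next in actions)
    q_table[(S, A)] = (1 - B) * q_table[(S, A)] + B * (R_now + D * Q_max_next)
```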

The Assignment as Self-update

Expanding (1 - B)*Q[S,A] and moving the leftover -Q[S,A] inside the parentheses on the right side turns the assignment into a self-update:
Q[S,A] = Q[S,A] + B*( Rnow + D*Qmaxnext - Q[S,A] )
It is easy to remember that this self-update uses += instead of -=: reinforcement learning maximises rewards, while supervised learning minimises loss. B is the learning rate, playing the same role as r in supervised learning.
The part that takes the place of the gradient in supervised learning is the term inside the parentheses:
Rnow + D*Qmaxnext - Q[S,A]

Temporal Difference

The part of the self-update assignment above that plays the role of the supervised learning gradient is called the Temporal Difference.
Concept:
M = [Rnow + D*Qmaxnext] - [Qnow]
Details:
M = ( R(S,A) + D*max over Anext of Q[Snext, Anext] ) - Q[S,A]
where the left part of the minus is the reward of the current action plus the maximum possible future rewards (counted from the current step, not from step 0), and the right part of the minus is the current value of the q-table cell (or of the q-network output).
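A hedged sketch of the temporal difference as a standalone function, using the same illustrative q_table layout as above:
```python
# Temporal difference M for a single (S, A, Rnow, Snext) step.
def temporal_difference(q_table, S, A, R_now, S_next, actions, D=0.9):
    Q_max_next = max(q_table[(S_next, A_next)] for A_next in actions)  # future part
    Q_now = q_table[(S, A)]                                            # current value
    return (R_now + D * Q_max_next) - Q_now
```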

Final Self-update Assignment Formula

As a reminder, the supervised learning weight update, where w is the weight, r is the learning rate, and g is the gradient:
w = w - r*g
The reinforcement learning q-value update, where Q is the q-value, B is the learning rate, and M is the temporal difference:
Q = Q + B*M
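Putting the pieces together, here is a hedged end-to-end sketch of tabular Q-learning on a made-up 5-cell corridor environment; the corridor, the epsilon-greedy exploration, and every constant are illustrative assumptions, not examples from the book:
```python
import random

# Tabular Q-learning on a made-up corridor: states 0..4, reward 1 at state 4.
N_STATES, ACTIONS = 5, ("left", "right")
B, D, EPSILON = 0.1, 0.9, 0.2                 # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}   # the q-table

def step(s, a):
    """Environment: move one cell left or right, reward 1 for reaching the end."""
    s_next = max(s - 1, 0) if a == "left" else min(s + 1, N_STATES - 1)
    return (1.0 if s_next == N_STATES - 1 else 0.0), s_next

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit the q-table, sometimes explore
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        R_now, s_next = step(s, a)
        Q_max_next = max(Q[(s_next, act)] for act in ACTIONS)
        M = (R_now + D * Q_max_next) - Q[(s, a)]   # temporal difference
        Q[(s, a)] = Q[(s, a)] + B * M              # Q = Q + B*M
        s = s_next

print(Q[(0, "right")], Q[(0, "left")])  # "right" should score clearly higher
```
Epsilon-greedy exploration is just one common way to pick actions here; the two update lines stay the same whichever exploration strategy is used.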

Choices for Q-learning

Q-learning can run on a Q-table or on a Q-network (aka Deep Q-network).

Q-table

The Q-table is the classic solution, but it requires a huge amount of computer memory for large problems, which makes it impractical in real use cases.
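To get a feel for the scale, here is a back-of-the-envelope calculation with made-up but plausible numbers (the state and action counts are illustrative assumptions):
```python
# Rough memory estimate for a q-table on a large problem (made-up numbers).
n_states = 10**9      # e.g. a game with a billion distinguishable states
n_actions = 18        # e.g. a controller with 18 possible inputs
bytes_per_cell = 4    # one float32 q-value per (state, action) cell
print(n_states * n_actions * bytes_per_cell / 2**30, "GiB")   # ~67 GiB of RAM
```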

Q-network

A Q-network does not need that much RAM: it learns through combinations of neurons and layers, and the number of combinations those can represent is exponentially large, which makes the Q-network very effective.
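For completeness, a minimal sketch of one Q-network update step, assuming PyTorch is available; the layer sizes, learning rate, and dummy tensors are my own illustrative choices, not the book's:
```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, D = 4, 2, 0.9

q_net = nn.Sequential(              # maps a state vector to one q-value per action
    nn.Linear(STATE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def q_network_update(state, action, reward, next_state):
    """One self-update step: push Q(S,A) toward Rnow + D*Qmaxnext."""
    with torch.no_grad():                     # the target side is not differentiated
        target = reward + D * q_net(next_state).max()
    q_now = q_net(state)[action]
    loss = (q_now - target) ** 2              # squared temporal difference
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# usage with dummy tensors standing in for real observations
q_network_update(torch.randn(STATE_DIM), 1, 1.0, torch.randn(STATE_DIM))
```
Computing the target under no_grad mirrors the self-update assignment: only the current Q(S,A) prediction is pushed toward Rnow + D*Qmaxnext.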

 