Bellman Equation

The Bellman equation is the core of reinforcement learning: it is what the model is optimised against. It is as fundamental to reinforcement learning as the weight update formula is to supervised learning.

Bellman Equation

The equation is a chain equation: the value of Snext contains the value of Snextnext, and so on:
V(S) = max over all A of Q(S,A)
Q(S,A) = R(S,A) + D*V(Snext)
where V is the value of a state, S is a state, Q is the q-function, A is an action, R is the reward, and D is the discount factor.
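Because the equation chains into itself, it can be evaluated by literally chaining the calls. Below is a minimal sketch on a tiny hand-made deterministic environment (the MDP dictionary, the state names, and the depth limit are illustrative assumptions, not from the book):
```python
# Chained Bellman evaluation on a tiny, made-up deterministic MDP.
D = 0.9  # discount factor

# state -> {action: (reward, next state)}
MDP = {
    "s0": {"left": (0.0, "s1"), "right": (1.0, "s2")},
    "s1": {"left": (0.0, "s0"), "right": (5.0, "end")},
    "s2": {"left": (2.0, "end"), "right": (0.0, "s0")},
    "end": {},  # terminal state, value 0
}

def V(state, depth=10):
    """V(S) = max over all A of Q(S,A); the chain stops at terminals or depth 0."""
    if depth <= 0 or not MDP[state]:
        return 0.0
    return max(Q(state, action, depth) for action in MDP[state])

def Q(state, action, depth=10):
    """Q(S,A) = R(S,A) + D*V(Snext): Snext's value expands into Snextnext, etc."""
    reward, next_state = MDP[state][action]
    return reward + D * V(next_state, depth - 1)

print(V("s0"))  # value of s0 under the (truncated) Bellman chain
```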

Q-learning

Q-learning is the reinforcement learning process built around the q-function. It can run on a Q-table or on a Q-network (a deep Q-network is the same thing). The process is driven by the optimisation assignment below, which is as central to Q-learning as the weight update assignment is to supervised learning.

Important Notes ⚠️

Q appears in this book in two different notations: with parentheses, Q(...), it is a q-function or q-network call; with square brackets, Q[...], it is a q-table cell. R(S,A) is the reward obtained as the result of taking action A in state S.

The Assignment in Wikipedia

The assignment formula is:
Q[S,A] = (1 - B)*Q[S,A] + B*( Rnow + D*Qmaxnext )
where B is the learning rate, D is the discount, Rnow is the reward of the current action, and Qmaxnext = max over Anext of Q[Snext, Anext].
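As a sketch, the assignment can be written directly as one line of code; the q_table dict keyed by (state, action) tuples and the default parameter values here are illustrative assumptions, not the book's notation:
```python
# One Q-learning step in the Wikipedia (weighted-average) form.
def q_update(q_table, S, A, R_now, S_next, actions, B=0.1, D=0.9):
    Q_max_next = max(q_table[(S_next, A_next)] for A_next in actions)
    q_table[(S, A)] = (1 - B) * q_table[(S, A)] + B * (R_now + D * Q_max_next)
```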

The Assignment as Self-update

Expanding (1 - B)*Q[S,A] and moving the leftover -Q[S,A] inside the parentheses on the right side turns the assignment into a self-update:
Q[S,A] = Q[S,A] + B*( Rnow + D*Qmaxnext - Q[S,A] )
It is easy to remember that this self-update uses += instead of -=: reinforcement learning maximises rewards, while supervised learning minimises loss. B is the learning rate, playing the same role as r in supervised learning.
The part that takes the place of the gradient in supervised learning is the term inside the parentheses:
Rnow + D*Qmaxnext - Q[S,A]

Temporal Difference

The part of the self-update assignment above that plays the role of the supervised learning gradient is called the Temporal Difference.
Concept:
M = [Rnow + D*Qmaxnext] - [Qnow]
Details:
M = ( R(S,A) + D*max over Anext of Q[Snext, Anext] ) - Q[S,A]
where the left part of the minus is the reward of the current action plus the maximum possible future rewards (counted from the current step, not from step 0), and the right part of the minus is the current value of the q-table cell (or of the q-network output).
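A hedged sketch of the temporal difference as a standalone function, using the same illustrative q_table layout as above:
```python
# Temporal difference M for a single (S, A, Rnow, Snext) step.
def temporal_difference(q_table, S, A, R_now, S_next, actions, D=0.9):
    Q_max_next = max(q_table[(S_next, A_next)] for A_next in actions)  # future part
    Q_now = q_table[(S, A)]                                            # current value
    return (R_now + D * Q_max_next) - Q_now
```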

Final Self-update Assignment Formula

As a reminder, the supervised learning weight update, where w is the weight, r is the learning rate, and g is the gradient:
w = w - r*g
The reinforcement learning q-value update, where Q is the q-value, B is the learning rate, and M is the temporal difference:
Q = Q + B*M
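Putting the pieces together, here is a hedged end-to-end sketch of tabular Q-learning on a made-up 5-cell corridor environment; the corridor, the epsilon-greedy exploration, and every constant are illustrative assumptions, not examples from the book:
```python
import random

# Tabular Q-learning on a made-up corridor: states 0..4, reward 1 at state 4.
N_STATES, ACTIONS = 5, ("left", "right")
B, D, EPSILON = 0.1, 0.9, 0.2                 # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}   # the q-table

def step(s, a):
    """Environment: move one cell left or right, reward 1 for reaching the end."""
    s_next = max(s - 1, 0) if a == "left" else min(s + 1, N_STATES - 1)
    return (1.0 if s_next == N_STATES - 1 else 0.0), s_next

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit the q-table, sometimes explore
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        R_now, s_next = step(s, a)
        Q_max_next = max(Q[(s_next, act)] for act in ACTIONS)
        M = (R_now + D * Q_max_next) - Q[(s, a)]   # temporal difference
        Q[(s, a)] = Q[(s, a)] + B * M              # Q = Q + B*M
        s = s_next

print(Q[(0, "right")], Q[(0, "left")])  # "right" should score clearly higher
```
Epsilon-greedy exploration is just one common way to pick actions here; the two update lines stay the same whichever exploration strategy is used.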

Choices for Q-learning

Q-learning can run on a Q-table or on a Q-network (aka Deep Q-network).

Q-table

The Q-table is the classic solution, but it requires a huge amount of computer memory for large problems, which makes it impractical in real use cases.
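To get a feel for the scale, here is a back-of-the-envelope calculation with made-up but plausible numbers (the state and action counts are illustrative assumptions):
```python
# Rough memory estimate for a q-table on a large problem (made-up numbers).
n_states = 10**9      # e.g. a game with a billion distinguishable states
n_actions = 18        # e.g. a controller with 18 possible inputs
bytes_per_cell = 4    # one float32 q-value per (state, action) cell
print(n_states * n_actions * bytes_per_cell / 2**30, "GiB")   # ~67 GiB of RAM
```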

Q-network

A Q-network does not need that much RAM: it learns through combinations of neurons and layers, and the number of combinations those can represent is exponentially large, which makes the Q-network very effective.
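For completeness, a minimal sketch of one Q-network update step, assuming PyTorch is available; the layer sizes, learning rate, and dummy tensors are my own illustrative choices, not the book's:
```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, D = 4, 2, 0.9

q_net = nn.Sequential(              # maps a state vector to one q-value per action
    nn.Linear(STATE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def q_network_update(state, action, reward, next_state):
    """One self-update step: push Q(S,A) toward Rnow + D*Qmaxnext."""
    with torch.no_grad():                     # the target side is not differentiated
        target = reward + D * q_net(next_state).max()
    q_now = q_net(state)[action]
    loss = (q_now - target) ** 2              # squared temporal difference
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# usage with dummy tensors standing in for real observations
q_network_update(torch.randn(STATE_DIM), 1, 1.0, torch.randn(STATE_DIM))
```
Computing the target under no_grad mirrors the self-update assignment: only the current Q(S,A) prediction is pushed toward Rnow + D*Qmaxnext.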

 