Code Files
The Network
A Q-network returns a q-value, just like a q-function does. It returns the q-value, not the action to take. Another name for a Q-network is DQN (Deep Q-Network), but it is still just a Q-network; "deep" simply means it has multiple layers.
Q-network
A Q-network is a type of q-function, just as a q-table is a type of q-function. The Q-network is not the learning magic of RL by itself; it is just a replacement for the q-table, with much higher storage efficiency: a tiny model that still covers all cases.
The learning process is still explore-and-exploit plus Bellman's q-target formula. A larger q-network is really for storing a larger space of possible inputs, e.g. a 10x10 black-and-white input has 2^100 possible environment states. A larger q-network does not mean better learning or more rewards.
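As a concrete illustration, here is a minimal q-network sketch in Python with tf.keras. The framework, layer sizes, and the choice to feed the state concatenated with a one-hot action are assumptions made for the sketch, not requirements from this text:

```python
# A q-network playing the role of q(state, action) -> q-value.
import numpy as np
import tensorflow as tf

STATE_SIZE = 100   # e.g. a 10x10 black-and-white grid, flattened
N_ACTIONS = 4      # assumed number of actions

q_network = tf.keras.Sequential([
    tf.keras.Input(shape=(STATE_SIZE + N_ACTIONS,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),                         # a single q-value
])
q_network.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")

def q_value(state, action):
    """Query the network where a q-table would be indexed."""
    x = np.concatenate([state, np.eye(N_ACTIONS)[action]])[np.newaxis, :]
    return float(q_network.predict(x, verbose=0)[0, 0])
```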
Q-learning on Q-network
Based on the same q-value update formula as for the q-table. For each update:
• Feed the current state (and action) to the q-network to get the current q-value.
• Train the q-network toward the new q-value, as sketched below.
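A rough sketch of the surrounding explore/exploit loop, reusing `q_value` and `N_ACTIONS` from the snippet above. The epsilon-greedy exploration and the `env` object with `reset()`/`step()` are assumptions, not taken from this text; `update_q_network()` is sketched under "Q-network Update" below:

```python
# One episode: pick actions (explore/exploit), step the environment, update the q-network.
import random

def run_episode(env, epsilon=0.1):
    state, done = env.reset(), False
    while not done:
        if random.random() < epsilon:                 # explore
            action = random.randrange(N_ACTIONS)
        else:                                         # exploit the current q-network
            action = max(range(N_ACTIONS), key=lambda a: q_value(state, a))
        nextState, reward, done = env.step(action)
        update_q_network(state, action, reward, nextState, done)
        state = nextState
```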
Q-network Init
Don't initialize the parameters (weights, biases) to all zeros, as was done for the q-table. All-zero weights make every hidden unit compute, and learn, exactly the same thing; use the framework's standard random initialization instead.
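For example (an assumption about the framework, not something this text prescribes), tf.keras Dense layers already default to a random glorot_uniform weight init:

```python
# Keras Dense defaults to kernel_initializer="glorot_uniform" (random), which is fine.
ok  = tf.keras.layers.Dense(64, activation="relu")
# Forcing all-zero weights makes every unit identical and stalls learning.
bad = tf.keras.layers.Dense(64, activation="relu", kernel_initializer="zeros")
```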
Q-network Update
The update formulas for the q-table are:
Formula 1 (temporal difference):
td = qTarget - qNow = (rewardNow + discount*qNextMax) - qNow
Formula 2 (q-update, the Bellman update):
q += rr * td
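As a quick reference, the same two formulas written as q-table code (a minimal sketch; `q` is assumed to be a 2-D array or dict of per-state action values, and the names mirror the formulas):

```python
# q-table version of the update, for comparison with the q-network version later
def q_table_update(q, state, action, rewardNow, nextState, discount=0.99, rr=0.1):
    qNow = q[state][action]
    qNextMax = max(q[nextState])                      # best q-value in the next state
    td = (rewardNow + discount * qNextMax) - qNow     # formula 1: temporal difference
    q[state][action] = qNow + rr * td                 # formula 2: Bellman update
```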
Intuitively, one may think the q-network replaces the q-table exactly in its role as the q-function, so the input and target q (expected value) for training the q-network would be:
Input (StatePair & Action) → Target (qNow + rr*td, i.e. the Bellman-updated q)
But it is not that way, because the q-network has an optimizer, and the optimizer already knows qNow indirectly (it is the network's own output). Supplying qTarget as the training label is enough, and the optimizer has a learning rate of its own:
qTarget = rewardNow + discount*qNextMax
rr = learningRateOfOptimizer
Applying the Bellman-updated q in place of qTarget would break the q-network, because both of these would then be applied:
• Bellman update (algorithmic optimization)
• Optimizer update (gradient-based)
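A sketch of the q-network update under these rules, reusing `q_network`, `q_value`, and `N_ACTIONS` from the earlier snippets: qTarget is used directly as the training label, with no extra Bellman step. The terminal-state (`done`) handling is a standard detail added here as an assumption, not something stated above.

```python
# Correct q-network update: one fit toward qTarget; the optimizer's gradient
# step plays the role that rr*td played for the q-table.
def update_q_network(state, action, rewardNow, nextState, done, discount=0.99):
    if done:
        qTarget = rewardNow                           # no next state to bootstrap from
    else:
        qNextMax = max(q_value(nextState, a) for a in range(N_ACTIONS))
        qTarget = rewardNow + discount * qNextMax
    x = np.concatenate([state, np.eye(N_ACTIONS)[action]])[np.newaxis, :]
    y = np.array([[qTarget]])
    q_network.fit(x, y, epochs=1, verbose=0)          # do NOT pre-apply q += rr*td here
```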
When to Train
Unlike the q-table, where the q-value is updated after every action, a q-network takes far less RAM, but getting an output from it is slow compared to the constant-time lookup of a q-value in a table, and fitting (training) it is extremely slow compared to setting a q-value in a q-table.
There are 2 options for when to train the q-network:
• Train after an action
• Train after a run through the whole episode
Training after every action, as with a q-table, usually shouldn't be the choice: it makes training very slow, unless there are enough hardware resources. Training after a whole single run through the episode is better and faster, as sketched below.
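A sketch of the second option, buffering the episode's transitions and doing one batched fit at the end (a batched variant of `update_q_network` above, reusing the earlier names; the buffering scheme itself is an assumption):

```python
# Option 2: collect (state, action, reward, nextState, done) during the episode,
# then call fit once on the whole batch instead of once per action.
def run_episode_then_train(env, epsilon=0.1, discount=0.99):
    transitions, state, done = [], env.reset(), False
    while not done:
        if random.random() < epsilon:                 # explore
            action = random.randrange(N_ACTIONS)
        else:                                         # exploit
            action = max(range(N_ACTIONS), key=lambda a: q_value(state, a))
        nextState, reward, done = env.step(action)
        transitions.append((state, action, reward, nextState, done))
        state = nextState

    xs, ys = [], []
    for s, a, r, s2, d in transitions:
        qTarget = r if d else r + discount * max(q_value(s2, a2) for a2 in range(N_ACTIONS))
        xs.append(np.concatenate([s, np.eye(N_ACTIONS)[a]]))
        ys.append([qTarget])
    q_network.fit(np.array(xs), np.array(ys), epochs=1, verbose=0)   # one fit per episode
```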