Q-network

Code Files

The Network

A Q-network returns a q-value, just like a q-function does. It returns the q-value, not the action to take. Another name for a Q-network is DQN (Deep Q-Network), but it is still just a Q-network; "deep" simply means it has multiple layers.

Q-network

A Q-network is a type of q-function, just as a q-table is a type of q-function. The Q-network is not the learning magic of RL itself; it is just a replacement for the q-table with much higher storage efficiency: a tiny size that still covers all cases.
The learning process is still explore and exploit, plus Bellman's q-target formula. A larger Q-network is for handling a larger space of possible inputs, e.g. a 10x10 black-and-white input has 2^100 possible environment states. A larger Q-network does not mean better learning or more rewards.
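A rough back-of-envelope comparison of that storage argument (Python sketch; the 10x10 input is the example above, the 100 → 64 → 64 → 4 network size is just an assumed small architecture):

# Rough storage comparison for the 10x10 black-and-white example above.
n_states = 2 ** 100                    # every possible 10x10 binary image, about 1.27e30 states
n_actions = 4                          # assumed action count, e.g. up/down/left/right
q_table_cells = n_states * n_actions   # one q-value per (state, action) pair
print(f"q-table would need ~{q_table_cells:.2e} cells")        # ~5.07e30, impossible to store

# An assumed small Q-network (100 inputs -> 64 -> 64 -> 4 outputs) maps any of
# those states to 4 q-values using only:
q_network_params = (100 * 64 + 64) + (64 * 64 + 64) + (64 * 4 + 4)
print(f"q-network needs only {q_network_params} parameters")   # 10884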

Q-learning on Q-network

Based on the same q-value update formula as for the q-table:
q += rr * td, where td = (rewardNow + discount*qNextMax) - qNow
For each update:
• Feed the current state to the q-network to get the current q-value.
• Train the q-network toward the new q-value.

Q-network Init

Don't initialize the parameters (weights, biases) to all zeros as you would a q-table: with all-zero weights every neuron gets the same gradient, so the network can't learn. Keep the usual random initialization.
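A minimal sketch of such a network in Keras (assumed architecture: a flattened 10x10 binary state in, one q-value per action out; Keras' default random initialization is kept instead of zeros):

import tensorflow as tf

def build_q_network(state_size=100, n_actions=4):
    # Default (random) weight initialization is kept on purpose: all-zero
    # weights would give every neuron the same gradient and block learning.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(state_size,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_actions, activation="linear"),  # one q-value per action
    ])
    # MSE loss against the Bellman q-target; the optimizer brings its own learning rate (rr).
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
    return model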

Q-network Update

The update formulas for the q-table are:
Formula 1 (temporal diff): td = qTarget - qNow = (rewardNow + discount*qNextMax) - qNow
Formula 2 (q-update): q += rr * td (Bellman update)
Intuitively, one may think the q-network replaces the q-table exactly in its role as the q-function, so the input and training target for the q-network would be:
Input (state & action) → Target (qNow + rr*td, the Bellman-updated q)
But it is not done that way, because the q-network has an optimizer: the optimizer already (indirectly) knows qNow (the network's output), so supplying qTarget is enough, and the optimizer has a learning rate of its own:
qTarget = rewardNow + discount*qNextMax
rr = the optimizer's learning rate
Training toward the Bellman-updated q instead of qTarget would break the q-network, because the step size would be applied twice:
• Bellman update (algorithmic)
• Optimizer update (gradient-based)
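A minimal sketch of one such update (hypothetical helper; assumes the build_q_network model above, NumPy arrays for states, and the Adam learning rate playing the role of rr):

import numpy as np

def q_update(model, state, action, reward, next_state, done, discount=0.9):
    # Current q-values for all actions of this state (shape: [1, n_actions]).
    q_now = model.predict(state[np.newaxis], verbose=0)
    # Bellman q-target for the taken action only; other actions keep their current q-values.
    q_next_max = 0.0 if done else float(np.max(model.predict(next_state[np.newaxis], verbose=0)))
    target = q_now.copy()
    target[0, action] = reward + discount * q_next_max   # qTarget, NOT qNow + rr*td
    # One gradient step toward qTarget; the optimizer's learning rate is the only step size applied.
    model.fit(state[np.newaxis], target, epochs=1, verbose=0)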

When to Train

Unlike the q-table, where the q-value is updated after every action, the Q-network takes far less RAM, but one forward call to get its output is slow compared to the constant-time lookup of a q-value in a table, and a fit (training) call is extremely slow compared to simply setting a q-value in a q-table.
There are two options for when to train the q-network:
• Train after an action
• Train after a run through the whole episode
Training after every action, as with a q-table, usually shouldn't be the choice: it makes training very slow unless there are plenty of hardware resources. Training once after a whole episode run is better and faster, as in the sketch below.
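A minimal sketch of the train-after-a-whole-episode option (hypothetical env with reset()/step() and an epsilon-greedy choose_action; states are assumed to be NumPy arrays and the model is the Keras network above):

import numpy as np

def run_episode(model, env, epsilon=0.1, discount=0.9):
    transitions = []                      # collect (state, action, reward, next_state, done)
    state = env.reset()
    done = False
    while not done:
        action = choose_action(model, state, epsilon)        # explore/exploit step
        next_state, reward, done = env.step(action)
        transitions.append((state, action, reward, next_state, done))
        state = next_state

    # Build all Bellman q-targets at once, then train a single time on the whole episode.
    states = np.array([t[0] for t in transitions])
    next_states = np.array([t[3] for t in transitions])
    targets = model.predict(states, verbose=0)
    next_q_max = model.predict(next_states, verbose=0).max(axis=1)
    for i, (_, action, reward, _, d) in enumerate(transitions):
        targets[i, action] = reward + (0.0 if d else discount * next_q_max[i])
    model.fit(states, targets, epochs=1, verbose=0)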
 