Learning Tactics

Monte Carlo Search

Randomise actions, collect sample runs, and keep the good ones. Unlike classic Monte Carlo estimation, which computes a ratio (the number of sample points falling inside a region over all sampled points), Monte Carlo search in RL uses the random samples to find the best value.
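A minimal Python sketch of this idea, assuming a hypothetical environment interface env with reset() and step(action) returning (state, reward, done); none of these names come from the original text.

import random

def monte_carlo_search(env, actions, n_rollouts=100, horizon=50):
    # Run many random rollouts and keep the action sequence with the best total reward.
    best_return, best_plan = float("-inf"), None
    for _ in range(n_rollouts):
        env.reset()
        plan, total = [], 0.0
        for _ in range(horizon):
            action = random.choice(actions)      # pure random sampling
            state, reward, done = env.step(action)
            plan.append(action)
            total += reward
            if done:
                break
        if total > best_return:                  # keep the best value, not a ratio
            best_return, best_plan = total, plan
    return best_plan, best_return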

Bellman Update

Formula

// rw1: reward received, dc: discount factor, rr: learning rate
// q1 = Q(s1,a1), q2max = max over actions a of Q(s2,a)
td = (rw1 + dc*q2max) - q1
q1 += rr * td
Target q for the q-network: q_target = q1 + rr * td
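As a concrete sketch, the same update applied to a q-table; the names alpha, gamma and the dict-of-dicts table layout are illustrative assumptions, not from the original.

def bellman_update(Q, s1, a1, rw1, s2, alpha=0.1, gamma=0.9):
    # Q is a dict of dicts: Q[state][action] -> value
    q1 = Q[s1][a1]
    q2max = max(Q[s2].values())          # best value reachable from the next state
    td = (rw1 + gamma * q2max) - q1      # temporal-difference error
    Q[s1][a1] = q1 + alpha * td          # move q1 toward the Bellman target
    return Q[s1][a1]                     # can also serve as the q-network target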

Variations

SARSA: on-policy; use the action the existing policy actually takes next, so the target uses just q2 = Q(s2,a2), no max (see the sketch after this list).
Double Q: keep two estimators, Qonline to pick the next action and Qtarget to evaluate it, which curbs overestimation.
State Only: use V(s) instead of Q(s,a); no action is involved in the update.
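A hedged sketch of how the targets differ, reusing the q-table layout assumed above (function names are illustrative, not from the original).

def q_learning_target(Q, rw1, s2, gamma=0.9):
    # off-policy: bootstrap from the best action in the next state
    return rw1 + gamma * max(Q[s2].values())

def sarsa_target(Q, rw1, s2, a2, gamma=0.9):
    # on-policy: bootstrap from the action a2 the current policy actually took
    return rw1 + gamma * Q[s2][a2]

def double_q_target(Q_online, Q_target, rw1, s2, gamma=0.9):
    # Q_online chooses the action, Q_target evaluates it
    a_best = max(Q_online[s2], key=Q_online[s2].get)
    return rw1 + gamma * Q_target[s2][a_best]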

Explore and Exploit

The explore-exploit trade-off is the mainstream strategy in reinforcement learning.

Explore

Out in the wild of unseen cases there may be better actions than the ones already learnt, so occasionally take a random action instead of the known best one.

Explore Rate

Exploration has a rate that decays over time (across the maximum number of epochs intended for training) so that the model converges. In incremental learning, however, the explore rate usually never reaches zero, leaving a little exploration in place.
Start with explore_rate = 1, which is 100% exploration, and reduce it during training to emphasise exploited results.
Explore rate picking tactic 1:
// Not good: explore_rate reaches 0 at the end of training,
// so the q-network may get stuck on a failed exploited result.
to_explore = rand() < explore_rate
Explore rate picking tactic 2:
// Not good: roughly the first 10% of training is spent always exploring.
to_explore = rand() < explore_rate + .1
Explore rate picking tactic 3 (simple and optimal):
// Good point 1: exploitation can kick in right from the start of the decay,
// unlike tactic 2.
// Good point 2: 10% exploration remains at the end of training, which can
// redirect the search toward the global optimum if it is accidentally
// heading to a local optimum.
to_explore = rand() < max(explore_rate, .1)
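A minimal Python sketch of tactic 3 with a linear decay; epoch, max_epochs and the 0.1 floor are illustrative assumptions.

import random

def should_explore(epoch, max_epochs, floor=0.1):
    explore_rate = 1.0 - epoch / max_epochs          # linear decay from 1 toward 0
    return random.random() < max(explore_rate, floor)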

Exploit

Use the learnt q-function (q-table or q-network) to find the action with the maximum value among the learnt cases, and take that action.
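For a q-table this is a plain argmax over the actions of the current state, reusing the dict-of-dicts layout assumed above.

def exploit(Q, state):
    # pick the action with the highest learnt value in this state
    return max(Q[state], key=Q[state].get)

Combined with should_explore above, a per-step choice looks like: action = random.choice(actions) if should_explore(epoch, max_epochs) else exploit(Q, state).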

When to Train

The RL q-network can be trained after every action or after every run over an episode, but doing so after every action is computationally too costly. Training the q-network once per run of a full episode, after it reaches the end point, is much faster.
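A hedged sketch of the per-episode variant, assuming a hypothetical env interface, a qnet whose predict(state) returns per-action q-values and which has a fit(inputs, targets) method, and a choose_action callback for the explore/exploit step; all of these names are assumptions, not from the original.

def run_episode_and_train(env, qnet, choose_action, gamma=0.9):
    # Collect the whole run first, train only once at the end point.
    transitions = []
    s1, done = env.reset(), False
    while not done:
        a1 = choose_action(qnet, s1)                 # explore/exploit step from above
        s2, rw1, done = env.step(a1)
        transitions.append((s1, a1, rw1, s2))
        s1 = s2
    inputs, targets = [], []
    for s1, a1, rw1, s2 in transitions:
        q_target = rw1 + gamma * max(qnet.predict(s2))   # Bellman target per transition
        inputs.append((s1, a1))
        targets.append(q_target)
    qnet.fit(inputs, targets)                        # single training call per episode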

Single Agent System

Multi-agent System


 