Monte Carlo Search
Randomly sample candidates and keep the good ones. Here Monte Carlo is used to search for the best value, rather than to estimate a ratio (number of qualifying points over all sampled points).
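A minimal Python sketch of the idea; the evaluate function and candidate list are made-up placeholders:

import random

def monte_carlo_search(evaluate, candidates, n_samples=1000):
    # Randomly sample candidates and keep the single best one found,
    # instead of estimating a ratio (hits / total samples).
    best, best_value = None, float("-inf")
    for _ in range(n_samples):
        candidate = random.choice(candidates)
        value = evaluate(candidate)
        if value > best_value:
            best, best_value = candidate, value
    return best, best_value

# Example: find the x in [-5, 5] that roughly maximises -(x - 2)^2.
xs = [x / 100 for x in range(-500, 501)]
print(monte_carlo_search(lambda x: -(x - 2) ** 2, xs))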
Bellman Update
Formula
td = (rw1 + dc * q2max) - q1
q1 += rr * td
(rw1: reward, dc: discount factor, q2max = max over actions a of Q(s2, a), q1 = Q(s1, a1), rr: learning rate)
Target q for the q-network: q_target = q1 + rr * td
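A tabular Python sketch of this update, keeping the same variable names; the state/action keys are made-up placeholders:

from collections import defaultdict

def bellman_update(q_table, s1, a1, rw1, s2, actions, dc=0.99, rr=0.1):
    # td = (rw1 + dc * q2max) - q1, then q1 += rr * td
    q1 = q_table[(s1, a1)]
    q2max = max(q_table[(s2, a2)] for a2 in actions)
    td = (rw1 + dc * q2max) - q1
    q_table[(s1, a1)] = q1 + rr * td
    return q_table[(s1, a1)]  # also usable as the target for a q-network

q_table = defaultdict(float)
print(bellman_update(q_table, s1="start", a1="left", rw1=1.0, s2="next", actions=["left", "right"]))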
Variations
SARSA: On-policy; use the q-value q2 of the action the policy actually takes next, with no max over actions.
Double Q: Keep two networks, Q_online and Q_target; pick the next action with Q_online but evaluate it with Q_target to reduce overestimation.
State Only: Use V(s) instead of Q(s, a); the value depends only on the state, no action is involved.
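A sketch of how only the TD target changes across these variations; q, q_online, q_target, v and the taken action a2 are assumed dict-like placeholders:

def td_target_q_learning(rw1, dc, q, s2, actions):
    # Standard Bellman target: max over all next actions.
    return rw1 + dc * max(q[(s2, a)] for a in actions)

def td_target_sarsa(rw1, dc, q, s2, a2):
    # SARSA: use q2 for the action a2 the policy actually takes, no max.
    return rw1 + dc * q[(s2, a2)]

def td_target_double_q(rw1, dc, q_online, q_target, s2, actions):
    # Double Q: choose the next action with q_online, evaluate it with q_target.
    a_best = max(actions, key=lambda a: q_online[(s2, a)])
    return rw1 + dc * q_target[(s2, a_best)]

def td_target_state_only(rw1, dc, v, s2):
    # State only: one value per state, no action involved.
    return rw1 + dc * v[s2]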
Explore and Exploit
Balancing exploration and exploitation is a central strategy in reinforcement learning.
Explore
Out in the wild of unseen cases there may be better actions than the ones learnt so far, so take a random action now and then.
Explore Rate
The exploration rate is reduced over time (across the maximum time or number of epochs intended for training) so the model can converge. In incremental learning, however, the explore rate usually never reaches zero, leaving a little exploration in place.
Start with explore_rate = 1 (100% exploration) and reduce it during training to put more weight on exploited results, for example with the linear decay sketched below.
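One possible schedule, assuming a simple linear decay over max_epochs (exponential decay is also common):

def explore_rate_at(epoch, max_epochs):
    # Linear decay from 1.0 (all exploration) toward 0.0 at the end of training.
    return max(0.0, 1.0 - epoch / max_epochs)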
Explore rate picking tactic 1:
// Not good when explore_rate reaches 0 at the end of training,
// because the q-network may get stuck on a failed exploited result.
to_explore = rand() < explore_rate
Explore rate picking tactic 2:
// Not good because roughly the first 10% of training (while explore_rate > 0.9)
// is always exploring.
to_explore = rand() < explore_rate + .1
Explore rate picking tactic 3 (simple and effective):
// Good point 1: Exploitation can happen right from the start.
// Good point 2: 10% exploration remains at the end of training, keeping a
// chance to escape a local optimum and move toward the global optimum.
to_explore = rand() < max(explore_rate, .1)
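A minimal Python sketch of tactic 3 inside a full action choice, where exploring picks a random action and exploiting picks the best known one; q_table and actions are assumed placeholders:

import random

def choose_action(q_table, state, actions, explore_rate):
    # Tactic 3: never drop below 10% exploration.
    if random.random() < max(explore_rate, 0.1):
        return random.choice(actions)  # explore: random action
    # Exploit: the action with the max learnt q-value (see Exploit below).
    return max(actions, key=lambda a: q_table[(state, a)])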
Exploit
Use the learnt q-function (q-table or q-network) to find the action with the maximum value among the cases learnt so far, and take that action.
When to Train
The q-network can be trained after every action or after every episode. Training after every action is too costly computationally; training once per completed episode (after it runs to the end point) is much faster.
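A sketch of per-episode training, assuming a hypothetical env with reset()/step() and a q_net with a train_on_batch-style update:

def run_episode(env, q_net, choose_action):
    # Collect the whole episode first, then train the q-network once on it.
    transitions = []
    state, done = env.reset(), False
    while not done:
        action = choose_action(state)
        next_state, reward, done = env.step(action)
        transitions.append((state, action, reward, next_state, done))
        state = next_state
    # One training pass per episode instead of a gradient step per action.
    q_net.train_on_batch(transitions)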
Single Agent System
Multi-agent System