[New] Concise and Practical AI/ML
Reinforcement Learning

Learning Tactics

Monte Carlo Search

Sample at random and keep the good samples. Here Monte Carlo is used to search for the best value found, rather than to estimate a ratio such as the number of points that land in a region over the total number of points.
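A minimal sketch of this kind of search, assuming a one-dimensional objective and uniform sampling. The function name `monte_carlo_search`, its parameters, and the quadratic objective are illustrative assumptions, not something defined in these notes.

```python
import random

def monte_carlo_search(objective, low, high, n_samples=1000, seed=0):
    """Draw uniform random samples in [low, high] and keep the best one."""
    rng = random.Random(seed)
    best_x, best_value = None, float("-inf")
    for _ in range(n_samples):
        x = rng.uniform(low, high)
        value = objective(x)
        if value > best_value:  # keep only the best sample seen so far
            best_x, best_value = x, value
    return best_x, best_value

# Hypothetical objective whose maximum is at x = 2.
best_x, best_value = monte_carlo_search(lambda x: -(x - 2.0) ** 2, -10.0, 10.0)
```

More samples give a better chance of landing near the true maximum, at the cost of more objective evaluations.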

Explore and Exploit

Balancing exploration and exploitation is the mainstream strategy in reinforcement learning.

Explore

Among the cases the agent has not seen yet, there may be better actions than the ones it already knows. To find them, take a random action.

Explore Rate

The explore rate decreases over time (over the maximum time or maximum number of epochs intended for training) so that the model converges. In incremental learning, however, the explore rate is usually never reduced to zero, which leaves a small amount of exploration in place.
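One way to sketch such a schedule is a linear decay with a floor. The linear shape, the name `explore_rate`, and the default values `start=1.0` and `floor=0.05` are assumptions for illustration; setting `floor=0.0` gives a rate that reaches zero at the end of training, while a positive floor matches the incremental-learning case.

```python
def explore_rate(epoch, max_epochs, start=1.0, floor=0.05):
    """Linearly decay the explore rate from `start` to `floor` over max_epochs.

    After max_epochs the rate stays at `floor`, so a positive floor
    keeps a little exploration going indefinitely.
    """
    fraction = min(epoch / max_epochs, 1.0)  # training progress, capped at 1
    return start + (floor - start) * fraction
```

For example, `explore_rate(0, 100)` starts at the full rate and `explore_rate(500, 100)` stays at the floor.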

Exploit

Use the learnt Q-function (a Q-table or a Q-network) to find the action with the maximum value for the current state, and take that action.
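Exploring and exploiting are commonly combined in a rule known as epsilon-greedy: with probability equal to the explore rate take a random action, otherwise take the best known one. The sketch below assumes a Q-table stored as a dict from state to a list of action values; `choose_action` and its parameters are hypothetical names.

```python
import random

def choose_action(q_table, state, n_actions, explore_rate, rng=random):
    """Explore with probability explore_rate, otherwise exploit the Q-table."""
    if rng.random() < explore_rate:
        return rng.randrange(n_actions)  # explore: random action
    # Unseen states default to all-zero values (an assumption of this sketch).
    q_values = q_table.get(state, [0.0] * n_actions)
    # Exploit: index of the action with the maximum learnt value.
    return max(range(n_actions), key=lambda a: q_values[a])

q_table = {"s0": [0.1, 0.9, 0.3]}
greedy_action = choose_action(q_table, "s0", n_actions=3, explore_rate=0.0)
```

With a Q-network, the lookup would be replaced by a forward pass that returns the action values for the state.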

When to Train

A Q-network can be trained after every action or after every full episode. Training after every action is computationally costly. Training once after each episode has run to its end point makes training much faster.
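The per-episode option can be sketched as: collect transitions while acting, then train once when the episode ends. In this sketch a tabular Q-learning update stands in for the Q-network training step, and the environment is a made-up five-state chain; `run_episode`, its parameters, and the toy dynamics are all assumptions for illustration.

```python
import random

def run_episode(q_table, n_states=5, n_actions=2, explore_rate=0.2,
                learning_rate=0.5, discount=0.9, max_steps=50, rng=None):
    """Play one episode on a toy chain, then train once at the end."""
    rng = rng or random.Random(0)
    state, transitions = 0, []
    for _ in range(max_steps):
        q_values = q_table.setdefault(state, [0.0] * n_actions)
        if rng.random() < explore_rate:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q_values[a])
        # Toy dynamics (assumed): action 1 moves right, action 0 moves left.
        if action == 1:
            next_state = min(state + 1, n_states - 1)
        else:
            next_state = max(state - 1, 0)
        done = next_state == n_states - 1  # the last state is the end point
        reward = 1.0 if done else 0.0
        transitions.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    # Training happens here, once per episode, not inside the loop above.
    for s, a, r, s2, done in reversed(transitions):
        if done:
            target = r
        else:
            target = r + discount * max(q_table.setdefault(s2, [0.0] * n_actions))
        q_table[s][a] += learning_rate * (target - q_table[s][a])
    return len(transitions)
```

Replaying the stored transitions in reverse order is a design choice of this sketch: it lets the final reward propagate back through the whole episode in a single pass.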

Single-agent System

Multi-agent System

