Optimizers

Training

Data Split

The dataset should contain enough examples to cover many cases, including many cases where the inputs differ but the outputs are similar. Once there is enough data, it is split into a Training Set and a Validation Set (Test Set).
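
A minimal Python sketch of such a split, assuming NumPy arrays and an 80/20 ratio (both the library and the ratio are illustrative assumptions, not prescriptions from this book):

import numpy as np

def split_data(x, y, val_ratio=0.2, seed=0):
    # Shuffle the indices, then hold out the first val_ratio fraction
    # as the validation (test) set and keep the rest for training.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_val = int(len(x) * val_ratio)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return x[train_idx], y[train_idx], x[val_idx], y[val_idx]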

Optimisers

SGD

Stochastic Gradient Descent. The basic gradient-based optimiser; it tends to converge slowly in production cases.
SGD uses the basic weight update rule, where w is a weight, r is the learning rate, and g is the gradient of the loss with respect to w:
w = w - r*g
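
A one-line NumPy sketch of this update (the function name and the default learning rate are illustrative assumptions):

import numpy as np

def sgd_step(w, g, r=0.01):
    # Move each weight against its gradient, scaled by the learning rate r.
    return w - r * g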

Momentum

A fairly good optimiser, but still not the best choice in production; Adam usually converges faster and adapts better across use cases.
Momentum adds an extra coefficient to the weight update formula: a velocity term accumulates past gradients, so updates keep moving in directions that are consistently downhill.
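
A minimal NumPy sketch of one momentum update, with the velocity v carried between steps (the coefficient m=0.9 and the learning rate are typical values, assumed here for illustration):

import numpy as np

def momentum_step(w, g, v, r=0.01, m=0.9):
    # v accumulates past gradients; m controls how much old velocity is kept.
    v = m * v - r * g
    return w + v, v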

Adam

Adaptive Moment Estimation. The most common optimiser in production and usually the best default; it converges fast on many kinds of problems.
Adam adds more coefficients to the weight update formula: it keeps running estimates of both the mean and the variance of the gradients and scales each weight's step accordingly.
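
A minimal NumPy sketch of one Adam step; the hyperparameter defaults below follow the commonly cited values and are assumptions here, not values fixed by this book:

import numpy as np

def adam_step(w, g, m, v, t, r=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # m and v are running estimates of the gradient mean and variance;
    # t is the step count, used for bias correction.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - r * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v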

Training Process

Data Preparation

Split data into a training set and a test set.

Training

Train the network for multiple epochs.
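
A toy end-to-end sketch, fitting y = 2x with plain SGD over multiple epochs (the data, learning rate, and epoch count are all illustrative assumptions):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                      # target: the model should learn w ≈ 2
w, r = 0.0, 0.01

for epoch in range(200):
    pred = w * x
    g = np.mean(2 * (pred - y) * x)   # gradient of the mean squared error
    w = w - r * g

print(w)   # close to 2.0 after training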

Inference

Use the trained model on training data, test data, and previously unseen data to evaluate how well it performs and generalises.
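
A small sketch of such a check, computing mean squared error on a held-out set (model_fn is any prediction callable; the name is a placeholder assumption):

import numpy as np

def mse(model_fn, x, y):
    # Average squared gap between predictions and targets on held-out data.
    pred = model_fn(x)
    return float(np.mean((pred - y) ** 2))

# Example: evaluate the toy linear model trained above.
# print(mse(lambda a: w * a, x_test, y_test))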

Hardware Utilisation

Training can be done on a CPU (slow) or a GPU (fast). Training on a GPU is fast because GPUs are designed for matrix calculation, with thousands of simple cores instead of the few complex cores in a CPU.
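
A common way to pick the device, sketched here with PyTorch as an assumed framework (this book does not mandate a specific library):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(10, 1).to(device)    # illustrative model
batch = torch.randn(32, 10, device=device)   # illustrative batch
output = model(batch)                        # runs on the GPU when available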

Distributed Training

Modern ML solutions rely on distributed training across multiple GPUs or machines to finish the training process in a reasonable time.
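
A hedged sketch of the usual data-parallel setup, assuming PyTorch DistributedDataParallel and a launch via torchrun (the backend choice and the launcher are assumptions):

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="gloo")   # "nccl" is typical on GPU clusters
model = DDP(torch.nn.Linear(10, 1))       # gradients are averaged across processes
# ... run the normal training loop here; each process sees its own shard of the data ...
dist.destroy_process_group()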

 