Optimizers

Training

Data Split

The dataset should contain enough examples to cover many cases, including many cases where the inputs differ but the outputs are similar. Once there is enough data, it is split into a Training Set and a Validation Set (Test Set).
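
A minimal Python sketch of such a split, assuming NumPy arrays and an 80/20 ratio (both the library and the ratio are illustrative assumptions, not prescriptions from this book):

import numpy as np

def split_data(x, y, val_ratio=0.2, seed=0):
    # Shuffle the indices, then hold out the first val_ratio fraction
    # as the validation (test) set and keep the rest for training.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_val = int(len(x) * val_ratio)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return x[train_idx], y[train_idx], x[val_idx], y[val_idx]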

Optimisers

SGD

Stochastic Gradient Descent. The basic gradient-based optimiser; it tends to converge slowly in production cases.
SGD uses the basic weight update rule, where w is a weight, r is the learning rate, and g is the gradient of the loss with respect to w:
w = w - r*g
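
A one-line NumPy sketch of this update (the function name and the default learning rate are illustrative assumptions):

import numpy as np

def sgd_step(w, g, r=0.01):
    # Move each weight against its gradient, scaled by the learning rate r.
    return w - r * g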

Momentum

A fairly good optimiser, but still not the best choice in production; Adam usually converges faster and adapts better across use cases.
Momentum adds an extra coefficient to the weight update formula: a velocity term accumulates past gradients, so updates keep moving in directions that are consistently downhill.
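
A minimal NumPy sketch of one momentum update, with the velocity v carried between steps (the coefficient m=0.9 and the learning rate are typical values, assumed here for illustration):

import numpy as np

def momentum_step(w, g, v, r=0.01, m=0.9):
    # v accumulates past gradients; m controls how much old velocity is kept.
    v = m * v - r * g
    return w + v, v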

Adam

Adaptive Moment Estimation. The most common optimiser in production and usually the best default; it converges fast on many kinds of problems.
Adam adds more coefficients to the weight update formula: it keeps running estimates of both the mean and the variance of the gradients and scales each weight's step accordingly.
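
A minimal NumPy sketch of one Adam step; the hyperparameter defaults below follow the commonly cited values and are assumptions here, not values fixed by this book:

import numpy as np

def adam_step(w, g, m, v, t, r=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # m and v are running estimates of the gradient mean and variance;
    # t is the step count, used for bias correction.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - r * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v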

Training Process

Data Preparation

Split data into a training set and a test set.

Training

Train the network for multiple epochs.
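
A toy end-to-end sketch, fitting y = 2x with plain SGD over multiple epochs (the data, learning rate, and epoch count are all illustrative assumptions):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                      # target: the model should learn w ≈ 2
w, r = 0.0, 0.01

for epoch in range(200):
    pred = w * x
    g = np.mean(2 * (pred - y) * x)   # gradient of the mean squared error
    w = w - r * g

print(w)   # close to 2.0 after training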

Inference

Use the trained model on training data, test data, and previously unseen data to evaluate how well it performs and generalises.
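
A small sketch of such a check, computing mean squared error on a held-out set (model_fn is any prediction callable; the name is a placeholder assumption):

import numpy as np

def mse(model_fn, x, y):
    # Average squared gap between predictions and targets on held-out data.
    pred = model_fn(x)
    return float(np.mean((pred - y) ** 2))

# Example: evaluate the toy linear model trained above.
# print(mse(lambda a: w * a, x_test, y_test))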

Hardware Utilisation

Training can be done on a CPU (slow) or a GPU (fast). Training on a GPU is fast because GPUs are designed for matrix calculation, with thousands of simple cores instead of the few complex cores in a CPU.
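
A common way to pick the device, sketched here with PyTorch as an assumed framework (this book does not mandate a specific library):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(10, 1).to(device)    # illustrative model
batch = torch.randn(32, 10, device=device)   # illustrative batch
output = model(batch)                        # runs on the GPU when available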

Distributed Training

Modern ML solutions rely on distributed training across multiple GPUs or machines to finish the training process in a reasonable time.
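
A hedged sketch of the usual data-parallel setup, assuming PyTorch DistributedDataParallel and a launch via torchrun (the backend choice and the launcher are assumptions):

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="gloo")   # "nccl" is typical on GPU clusters
model = DDP(torch.nn.Linear(10, 1))       # gradients are averaged across processes
# ... run the normal training loop here; each process sees its own shard of the data ...
dist.destroy_process_group()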

 