Concise and Practical AI/ML

Optimisers & Training

Optimisers

SGD

Stochastic Gradient Descent. The basic gradient-based optimiser; simple, but it often converges too slowly for production cases.
SGD uses the basic weight update, where r is the learning rate and g is the gradient of the loss with respect to the weight:
w = w - r*g
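A minimal NumPy sketch of this update (the weights, gradient, and learning rate here are illustrative values, not from a real model):

```python
import numpy as np

def sgd_step(w, g, r=0.01):
    """One SGD update: move the weights against the gradient."""
    return w - r * g

w = np.array([0.5, -0.3])   # current weights (illustrative)
g = np.array([0.2, -0.1])   # gradient of the loss (illustrative)
w = sgd_step(w, g, r=0.1)   # w becomes [0.48, -0.29]
```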

Momentum

A rather good optimiser, but still not the best choice in production: Adam usually converges faster and adapts better across use cases.
Momentum adds a velocity term to the weight update, which accumulates a decaying sum of past gradients.
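One common form of the momentum update, sketched in NumPy (the momentum coefficient m = 0.9 and other values are illustrative):

```python
import numpy as np

def momentum_step(w, v, g, r=0.1, m=0.9):
    """Momentum update: v accumulates a decayed sum of past gradients."""
    v = m * v - r * g   # m is the momentum coefficient
    return w + v, v

w, v = np.zeros(2), np.zeros(2)
g = np.array([1.0, -1.0])
w, v = momentum_step(w, v, g)   # first step: v = -r*g
w, v = momentum_step(w, v, g)   # repeated gradients make the step grow
```

Because the velocity carries over between steps, repeated gradients in the same direction produce larger and larger steps, which helps cross flat regions of the loss.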

Adam

Adaptive Moment Estimation. The most common optimiser in production; it converges quickly on many kinds of problems.
Adam extends the weight update with running estimates of the first and second moments of the gradient, plus a bias correction for their zero initialisation.
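A sketch of one Adam step in NumPy, using the standard default coefficients (b1 = 0.9, b2 = 0.999); the gradient values are illustrative:

```python
import numpy as np

def adam_step(w, m, v, g, t, r=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (t counts steps from 1)."""
    m = b1 * m + (1 - b1) * g       # first moment: running mean of gradients
    v = b2 * v + (1 - b2) * g**2    # second moment: running mean of squared gradients
    m_hat = m / (1 - b1**t)         # bias correction for zero-initialised moments
    v_hat = v / (1 - b2**t)
    return w - r * m_hat / (np.sqrt(v_hat) + eps), m, v

w, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
g = np.array([1.0, -2.0])
w, m, v = adam_step(w, m, v, g, t=1)  # step size ~ r regardless of gradient scale
```

Dividing by the square root of the second moment normalises the step size per weight, which is why Adam adapts well across problems without heavy learning-rate tuning.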

Training Process

Data Preparation

Split the data into a training set and a test set.
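A minimal split helper, sketched in NumPy (the 80/20 ratio and fixed seed are illustrative choices):

```python
import numpy as np

def train_test_split(X, y, test_ratio=0.2, seed=0):
    """Shuffle the samples, then hold out a fraction as the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))           # shuffle so the split is unbiased
    n_test = int(len(X) * test_ratio)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, y_train, X_test, y_test = train_test_split(X, y)
```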

Training

Train the network for multiple epochs, where one epoch is a full pass over the training set.
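A toy training loop showing the epoch structure: it fits a 1-D linear model y = w*x with per-sample SGD (the data, learning rate, and epoch count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=100)
y = 3.0 * X + rng.normal(scale=0.1, size=100)  # true weight is 3.0

w, r = 0.0, 0.05
for epoch in range(20):                  # one epoch = one full pass over the data
    for xi, yi in zip(X, y):
        g = 2 * (w * xi - yi) * xi       # gradient of the squared error
        w -= r * g                       # SGD update per sample
# w ends up close to the true weight 3.0
```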

Inference

Evaluate the model on the training data, the test data, and previously unseen data.
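For classification, the same metric can be reported on each of those datasets; a sketch with a hypothetical threshold classifier standing in for a trained model:

```python
import numpy as np

def accuracy(predict, X, y):
    """Share of correct predictions; report it on train, test, and fresh data."""
    return float(np.mean(predict(X) == y))

# Hypothetical classifier for illustration: positive inputs -> class 1.
predict = lambda X: (X > 0).astype(int)
X_test = np.array([-1.0, 0.5, 2.0, -0.2])
y_test = np.array([0, 1, 1, 1])
acc = accuracy(predict, X_test, y_test)  # 3 of 4 correct -> 0.75
```

Comparing the training and test numbers is also the quickest way to spot overfitting: high training accuracy with much lower test accuracy.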

Hardware Utilisation

Training can be done on a CPU (slow) or a GPU (fast). Training on a GPU is fast because GPUs are designed for matrix computation, with thousands of simple cores rather than the few complex cores of a CPU.

Distributed Training

Modern ML solutions rely on distributed training across multiple devices or machines to finish the training process in a reasonable time.
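The most common scheme is data parallelism: each worker computes gradients on its own shard of the data, and the gradients are averaged (an "all-reduce") before the shared weights are updated. A single-process NumPy sketch of the idea, using a linear model with MSE loss (the data and shard count are illustrative):

```python
import numpy as np

def shard_grad(w, X, y):
    """MSE gradient for a linear model on one data shard."""
    return 2 * X.T @ (X @ w - y) / len(y)

def distributed_grad(w, shards):
    """Each worker computes its shard's gradient; averaging mimics an all-reduce."""
    return np.mean([shard_grad(w, Xs, ys) for Xs, ys in shards], axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = np.zeros(3)
shards = [(X[:4], y[:4]), (X[4:], y[4:])]   # two equal-size "workers"
g = distributed_grad(w, shards)             # equals the full-batch gradient
```

With equal-size shards the averaged gradient matches the full-batch gradient exactly, so the workers stay mathematically equivalent to one big machine.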
