Optimisers
SGD
Stochastic Gradient Descent. The basic gradient-based optimiser; it tends to converge slowly in production cases.
SGD uses the basic weight update rule:
w = w - r*g
where w is the weight, r the learning rate, and g the gradient of the loss with respect to w.
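A minimal sketch of this update in NumPy; the names r and g follow the formula above, and the concrete values are only illustrative.

    import numpy as np

    # One SGD step: move the weights against the gradient, scaled by the learning rate r.
    def sgd_step(w, g, r=0.01):
        return w - r * g

    w = np.array([0.5, -0.3])      # current weights (illustrative)
    g = np.array([0.1, 0.2])       # gradient of the loss w.r.t. w (illustrative)
    w = sgd_step(w, g)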
Momentum
A decent optimiser, but still not the usual choice in production; Adam typically converges faster and adapts better across use cases.
Momentum adds an extra coefficient to the weight update: in one common formulation, a velocity v accumulates past gradients, v = m*v - r*g, and the weights move by that velocity, w = w + v, so updates keep pushing in a consistent direction (see the sketch below).
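A minimal sketch of a momentum step under that formulation, again in NumPy with illustrative values.

    import numpy as np

    # Momentum keeps a velocity v that accumulates past gradients,
    # damped by the momentum coefficient m (commonly around 0.9).
    def momentum_step(w, g, v, r=0.01, m=0.9):
        v = m * v - r * g          # blend previous velocity with the current gradient
        return w + v, v

    w = np.array([0.5, -0.3])
    v = np.zeros_like(w)           # velocity starts at zero
    g = np.array([0.1, 0.2])
    w, v = momentum_step(w, g, v)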
Adam
Adaptive Moment Estimation. The most common default optimiser in production; it converges quickly on many kinds of problems.
Adam adds further coefficients to the weight update: it keeps running estimates of the first and second moments of the gradient and scales each parameter's step by them (see the sketch below).
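A sketch of one Adam step; the coefficient values below (b1, b2, eps) follow commonly used defaults and are not taken from these notes.

    import numpy as np

    # Adam keeps running estimates of the first moment (mean) and second moment
    # (uncentred variance) of the gradient, with bias correction for early steps.
    def adam_step(w, g, m, v, t, r=0.001, b1=0.9, b2=0.999, eps=1e-8):
        m = b1 * m + (1 - b1) * g            # first-moment estimate
        v = b2 * v + (1 - b2) * g**2         # second-moment estimate
        m_hat = m / (1 - b1**t)              # bias correction (t is the step count, starting at 1)
        v_hat = v / (1 - b2**t)
        w = w - r * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v

    w = np.array([0.5, -0.3])
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    g = np.array([0.1, 0.2])
    w, m, v = adam_step(w, g, m, v, t=1)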
Training Process
Data Preparation
Split the data into a training set and a test set.
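A simple way to do this split, sketched in NumPy; split_train_test is a hypothetical helper and the 80/20 ratio is only an example.

    import numpy as np

    # Shuffle the indices and hold out a fraction of the data as the test set.
    def split_train_test(X, y, test_ratio=0.2, seed=0):
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(X))
        n_test = int(len(X) * test_ratio)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

    X = np.arange(20).reshape(10, 2).astype(float)   # toy features
    y = np.arange(10)                                # toy labels
    X_train, y_train, X_test, y_test = split_train_test(X, y)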
Training
Train the network for multiple epochs (full passes over the training set).
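A toy training loop as a sketch: it fits a small linear model with plain SGD over several epochs; the data, learning rate, and epoch count are made up for illustration.

    import numpy as np

    # Fit y = X @ w with mean-squared error, running several full passes (epochs)
    # over the training data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    true_w = np.array([2.0, -1.0])
    y = X @ true_w + 0.1 * rng.normal(size=100)

    w = np.zeros(2)
    r = 0.1                                   # learning rate (illustrative)
    for epoch in range(20):                   # epoch count (illustrative)
        g = 2 * X.T @ (X @ w - y) / len(X)    # gradient of the MSE loss
        w = w - r * g                         # plain SGD update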
Inference
Evaluate the model on the training data, the held-out test data, and new unseen data.
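A minimal sketch of such an evaluation on a toy linear model, comparing error on the training and test splits; a large gap between the two suggests over-fitting.

    import numpy as np

    def mse(w, X, y):
        # Mean-squared error of a linear model y = X @ w.
        return float(np.mean((X @ w - y) ** 2))

    rng = np.random.default_rng(1)
    w = np.array([2.0, -1.0])                      # pretend these are the trained weights
    X_train, X_test = rng.normal(size=(80, 2)), rng.normal(size=(20, 2))
    y_train = X_train @ w + 0.1 * rng.normal(size=80)
    y_test = X_test @ w + 0.1 * rng.normal(size=20)

    print("train MSE:", mse(w, X_train, y_train))
    print("test MSE:", mse(w, X_test, y_test))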
Hardware Utilisation
Training can be done on a CPU (slow) or a GPU (fast). Training on a GPU is fast because GPUs are designed for parallel matrix computation, with thousands of simple cores rather than the handful of complex cores in a CPU.
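A small sketch of picking the device at run time, assuming a PyTorch setup (these notes do not name a framework).

    import torch  # assumption: PyTorch is the training framework

    # Use the GPU when one is available, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = torch.nn.Linear(2, 1).to(device)   # move model parameters to the device
    x = torch.randn(4, 2, device=device)       # allocate the input on the same device
    y = model(x)                               # forward pass runs on that device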
Distributed Training
Distributed training across multiple GPUs or machines is essential in modern ML solutions to keep training time manageable.
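A conceptual sketch of synchronous data-parallel training in NumPy: it simulates several workers on one machine, each computing gradients on its own shard of the batch, and averages them; real systems do this across GPUs or machines with an all-reduce, which is not shown here.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(128, 2))
    true_w = np.array([2.0, -1.0])
    y = X @ true_w

    w = np.zeros(2)
    r = 0.1
    n_workers = 4
    for step in range(50):
        # Each simulated worker gets one shard of the batch.
        shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
        grads = [2 * Xs.T @ (Xs @ w - ys) / len(Xs) for Xs, ys in shards]
        # Average the workers' gradients and apply the same update everywhere.
        w = w - r * np.mean(grads, axis=0)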