Gradient Descent

[Figure: gradient descent diagram]
The gradient descent method (GD) for finding the minimum of a function was invented by the French mathematician Augustin-Louis Cauchy. It uses the gradient to find a minimum point (a local minimum only; there is no guarantee of reaching the global minimum) by driving the gradient toward zero.

GD Diagram

See the figure above.

GD Diagram Description

Consider a function fe (in ML, the function whose local minimum we want to find is called the loss function), which is convex (concave upward, shaped like a bowl), and an arbitrary coordinate w on the x axis. Draw a vertical line from w up to the curve, and draw the tangent line to the function at the crossing point.
The gradient of the function fe at the tangent point T is the slope of that tangent line; call this value g.
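To make this concrete, here is a minimal Python sketch (not from the original text): it estimates the gradient g of a hypothetical convex loss function fe(w) = (w − 3)² at a point w with a central finite difference, which approximates the slope of the tangent line at that point.

    def fe(w: float) -> float:
        # Hypothetical convex "loss" function with its minimum at w = 3.
        return (w - 3.0) ** 2

    def gradient(f, w: float, eps: float = 1e-6) -> float:
        # Central finite difference: an estimate of the slope of the tangent line at w.
        return (f(w + eps) - f(w - eps)) / (2.0 * eps)

    g = gradient(fe, 5.0)   # positive, because w = 5 lies to the right of the minimum
    print(g)                # about 4.0, since d/dw (w - 3)^2 = 2(w - 3)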

The Maths

The farther to the right of the minimum, the larger the gradient is (positive).
The farther to the left of the minimum, the smaller the gradient is (negative).
Point T is on the right side of the minimum, so in order to reach the local minimum, w should be reduced.
The larger g is, the faster w is moved toward the local minimum by this assignment:
w ← w − r·g
where r is called the learning rate, and w and g are as described above. Later, in backpropagation, the value w will be a weight of a neuron.
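Below is a minimal Python sketch (not part of the original text) of this update applied repeatedly to the same hypothetical function fe(w) = (w − 3)², with an assumed learning rate r = 0.1 and starting point w = 5.

    def grad_fe(w: float) -> float:
        # Exact gradient of the hypothetical loss fe(w) = (w - 3)^2.
        return 2.0 * (w - 3.0)

    w = 5.0                  # arbitrary starting coordinate, to the right of the minimum
    r = 0.1                  # assumed learning rate
    for step in range(100):
        g = grad_fe(w)       # gradient at the current w
        w = w - r * g        # the gradient descent update
    print(w)                 # close to 3.0, the local minimum

Starting to the right of the minimum, g is positive, so each update decreases w; the sequence converges toward w = 3.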

 