Gradient Descent (GD)
Gradient Descent (GD) is fundamental to training neural networks, enabling them to learn from data by updating the weights of the network to minimise the error between the predicted output and the actual target [RHW86]. GD is an optimisation algorithm that minimises the loss of a predictive model by iterating over a training dataset. Back-propagation, an automatic differentiation algorithm, calculates the gradients of the loss with respect to the weights of the network. Together, GD and back-propagation are used to train neural network models by iteratively updating the model's parameters to reduce the error.
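As a sketch of the update rule described above (the notation here is introduced for illustration: theta denotes the model parameters, L the loss function, and eta the learning rate, a small positive step size), each iteration of GD moves the parameters against the gradient of the loss:

\[
\theta_{t+1} \;=\; \theta_t \;-\; \eta \, \nabla_{\theta} L(\theta_t).
\]

Back-propagation supplies the gradient term \(\nabla_{\theta} L(\theta_t)\) for every weight in the network, so that this update can be applied to all parameters at once.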
Goodfellow et al. [GBC16] explain the learning process of neural networks as follows. Consider a simple feedforward neural network with an input layer, one hidden layer, and an output layer. Back-propagation computes the gradient of the loss function with respect to each weight. The process can be divided into two phases: the forward pass and the backward pass, illustrated in the sketch below.
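The following is a minimal sketch of these two phases for a network with one hidden layer, trained by GD. It assumes NumPy and an invented toy regression task (the data, layer sizes, and learning rate are illustrative choices, not taken from the text above).

import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn y = sum(x) from random inputs (illustrative assumption only).
X = rng.normal(size=(64, 3))
y = X.sum(axis=1, keepdims=True)

# Parameters of a 3 -> 4 -> 1 feedforward network.
W1 = rng.normal(scale=0.5, size=(3, 4)); b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)
eta = 0.05  # learning rate

for step in range(500):
    # Forward pass: compute activations layer by layer, then the loss.
    z1 = X @ W1 + b1                      # hidden-layer pre-activation
    h1 = np.tanh(z1)                      # hidden-layer activation
    y_hat = h1 @ W2 + b2                  # network output
    loss = np.mean((y_hat - y) ** 2)      # mean squared error

    # Backward pass: propagate the loss gradient back through each layer.
    d_yhat = 2 * (y_hat - y) / len(X)     # dL/dy_hat
    dW2 = h1.T @ d_yhat                   # dL/dW2
    db2 = d_yhat.sum(axis=0)              # dL/db2
    d_h1 = d_yhat @ W2.T                  # dL/dh1
    d_z1 = d_h1 * (1 - np.tanh(z1) ** 2)  # dL/dz1 (tanh derivative)
    dW1 = X.T @ d_z1                      # dL/dW1
    db1 = d_z1.sum(axis=0)                # dL/db1

    # Gradient descent update: move each parameter against its gradient.
    W1 -= eta * dW1; b1 -= eta * db1
    W2 -= eta * dW2; b2 -= eta * db2

print(f"final loss: {loss:.4f}")

The forward pass stores the intermediate activations (z1, h1) precisely because the backward pass reuses them when applying the chain rule layer by layer; this reuse is what makes back-propagation efficient.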
GD and back-propagation together form a powerful and efficient method for training neural networks. By iteratively updating the network's weights to minimise the loss function, they enable the network to learn complex patterns in data.