Gradient Descent: Optimization algorithm; minimizes the error between predicted and actual values by updating parameters in the direction opposite the gradient.
Loss Function: The quantity we aim to minimize (the closer to zero, the better); measures how bad a prediction is compared to the actual true value; various loss functions are used; a common one is Mean Squared Error (MSE).
Gradient: Slope; the direction of steepest increase of the loss. To minimize the loss, we therefore move in the opposite direction of the gradient; the gradient of the loss is computed with respect to the parameters w and b.
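To make the update rule concrete, here is a minimal Python sketch. It minimizes an assumed one-dimensional loss \(f(w) = (w - 3)^2 \), whose gradient is \(2(w - 3) \); the learning rate and step count are illustrative choices, not part of the algorithm.

```python
# Minimal gradient descent loop: minimize f(w) = (w - 3)^2.
# Loss, gradient, learning rate, and step count are all assumed for illustration.

def loss(w):
    return (w - 3) ** 2

def grad(w):
    return 2 * (w - 3)  # derivative of the loss

w = 0.0   # initial parameter
lr = 0.1  # learning rate
for _ in range(50):
    w -= lr * grad(w)  # move against the gradient

print(w, loss(w))  # w approaches 3; the loss approaches 0
```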
Example:
Suppose we want to learn a line \(y = wx + b \) that fits the data. Data points (x, y) -> (1, 3), (2, 5).
1. Start with initial model parameters \(w = 0 \), \(b = 0 \); the model is \(\hat{y} = wx + b \), where \(w \) is the weight (slope) and \(b \) is the bias (intercept)
2. Loss function (MSE): \(\text{Loss} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \)
3. Gradients of the loss with respect to \(w \) and \(b \):
\(\frac{\partial \text{Loss}}{\partial w} = \frac{1}{n} \sum_{i=1}^{n} -2x_i(y_i - \hat{y}_i) \)
\(\frac{\partial \text{Loss}}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} -2(y_i - \hat{y}_i) \)
4. Gradient descent update, where \(\eta \) is the learning rate (one iteration is worked through below):
\(w_{\text{new}} = w – \eta \cdot \frac{\partial \text{Loss}}{\partial w} \)
\(b_{\text{new}} = b – \eta \cdot \frac{\partial \text{Loss}}{\partial b} \)
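Plugging the data above into these formulas at \(w = 0 \), \(b = 0 \) (so \(\hat{y}_1 = \hat{y}_2 = 0 \)), and choosing an assumed learning rate \(\eta = 0.1 \), the first iteration gives:
\(\frac{\partial \text{Loss}}{\partial w} = \frac{1}{2}\left[-2 \cdot 1 \cdot (3 - 0) - 2 \cdot 2 \cdot (5 - 0)\right] = \frac{-6 - 20}{2} = -13 \)
\(\frac{\partial \text{Loss}}{\partial b} = \frac{1}{2}\left[-2 \cdot (3 - 0) - 2 \cdot (5 - 0)\right] = \frac{-6 - 10}{2} = -8 \)
\(w_{\text{new}} = 0 - 0.1 \cdot (-13) = 1.3, \quad b_{\text{new}} = 0 - 0.1 \cdot (-8) = 0.8 \)
Repeating these updates moves the parameters toward the values that fit both points exactly, \(w = 2 \), \(b = 1 \).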
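The same example as runnable Python; a minimal sketch grounded in the formulas above, with the learning rate and iteration count as assumed hyperparameters:

```python
# Gradient descent for y_hat = w*x + b on the data points (1, 3) and (2, 5).
# The learning rate and iteration count are assumed for illustration.

xs = [1.0, 2.0]
ys = [3.0, 5.0]

w, b = 0.0, 0.0  # initial parameters (step 1)
eta = 0.1        # learning rate
n = len(xs)

for _ in range(500):
    # MSE gradients with respect to w and b (step 3)
    grad_w = sum(-2 * x * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
    grad_b = sum(-2 * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
    # Step against the gradient (step 4)
    w -= eta * grad_w
    b -= eta * grad_b

mse = sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n
print(f"w={w:.3f}, b={b:.3f}, loss={mse:.6f}")  # converges toward w=2, b=1, loss=0
```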