Gradient Descent: Optimization algorithm; minimizes the error between predicted and actual values by updating parameters in the direction opposite the gradient.
Loss Function: The quantity we aim to minimize (the closer to zero, the better); measures how bad a prediction is compared to the actual true value; various loss functions are used; a common one is Mean Squared Error (MSE).
Gradient: Slope; the direction of steepest increase of the loss. To minimize the loss, we therefore move in the opposite direction of the gradient; the gradient of the loss is computed with respect to the parameters w and b.
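To make the update rule concrete, here is a minimal Python sketch. It minimizes an assumed one-dimensional loss \(f(w) = (w - 3)^2 \), whose gradient is \(2(w - 3) \); the learning rate and step count are illustrative choices, not part of the algorithm.

```python
# Minimal gradient descent loop: minimize f(w) = (w - 3)^2.
# Loss, gradient, learning rate, and step count are all assumed for illustration.

def loss(w):
    return (w - 3) ** 2

def grad(w):
    return 2 * (w - 3)  # derivative of the loss

w = 0.0   # initial parameter
lr = 0.1  # learning rate
for _ in range(50):
    w -= lr * grad(w)  # move against the gradient

print(w, loss(w))  # w approaches 3; the loss approaches 0
```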
Example:
Suppose we want to learn a line \(y = wx + b \) that fits the data. Data points (x, y) -> (1, 3), (2, 5).
1. Start with initial model parameters \(w = 0 \), \(b = 0 \); the model is \(\hat{y} = wx + b \), where \(w \) is the weight (slope) and \(b \) is the bias (intercept)
2. Loss function (MSE): \(\text{Loss} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \)
3. Gradients of the loss with respect to \(w \) and \(b \):
\(\frac{\partial \text{Loss}}{\partial w} = \frac{1}{n} \sum_{i=1}^{n} -2x_i(y_i - \hat{y}_i) \)
\(\frac{\partial \text{Loss}}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} -2(y_i - \hat{y}_i) \)
4. Gradient descent update, where \(\eta \) is the learning rate (one iteration is worked through below):
\(w_{\text{new}} = w – \eta \cdot \frac{\partial \text{Loss}}{\partial w} \)
\(b_{\text{new}} = b – \eta \cdot \frac{\partial \text{Loss}}{\partial b} \)
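Plugging the data above into these formulas at \(w = 0 \), \(b = 0 \) (so \(\hat{y}_1 = \hat{y}_2 = 0 \)), and choosing an assumed learning rate \(\eta = 0.1 \), the first iteration gives:
\(\frac{\partial \text{Loss}}{\partial w} = \frac{1}{2}\left[-2 \cdot 1 \cdot (3 - 0) - 2 \cdot 2 \cdot (5 - 0)\right] = \frac{-6 - 20}{2} = -13 \)
\(\frac{\partial \text{Loss}}{\partial b} = \frac{1}{2}\left[-2 \cdot (3 - 0) - 2 \cdot (5 - 0)\right] = \frac{-6 - 10}{2} = -8 \)
\(w_{\text{new}} = 0 - 0.1 \cdot (-13) = 1.3, \quad b_{\text{new}} = 0 - 0.1 \cdot (-8) = 0.8 \)
Repeating these updates moves the parameters toward the values that fit both points exactly, \(w = 2 \), \(b = 1 \).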
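The same example as runnable Python; a minimal sketch grounded in the formulas above, with the learning rate and iteration count as assumed hyperparameters:

```python
# Gradient descent for y_hat = w*x + b on the data points (1, 3) and (2, 5).
# The learning rate and iteration count are assumed for illustration.

xs = [1.0, 2.0]
ys = [3.0, 5.0]

w, b = 0.0, 0.0  # initial parameters (step 1)
eta = 0.1        # learning rate
n = len(xs)

for _ in range(500):
    # MSE gradients with respect to w and b (step 3)
    grad_w = sum(-2 * x * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
    grad_b = sum(-2 * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
    # Step against the gradient (step 4)
    w -= eta * grad_w
    b -= eta * grad_b

mse = sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n
print(f"w={w:.3f}, b={b:.3f}, loss={mse:.6f}")  # converges toward w=2, b=1, loss=0
```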