What is Gradient? Loss Function? Model Parameters? Gradient Descent?

Gradient Descent: an optimization algorithm that minimizes the error between predicted and actual values by repeatedly updating the model parameters in the direction opposite the gradient

Loss Function: measures how bad a prediction is compared to the actual true value; the aim is to minimize it, driving it as close to zero as possible. Many loss functions exist; one common choice is Mean Squared Error (MSE)

Gradient: the slope of the loss; it points in the direction of steepest increase of the loss, so to minimize the loss we move in the opposite direction. The gradient of the loss is computed with respect to the parameters w and b.
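To make this concrete, here is a minimal sketch on a toy one-parameter loss (the function \(f(w) = (w - 3)^2\) is hypothetical, chosen only for illustration): a central-difference estimate matches the analytic gradient, and stepping against the gradient lowers the loss.

```python
# Toy loss (hypothetical, for illustration): f(w) = (w - 3)**2
def f(w):
    return (w - 3) ** 2

def grad_f(w):
    # Analytic derivative: 2(w - 3)
    return 2 * (w - 3)

w = 0.0
eps = 1e-6
# Central finite difference approximates the gradient numerically
numeric = (f(w + eps) - f(w - eps)) / (2 * eps)
print(abs(numeric - grad_f(w)) < 1e-6)  # the two gradients agree

# One step against the gradient decreases the loss
w_new = w - 0.1 * grad_f(w)
print(f(w_new) < f(w))
```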

Example:

Suppose we want to learn a line \(y = wx + b \) that fits the data points \((x, y) \): (1, 3), (2, 5).

1. Initial model parameters: \(w = 0 \), \(b = 0 \). The model is \(\hat{y} = wx + b \), where \(w \) is the weight (slope) and \(b \) is the bias (intercept)

2. Loss function MSE: \(\text{Loss} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \)
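A quick sketch of the MSE for the example data at the initial parameters \(w = 0, b = 0 \): both predictions are 0, so the loss is \((3^2 + 5^2)/2 = 17\).

```python
# Example data points (x, y): (1, 3), (2, 5)
xs, ys = [1, 2], [3, 5]

def mse(w, b):
    # Mean squared error between actual y and prediction wx + b
    n = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n

print(mse(0.0, 0.0))  # 17.0: predictions are 0, so loss = (3² + 5²) / 2
```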

3. Gradient:

\(\frac{\partial \text{Loss}}{\partial w} = \frac{1}{n} \sum_{i=1}^{n} -2x_i(y_i - \hat{y}_i) \)

\(\frac{\partial \text{Loss}}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} -2(y_i - \hat{y}_i) \)
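Evaluating these two partial derivatives on the example data at \(w = 0, b = 0 \) gives \(\partial \text{Loss}/\partial w = \frac{1}{2}(-2 \cdot 1 \cdot 3 - 2 \cdot 2 \cdot 5) = -13 \) and \(\partial \text{Loss}/\partial b = \frac{1}{2}(-2 \cdot 3 - 2 \cdot 5) = -8 \), which a small sketch confirms:

```python
# Example data points (x, y): (1, 3), (2, 5)
xs, ys = [1, 2], [3, 5]

def gradients(w, b):
    # Partial derivatives of the MSE with respect to w and b
    n = len(xs)
    dw = sum(-2 * x * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
    db = sum(-2 * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
    return dw, db

print(gradients(0.0, 0.0))  # (-13.0, -8.0): both negative, so w and b will increase
```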

4. Gradient Descent:

\(w_{\text{new}} = w - \eta \cdot \frac{\partial \text{Loss}}{\partial w} \)

\(b_{\text{new}} = b - \eta \cdot \frac{\partial \text{Loss}}{\partial b} \)

where \(\eta \) is the learning rate, the step size of each update.
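Putting the four steps together, here is a minimal gradient-descent loop on the example data. The learning rate \(\eta = 0.1 \) and the step count are assumptions for illustration; with enough steps the parameters approach the line \(y = 2x + 1 \), which passes exactly through (1, 3) and (2, 5).

```python
# Example data points (x, y): (1, 3), (2, 5)
xs, ys = [1, 2], [3, 5]
w, b = 0.0, 0.0   # step 1: initial parameters
eta = 0.1         # learning rate (assumed value)
n = len(xs)

for _ in range(2000):
    # step 3: gradients of the MSE with respect to w and b
    dw = sum(-2 * x * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
    db = sum(-2 * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
    # step 4: update parameters by moving against the gradient
    w, b = w - eta * dw, b - eta * db

print(round(w, 3), round(b, 3))  # close to 2 and 1
```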

References:
https://uclaacm.github.io/gradient-descent-visualiser/

https://www.coursera.org/articles/what-is-gradient-descent

https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html
