Optimization

Rosenbrock Function $latex f(x,y) = (1-x)^2 + 100(y-x^2)^2$ https://docs.scipy.org/doc/scipy/tutorial/optimize.html
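As a minimal sketch, the function above and its partial derivatives can be written directly in Python. The names `rosenbrock` and `rosenbrock_gradient` are illustrative, not taken from SciPy; the linked SciPy tutorial minimizes this same function with `scipy.optimize.minimize`. The known minimum is at (1, 1), where the value is 0 and the gradient vanishes.

```python
def rosenbrock(x, y):
    """f(x, y) = (1 - x)^2 + 100 (y - x^2)^2"""
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2


def rosenbrock_gradient(x, y):
    """Partial derivatives of f with respect to x and y."""
    df_dx = -2 * (1 - x) - 400 * x * (y - x ** 2)
    df_dy = 200 * (y - x ** 2)
    return df_dx, df_dy


# Value and gradient at the minimum (1, 1).
minimum_value = rosenbrock(1.0, 1.0)
gradient_at_minimum = rosenbrock_gradient(1.0, 1.0)

# Value at (-1.2, 1), a commonly used starting point for this test function.
start_value = rosenbrock(-1.2, 1.0)

print(minimum_value, gradient_at_minimum, start_value)
```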

What is Gradient? Loss Function? Model Parameters? Gradient Descent?

Gradient Descent: an optimization algorithm that minimizes the error between predicted and actual results by updating the parameters in the direction opposite to the gradient.
Loss Function: measures how bad a prediction is compared with the true value; the aim is to minimize it, bringing it closer to zero. Several loss functions are in use; one is Mean Squared Error (MSE).
Gradient: slope; direction ...
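The definitions above can be tied together in a small sketch: gradient descent fitting a line y = w*x + b by minimizing MSE. The data, the learning rate `lr`, the step count and all function names are assumptions chosen for illustration, not part of the original notes.

```python
def mse(w, b, xs, ys):
    """Mean Squared Error of the predictions w*x + b against the true ys."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w, b, xs, ys):
    """Partial derivatives of the MSE with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Start from w = b = 0 and repeatedly step against the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = mse_gradient(w, b, xs, ys)
        w -= lr * dw
        b -= lr * db
    return w, b


# Toy data generated from y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
final_loss = mse(w, b, xs, ys)
print(w, b, final_loss)
```

Here `w` and `b` are the model parameters, `mse` is the loss function, and `mse_gradient` gives the direction of steepest increase, so each update subtracts it.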

Conda Environment

conda create -n env python=version
conda activate env
conda deactivate

Useful tools:
Starting jupyter notebook: (env) file_path > jupyter notebook
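One way to confirm from inside Python (for example in a notebook cell) which environment is active is sketched below. It relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets; the helper name `describe_environment` is hypothetical, and the value is `None` when no conda environment is active.

```python
import os
import sys


def describe_environment():
    """Collect basic facts about the interpreter that is currently running."""
    return {
        # Set by `conda activate`; None outside an activated conda environment.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Directory the running interpreter was installed into.
        "prefix": sys.prefix,
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


info = describe_environment()
print(info)
```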

Tensor basics

Tensor: a multi-dimensional array.
https://tensorly.org/stable/user_guide/tensor_basics.html
https://tensorly.org/stable/user_guide/tensor_decomposition.html
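To make "multi-dimensional array" concrete, here is a dependency-free sketch that uses nested Python lists as a stand-in for a third-order tensor of shape (3, 4, 2). The helpers `shape` and `unfold_mode0` are hypothetical names written for this illustration; in practice TensorLy and NumPy provide these operations natively. The unfolding shown assumes the row-major convention, in which row i of the mode-0 unfolding is the flattened slice X[i].

```python
def shape(tensor):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


def flatten(nested):
    """Flatten a nested list into a flat list, in row-major order."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat


def unfold_mode0(tensor):
    """Mode-0 unfolding: one row per index along the first axis."""
    return [flatten(slice_) for slice_ in tensor]


# A 3 x 4 x 2 tensor holding the integers 0..23 in row-major order.
X = [[[i * 8 + j * 2 + k for k in range(2)] for j in range(4)] for i in range(3)]

X_shape = shape(X)
X_unfolded = unfold_mode0(X)  # a 3 x 8 matrix
print(X_shape, shape(X_unfolded))
```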

Building Large Language Models (LLMs)

Pre-Training. Post-Training.
Language Modeling:
P(the, mouse, ate, the, cheese) = 0.02
Syntactic knowledge: P(the, the, mouse, ate, cheese) = 0.0001
Semantic knowledge: P(…)
Autoregressive (AR) language model: predicts the next word.
Steps, for the input "she likely prefers": tokenize (1 = she, 2 = likely, 3 = prefers) => pass to the black-box model => get a probability distribution over the next word => sample & ...
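The next-word prediction steps above can be sketched with a toy stand-in for the black-box model. Everything beyond the three token ids from the notes is an assumption made for illustration: the extra vocabulary entries, the hard-coded probabilities in `toy_model`, and the function names. A real language model would compute the distribution with a neural network over a much larger vocabulary.

```python
import random

# Hypothetical vocabulary; ids 1-3 follow the notes, 4 and 5 are made up.
VOCAB = {"she": 1, "likely": 2, "prefers": 3, "tea": 4, "coffee": 5}
ID_TO_WORD = {token_id: word for word, token_id in VOCAB.items()}


def tokenize(text):
    """Map each whitespace-separated word to its token id."""
    return [VOCAB[word] for word in text.lower().split()]


def toy_model(token_ids):
    """Stand-in for the black-box model: token id -> next-word probability."""
    if token_ids and token_ids[-1] == VOCAB["prefers"]:
        return {VOCAB["tea"]: 0.6, VOCAB["coffee"]: 0.4}
    # Otherwise fall back to a uniform distribution over the vocabulary.
    p = 1.0 / len(VOCAB)
    return {token_id: p for token_id in VOCAB.values()}


def sample_next(distribution, rng):
    """Draw one token id according to the predicted probabilities."""
    ids = list(distribution)
    weights = [distribution[token_id] for token_id in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs draw the same sample
tokens = tokenize("she likely prefers")
dist = toy_model(tokens)
next_word = ID_TO_WORD[sample_next(dist, rng)]
print(tokens, dist, next_word)
```

To generate longer text autoregressively, the sampled token id would be appended to `tokens` and the model called again.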