Pre-Training. Post-Training. Language Modeling. P(the, mouse, ate, the, cheese) = 0.02 – syntactic knowledgeP(the, the, mouse, ate,cheese) = 0.0001 – semantic knowledgeP(…) Auto-Reggressive (AR) language model: Predict next word. Steps: she likely prefers: tokenize -> 1 -she, 2-likely, 3-prefers => pass to blackbox model => get probability distribution over next word prediction – sample & ...
Read More