What is a Transformer?

October 3, 2024 (updated October 4, 2024) by ComputingNotes

Large Language Models, Machine Learning

A deep learning model
A type of neural network architecture https://aws.amazon.com/what-is/transformers-in-artificial-intelligence/
Introduced in 2017 in the paper "Attention Is All You Need" by Ashish Vaswani et al. at Google Brain. https://arxiv.org/abs/1706.03762
Reference: https://www.ibm.com/topics/transformer-model
Can translate text and speech in near real time.

Used by:

OpenAI’s popular ChatGPT
BERT Model (Bidirectional Encoder Representations from Transformers)

Primary Innovations:

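Positional Encoding: adds information about each token's position in the sequence, since attention on its own does not track word order
Self-Attention: each token weighs every other token in the sequence, paying more attention to the most relevant information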
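The two innovations above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the positional encoding follows the sinusoidal formula from the 2017 paper, while the attention function is simplified by assuming Q = K = V = the input (a real Transformer applies learned projection matrices and uses multiple heads). The function names and the toy embeddings are hypothetical, chosen only for this example.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: sin on even dimensions, cos on odd ones."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention.

    Simplifying assumption: queries, keys, and values are all the raw input x
    (no learned projections, single head).
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)  # attention weights for this token, summing to 1
        weights.append(w)
        # Output is the attention-weighted average of all token vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, x)) for c in range(d)])
    return outputs, weights

# Toy 3-token sequence with 4-dimensional embeddings (made-up values).
embeddings = [[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 0.0]]
pe = positional_encoding(3, 4)
x = [[e + p for e, p in zip(e_row, p_row)] for e_row, p_row in zip(embeddings, pe)]
outputs, weights = self_attention(x)
```

Each row of `weights` shows how strongly one token attends to every token in the sequence, and each row of `outputs` is that token's new, context-aware representation.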

Key Terms:

Tokens: Text is converted into numerical representations called tokens. Each token is then contextualized against the other tokens within the scope of the context window. https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)
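As a rough illustration of turning text into tokens, the sketch below maps words to integer IDs and truncates the result to a context window. It assumes a toy whitespace tokenizer with a hypothetical vocabulary built from one sentence; real models typically use subword tokenizers (such as byte-pair encoding) with far larger vocabularies.

```python
def build_vocab(corpus):
    """Assign an integer ID to each distinct lowercase word; 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for word in corpus.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def encode(text, vocab, context_window):
    """Convert text to token IDs and keep only what fits in the context window."""
    ids = [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]
    return ids[:context_window]

vocab = build_vocab("the cat sat on the mat")
ids = encode("the dog sat on the mat", vocab, context_window=4)
print(ids)  # [1, 0, 3, 4]  ("dog" is unknown, so it maps to 0)
```

Only the tokens that fit inside the context window are visible to the model at once, which is why the last two words are dropped here.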

Visual Explanation:

Attention in transformers, visually explained | Chapter 6, Deep Learning https://www.youtube.com/watch?v=eMlx5fFNoYc

Example:
https://aws.amazon.com/what-is/transformers-in-artificial-intelligence/
